Architecture Summary
The Condensa pipeline converts raw document images (PDFs, scans, photos) into standardized healthcare formats such as FHIR through four sequential stages: encoding, parsing, mapping, and validation.
High-level Pipeline
Document → Vision Encoder → Parser → Mapper → Validator → Valid FHIR Output
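The staged flow can be read as plain function composition, where each stage consumes the previous stage's output. The sketch below is illustrative only: `run_pipeline` and the stub stages are hypothetical names, not Condensa's actual API, and each stub merely records that it ran so the ordering is visible.

```python
from functools import reduce
from typing import Any, Callable, Sequence

# A stage is any callable that takes the previous stage's output.
Stage = Callable[[Any], Any]


def run_pipeline(document: Any, stages: Sequence[Stage]) -> Any:
    """Feed the document through each stage in order."""
    return reduce(lambda data, stage: stage(data), stages, document)


def vision_encoder(document: bytes) -> dict:
    # Hypothetical stand-in: a real encoder would emit visual tokens.
    return {"trace": ["vision_encoder"], "size": len(document)}


def _tag(name: str) -> Stage:
    """Build a stub stage that only appends its name to the trace."""
    def stage(payload: dict) -> dict:
        return {**payload, "trace": payload["trace"] + [name]}
    return stage


parser, mapper, validator = _tag("parser"), _tag("mapper"), _tag("validator")

if __name__ == "__main__":
    result = run_pipeline(b"%PDF-1.7", [vision_encoder, parser, mapper, validator])
    print(result["trace"])
```

Keeping the stages as independent callables means any one of them can be swapped or tested in isolation without touching the others.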
Layer Breakdown
| Layer | What it does | Technology / Notes |
|---|---|---|
| Vision Encoder | Converts raw documents (PDFs, scans, images) into compact visual tokens, reducing the size and complexity of the input passed downstream. | Condensa Vision Core: efficient visual compression and encoding. |
| Parser | Reads visual tokens and extracts structured information: entities, values, units, sections. | Python scripts + domain-specific ontology rules for entity recognition. |
| Mapper | Transforms parsed output into FHIR or ERP JSON using mapping templates and rules. | LLM-assisted rule templates ensure fields conform to FHIR structures. |
| Validator | Validates mapped data against expected schema and FHIR compliance checks. | Pydantic for schema validation; FHIR validator for official compliance tests. |
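As a rough illustration of the Parser row, the sketch below combines a small ontology lookup with a regular expression to pull an entity, value, and unit out of one line. It assumes the visual tokens have already been decoded into text lines, which this document does not specify; the `ONTOLOGY` table, `LINE_PATTERN`, and `parse_line` are hypothetical names invented for the example.

```python
import re
from typing import Optional

# Hypothetical ontology: surface forms mapped to one canonical entity name.
ONTOLOGY = {
    "blood glucose": "Blood Glucose",
    "glucose": "Blood Glucose",
    "blood sugar": "Blood Glucose",
}

# label, then ':' or '=', then a number, then an optional unit
LINE_PATTERN = re.compile(
    r"^\s*(?P<label>[A-Za-z ]+?)\s*[:=]\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S+)?\s*$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return a structured measurement, or None if the line is not recognized."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    entity = ONTOLOGY.get(match["label"].lower())
    if entity is None:
        return None  # label is not in the ontology
    return {
        "entity": entity,
        "value": float(match["value"]),
        "unit": match["unit"],
    }


if __name__ == "__main__":
    print(parse_line("Blood sugar: 110 mg/dL"))
```

Normalizing synonyms to a canonical name at this stage keeps the downstream mapping rules small, since they only need to handle one label per concept.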
Example Flow
- User uploads a lab report PDF
- Vision Encoder compresses and tokenizes images
- Parser extracts "Blood Glucose", "Patient Name", "Date"
- Mapper converts extracted fields to a FHIR Observation resource
- Validator checks the resource against the FHIR schema, and the validated output is returned to the user
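The Mapper step in this flow can be sketched as a plain template function that places parsed fields into the shape of a FHIR Observation. This is a simplification: the source describes LLM-assisted rule templates, whereas the example uses a fixed template with no LLM involved. The function name `to_observation` and the input keys (`entity`, `value`, `unit`, `date`) are assumptions made for illustration.

```python
import json


def to_observation(fields: dict, observation_id: str, patient_id: str) -> dict:
    """Fill a fixed Observation template from parsed fields (hypothetical keys)."""
    return {
        "resourceType": "Observation",
        "id": observation_id,
        "status": "final",
        "code": {"text": fields["entity"]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": fields["date"],
        "valueQuantity": {"value": fields["value"], "unit": fields["unit"]},
    }


if __name__ == "__main__":
    parsed = {
        "entity": "Blood Glucose",
        "value": 110,
        "unit": "mg/dL",
        "date": "2025-08-01T09:00:00Z",
    }
    print(json.dumps(to_observation(parsed, "obs-12345", "pat-6789"), indent=2))
```

A missing input key raises `KeyError` here, which surfaces parsing gaps early instead of emitting a partially filled resource.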
Example FHIR Output (snippet)
{
  "resourceType": "Observation",
  "id": "obs-12345",
  "status": "final",
  "code": { "text": "Blood Glucose" },
  "subject": { "reference": "Patient/pat-6789" },
  "effectiveDateTime": "2025-08-01T09:00:00Z",
  "valueQuantity": { "value": 110, "unit": "mg/dL" }
}
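Since the Validator layer names Pydantic for schema validation, a minimal sketch of that check against the snippet above might look like the following. The model classes and `is_valid` are hypothetical and cover only the fields shown in the snippet. This checks structure, not full FHIR compliance, which the pipeline delegates to the FHIR validator.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Code(BaseModel):
    text: str


class Reference(BaseModel):
    reference: str


class Quantity(BaseModel):
    value: float
    unit: str


class Observation(BaseModel):
    resourceType: Literal["Observation"]
    id: str
    # Observation status codes defined by FHIR R4.
    status: Literal[
        "registered", "preliminary", "final", "amended",
        "corrected", "cancelled", "entered-in-error", "unknown",
    ]
    code: Code
    subject: Reference
    effectiveDateTime: str
    valueQuantity: Quantity


def is_valid(payload: dict) -> bool:
    """Return True if the payload matches the simplified Observation model."""
    try:
        Observation(**payload)
    except ValidationError:
        return False
    return True


SAMPLE = {
    "resourceType": "Observation",
    "id": "obs-12345",
    "status": "final",
    "code": {"text": "Blood Glucose"},
    "subject": {"reference": "Patient/pat-6789"},
    "effectiveDateTime": "2025-08-01T09:00:00Z",
    "valueQuantity": {"value": 110, "unit": "mg/dL"},
}

if __name__ == "__main__":
    print(is_valid(SAMPLE))
    print(is_valid({**SAMPLE, "status": "done"}))
```

Constructing the model with keyword arguments works the same way in Pydantic v1 and v2, so the sketch does not depend on version-specific methods.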