Architecture Summary

The Condensa pipeline converts raw document images into standardized healthcare formats such as FHIR through four sequential stages: a vision encoder, a parser, a mapper, and a validator.

High-level Pipeline

Document → Vision Encoder → Parser → Mapper → Validator → Valid FHIR Output
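
The staged flow above can be sketched as a chain of functions in which each stage consumes the previous stage's output. This is a minimal illustration, not Condensa's actual code: `run_pipeline`, `make_stage`, and the payload shape are hypothetical names, and the placeholder stages only record that they ran.

```python
from functools import reduce
from typing import Any, Callable, Dict, Sequence

Payload = Dict[str, Any]
Stage = Callable[[Payload], Payload]


def run_pipeline(payload: Payload, stages: Sequence[Stage]) -> Payload:
    """Pass the payload through each stage in order."""
    return reduce(lambda data, stage: stage(data), stages, payload)


def make_stage(name: str) -> Stage:
    """Build a placeholder stage (hypothetical) that appends its name to a trace."""
    def stage(payload: Payload) -> Payload:
        return {"data": payload["data"], "trace": payload["trace"] + [name]}
    return stage


# Stage order mirrors the diagram: Vision Encoder, Parser, Mapper, Validator.
STAGES = [make_stage(n) for n in ("vision_encoder", "parser", "mapper", "validator")]

result = run_pipeline({"data": "lab_report.pdf", "trace": []}, STAGES)
print(result["trace"])
```

Keeping every stage behind the same call signature is one way to let a stage be swapped or tested in isolation without touching its neighbours.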

Layer Breakdown

Layer	What it does	Technology / Notes
Vision Encoder	Converts raw documents (PDFs, scans, images) into compact visual tokens, reducing input size and complexity.	Condensa Vision Core: efficient visual compression and encoding.
Parser	Reads visual tokens and extracts structured information: entities, values, units, sections.	Python scripts + domain-specific ontology rules for entity recognition.
Mapper	Transforms parsed output into FHIR or ERP JSON using mapping templates and rules.	LLM-assisted rule templates ensure fields conform to FHIR structures.
Validator	Validates mapped data against expected schema and FHIR compliance checks.	Pydantic for schema validation; FHIR validator for official compliance tests.
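
As a rough illustration of the Mapper row, the sketch below turns a parsed record into a FHIR Observation dictionary using a fixed template. The input keys (`test_name`, `patient_id`, `measured_at`, `value`, `unit`) and the function name are assumptions made for this example, and the hard-coded `"final"` status is a simplification. The real Mapper is described as using LLM-assisted rule templates, which this does not attempt to reproduce.

```python
from typing import Any, Dict


def map_to_observation(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Map a parsed lab value (hypothetical shape) onto FHIR Observation fields."""
    return {
        "resourceType": "Observation",
        "status": "final",  # assumption: every mapped result is treated as final
        "code": {"text": parsed["test_name"]},
        "subject": {"reference": f"Patient/{parsed['patient_id']}"},
        "effectiveDateTime": parsed["measured_at"],
        "valueQuantity": {"value": parsed["value"], "unit": parsed["unit"]},
    }


parsed_record = {
    "test_name": "Blood Glucose",
    "patient_id": "pat-6789",
    "measured_at": "2025-08-01T09:00:00Z",
    "value": 110,
    "unit": "mg/dL",
}
observation = map_to_observation(parsed_record)
print(observation["code"]["text"], observation["valueQuantity"])
```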

Example Flow

1. User uploads a lab report PDF.
2. Vision Encoder compresses and tokenizes the page images.
3. Parser extracts "Blood Glucose", "Patient Name", "Date".
4. Mapper converts the extracted fields to a FHIR Observation resource.
5. Validator checks the result against the FHIR schema, and the output is returned to the user.

Example FHIR Output (snippet)

{
  "resourceType": "Observation",
  "id": "obs-12345",
  "status": "final",
  "code": { "text": "Blood Glucose" },
  "subject": { "reference": "Patient/pat-6789" },
  "effectiveDateTime": "2025-08-01T09:00:00Z",
  "valueQuantity": { "value": 110, "unit": "mg/dL" }
}
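
To make the validation step concrete, here is a small standard-library check run against the snippet above. It is a stand-in for illustration only: the pipeline itself is described as using Pydantic and a FHIR validator, and `check_observation` is a hypothetical helper covering just a few structural rules (resource type, a recognised `status` code, a present `code`, a numeric `valueQuantity.value`).

```python
import json
from typing import Any, Dict, List

# Codes from the FHIR Observation.status value set.
OBSERVATION_STATUSES = {
    "registered", "preliminary", "final", "amended",
    "corrected", "cancelled", "entered-in-error", "unknown",
}


def check_observation(resource: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    if resource.get("resourceType") != "Observation":
        problems.append("resourceType must be 'Observation'")
    if resource.get("status") not in OBSERVATION_STATUSES:
        problems.append("status is missing or not a recognised code")
    code = resource.get("code")
    if not isinstance(code, dict) or not code:
        problems.append("code is required and must be a non-empty object")
    quantity = resource.get("valueQuantity")
    if quantity is not None:
        value = quantity.get("value") if isinstance(quantity, dict) else None
        if not isinstance(value, (int, float)):
            problems.append("valueQuantity.value must be numeric")
    return problems


EXAMPLE = json.loads("""
{
  "resourceType": "Observation",
  "id": "obs-12345",
  "status": "final",
  "code": { "text": "Blood Glucose" },
  "subject": { "reference": "Patient/pat-6789" },
  "effectiveDateTime": "2025-08-01T09:00:00Z",
  "valueQuantity": { "value": 110, "unit": "mg/dL" }
}
""")

print(check_observation(EXAMPLE))
```

Returning a list of problems rather than raising on the first failure lets a caller report every issue in a document at once.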