Scaling & Optimization Tips
When operating at scale, cost and latency become primary concerns. The recommendations below help optimize resource use and maintain throughput.
- Use visual compression first: Compress and tokenize images before inference to reduce downstream processing and model token usage (compression sketch after this list).
- Prefer small models for common cases: Let small, efficient LLMs handle ~95% of routine parsing and mapping tasks (see the routing sketch below).
- Batch non-urgent jobs: Schedule low-priority or large-batch conversions overnight, when compute demand is lower (queueing sketch below).
- Cache FHIR templates: Precompile and cache common FHIR templates per document type instead of regenerating them for every job (caching sketch below).
- Escalate only on low confidence: Use model-confidence thresholds to route ambiguous cases to manual review or a larger model; this pairs with the small-model tip in the routing sketch below.
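A minimal sketch of the compression step, assuming Pillow is used for image handling; the 1024 px cap and JPEG quality are illustrative values, not settings from this document:

```python
from io import BytesIO

from PIL import Image

MAX_DIM = 1024     # assumed cap; tune to the vision model's input resolution
JPEG_QUALITY = 80  # assumed quality/size trade-off

def compress_scan(raw: bytes) -> bytes:
    """Downscale and re-encode a scanned page before it reaches the model."""
    img = Image.open(BytesIO(raw))
    img.thumbnail((MAX_DIM, MAX_DIM))  # shrinks in place, keeps aspect ratio
    buf = BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=JPEG_QUALITY)
    return buf.getvalue()
```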
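The small-model-first and low-confidence-escalation tips combine naturally into one routing function. In this sketch, `small_model`, `large_model`, and the 0.85 threshold are hypothetical stand-ins to be calibrated against review outcomes:

```python
from typing import Callable

Model = Callable[[str], tuple[dict, float]]  # returns (parsed result, confidence)

CONFIDENCE_THRESHOLD = 0.85  # hypothetical cutoff; calibrate on review outcomes

def route(document: str, small_model: Model, large_model: Model) -> dict:
    """Try the cheap model first; escalate only when confidence is low."""
    result, confidence = small_model(document)
    if confidence >= CONFIDENCE_THRESHOLD:
        return result  # the ~95% routine path stays on the small model
    result, confidence = large_model(document)
    if confidence < CONFIDENCE_THRESHOLD:
        result["needs_manual_review"] = True  # still ambiguous: flag for a human
    return result
```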
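One way to defer non-urgent work is a priority queue that releases bulk jobs only during an off-peak window; the 22:00-06:00 window and the two priority levels here are assumptions, not part of this design:

```python
import heapq
from datetime import datetime, time

URGENT, BULK = 0, 1
OFF_PEAK_START, OFF_PEAK_END = time(22, 0), time(6, 0)  # assumed overnight window

def is_off_peak(now: datetime) -> bool:
    t = now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END  # window wraps past midnight

class JobQueue:
    """Releases urgent jobs immediately; holds bulk jobs for the off-peak window."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = 0  # FIFO tie-breaker within a priority level

    def submit(self, job, priority: int = BULK) -> None:
        heapq.heappush(self._heap, (priority, self._seq, job))
        self._seq += 1

    def next_job(self, now: datetime | None = None):
        now = now or datetime.now()
        if self._heap:
            priority, _, job = self._heap[0]
            if priority == URGENT or is_off_peak(now):
                heapq.heappop(self._heap)
                return job
        return None  # bulk work waits until the window opens
```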
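For template caching, a per-document-type memo such as `functools.lru_cache` is often enough. In this sketch, `build_template` and the returned skeletons are hypothetical placeholders for the real compilation step:

```python
import copy
from functools import lru_cache

def build_template(document_type: str) -> dict:
    """Hypothetical, expensive step: compile the FHIR skeleton for a type."""
    if document_type == "lab_report":
        return {"resourceType": "DiagnosticReport", "status": "final"}
    return {"resourceType": "DocumentReference", "status": "current"}

@lru_cache(maxsize=128)  # one entry per document type; hits skip compilation
def _cached_template(document_type: str) -> dict:
    return build_template(document_type)

def fhir_template(document_type: str) -> dict:
    # Deep-copy so callers can fill in values without mutating the cached copy.
    return copy.deepcopy(_cached_template(document_type))
```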
Operational Metrics to Track
- Job queue length and average wait time
- OCR success/failure rates and confidence distributions
- Average processing time per document by document type
- GPU utilization for the Vision Processor and cost per document
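A sketch of how the application-level metrics above might be instrumented, assuming prometheus_client; the metric names, labels, and buckets are illustrative rather than an established schema. GPU utilization typically comes from a dedicated exporter rather than application code.

```python
from prometheus_client import Counter, Gauge, Histogram, start_http_server

QUEUE_LENGTH = Gauge("job_queue_length", "Jobs currently waiting in the queue")
OCR_RESULTS = Counter("ocr_results_total", "OCR outcomes by status", ["status"])
OCR_CONFIDENCE = Histogram("ocr_confidence", "OCR confidence distribution",
                           buckets=[0.5, 0.7, 0.8, 0.9, 0.95, 1.0])
PROCESSING_SECONDS = Histogram("doc_processing_seconds",
                               "Processing time per document", ["doc_type"])

def record_document(doc_type: str, ocr_ok: bool, confidence: float,
                    seconds: float) -> None:
    """Record one finished document against the counters and histograms."""
    OCR_RESULTS.labels(status="success" if ocr_ok else "failure").inc()
    OCR_CONFIDENCE.observe(confidence)
    PROCESSING_SECONDS.labels(doc_type=doc_type).observe(seconds)

if __name__ == "__main__":
    start_http_server(9100)  # expose /metrics for the scraper
```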