Scaling & Optimization Tips

At scale, cost and latency become the primary concerns. The recommendations below reduce resource use while maintaining throughput.

  1. Use visual compression first: Compress and tokenize images to reduce downstream processing and model token usage.
  2. Prefer small models for the common cases: Let small, efficient LLMs handle ~95% of routine parsing and mapping tasks.
  3. Batch non-urgent jobs: Schedule low-priority or large-batch conversions overnight to benefit from lower compute demand.
  4. Cache FHIR templates: Precompile and cache common FHIR templates per document type to avoid regenerating templates repeatedly.
  5. Escalate only on low confidence: Use model-confidence thresholds to route ambiguous cases to a larger model or to manual review.
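
Tips 2, 4, and 5 can be combined into one routing step. The following is a minimal Python sketch under stated assumptions, not a reference implementation. The names `route`, `get_fhir_template`, `ParseResult`, and `CONFIDENCE_THRESHOLD` are hypothetical, the 0.85 threshold is an arbitrary placeholder, and each model is assumed to be a callable that returns a `(fields, confidence)` pair with confidence in the range 0 to 1.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Callable, Dict, Tuple

# Hypothetical cut-off; tune it against your own validation data.
CONFIDENCE_THRESHOLD = 0.85

# Assumed model interface: document text -> (extracted fields, confidence in [0, 1]).
Model = Callable[[str], Tuple[Dict[str, str], float]]


@dataclass
class ParseResult:
    resource: Dict[str, str]
    confidence: float
    handled_by: str  # "small", "large", or "manual_review"


@lru_cache(maxsize=128)
def get_fhir_template(doc_type: str) -> Tuple[Tuple[str, str], ...]:
    """Build a template skeleton once per document type and cache it.

    The skeleton here is a placeholder; a real pipeline would load or
    compile its own template. An immutable tuple is returned so that
    callers cannot mutate the cached value.
    """
    return (("resourceType", "DocumentReference"), ("docType", doc_type))


def route(
    document: str,
    doc_type: str,
    small_model: Model,
    large_model: Model,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> ParseResult:
    """Try the small model first and escalate only on low confidence."""
    template = dict(get_fhir_template(doc_type))  # fresh copy of cached skeleton

    fields, confidence = small_model(document)
    if confidence >= threshold:
        return ParseResult({**template, **fields}, confidence, "small")

    # Small model was unsure: escalate to the larger model.
    fields, confidence = large_model(document)
    if confidence >= threshold:
        return ParseResult({**template, **fields}, confidence, "large")

    # Both models were unsure: flag the case for manual review.
    return ParseResult({**template, **fields}, confidence, "manual_review")
```

In this sketch the larger model runs only when the small model falls below the threshold, so routine documents never incur its cost. Logging `handled_by` for each document would give the escalation rate directly.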

Operational Metrics to Track