Healthcare AI questions
Healthcare AI development, answered.
The questions provider, payer, and life-sciences leaders ask us on the first scoping call, answered straight.
How long from kickoff to a healthcare AI feature live in production?
The median is 90 days for a single well-scoped feature with clear deployment constraints (p95 latency, cost per call, accuracy floor); a pilot can prove a feature in 6 to 8 weeks. The longest pole is almost never the model; it is EHR integration, clinician-in-the-loop review, and agreeing BAA scope with your security team. We do not ship a healthcare AI feature without evals running in CI.
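As a sketch of what "evals running in CI" can mean for the three deployment constraints above, the gate below fails a build when p95 latency, mean cost per call, or accuracy misses its target. The class and function names, the use of the mean for cost per call, and the threshold values are hypothetical; real thresholds come out of scoping, and a real pipeline would feed in results from its own eval harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConstraints:
    """Hypothetical per-feature constraint set agreed during scoping."""
    p95_latency_ms: float
    max_cost_per_call_usd: float
    min_accuracy: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample.

    Integer arithmetic computes ceil(0.95 * n) without float rounding surprises.
    """
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # 1-based nearest rank
    return ordered[rank - 1]


def check_constraints(
    latencies_ms: list[float],
    costs_usd: list[float],
    correct: list[bool],
    limits: DeploymentConstraints,
) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    failures = []

    observed_p95 = p95(latencies_ms)
    if observed_p95 > limits.p95_latency_ms:
        failures.append(
            f"p95 latency {observed_p95:.0f} ms exceeds {limits.p95_latency_ms:.0f} ms"
        )

    mean_cost = sum(costs_usd) / len(costs_usd)
    if mean_cost > limits.max_cost_per_call_usd:
        failures.append(
            f"mean cost ${mean_cost:.4f}/call exceeds ${limits.max_cost_per_call_usd:.4f}"
        )

    accuracy = sum(correct) / len(correct)
    if accuracy < limits.min_accuracy:
        failures.append(
            f"accuracy {accuracy:.1%} is below the {limits.min_accuracy:.1%} floor"
        )

    return failures
```

In a CI job, a non-empty failure list would typically map to a non-zero exit code. Nearest-rank p95 is used for simplicity; other percentile definitions give slightly different values on small samples.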
Do you sign a Business Associate Agreement, and how is PHI handled?
Yes. We sign BAAs with covered entities and execute BAAs with every model provider in the inference path. PHI either stays inside the BAA boundary end to end or is de-identified at the edge, using the HIPAA Safe Harbor or Expert Determination method, before it reaches a model. The PHI flow for each use case is documented in the AI Assessment, and every retrieval is audit-logged.
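To make "de-identified at the edge" concrete, here is a deliberately partial, regex-only sketch of Safe Harbor-style redaction. It handles only a few of the 18 Safe Harbor identifier categories (US-format SSNs, phone numbers, email addresses, and MM/DD/YYYY dates reduced to the year). Names, addresses, record numbers, and identifiers embedded in free text need NER tooling such as Presidio plus human review, or an Expert Determination. The patterns and the `redact` function are illustrative assumptions, not a compliant de-identifier.

```python
import re

# Safe Harbor keeps only the year of dates tied to an individual.
# This sketch recognizes only MM/DD/YYYY; other date formats pass through untouched.
DATE = re.compile(r"\b\d{1,2}/\d{1,2}/(\d{4})\b")

# Illustrative patterns for a few of the 18 Safe Harbor identifier categories.
# Not exhaustive: names, addresses, MRNs, and identifiers embedded in prose
# need NER tooling plus human review.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\(?\b\d{3}\)?[-.\s]?\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]


def redact(text: str) -> str:
    """Reduce dates to the year, then replace each matched identifier with a token."""
    text = DATE.sub(r"\1", text)  # e.g. 03/14/2024 -> 2024
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Replacing identifiers with fixed tokens keeps the surrounding clinical text readable for a model while the identifying values never cross the boundary.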
How do you handle EHR integration?
Through FHIR R4, HL7 v2, and SMART-on-FHIR, with Epic and Oracle Health (Cerner) API clients and Redox where it fits. A feature only matters if it reaches a clinician inside the system they already use, so we scope the integration footprint in the AI Assessment and treat it as the longest pole, not an afterthought.
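The FHIR side of an integration largely comes down to RESTful interactions and resource parsing. The sketch below builds an R4 `read` URL (`GET [base]/Patient/[id]`) and pulls a display name out of a Patient resource's `HumanName` entries. The base URL and helper names are hypothetical, and the sketch deliberately omits the SMART on FHIR OAuth 2.0 authorization that Epic and Oracle Health endpoints require before any request succeeds.

```python
from urllib.parse import quote


def patient_read_url(base_url: str, patient_id: str) -> str:
    """FHIR RESTful read interaction: GET [base]/Patient/[id]."""
    return f"{base_url.rstrip('/')}/Patient/{quote(patient_id, safe='')}"


def display_name(patient: dict) -> str:
    """Return a display name from a parsed FHIR R4 Patient resource."""
    if patient.get("resourceType") != "Patient":
        raise ValueError("expected a FHIR Patient resource")
    names = patient.get("name", [])
    if not names:
        return ""
    # Prefer the HumanName marked use="official"; otherwise take the first entry.
    chosen = next((n for n in names if n.get("use") == "official"), names[0])
    if chosen.get("text"):
        return chosen["text"]
    given = " ".join(chosen.get("given", []))
    return " ".join(part for part in (given, chosen.get("family", "")) if part)
```

Keeping URL construction and parsing separate from transport makes it straightforward to swap in an authenticated client, or an integration layer such as Redox, without touching the parsing logic.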
What about 21 CFR Part 11 and Software as a Medical Device?
When the use case touches diagnosis, treatment decisions, or trial endpoints, SaMD classification changes the eval bar and the audit trail. We flag this in week one, design the eval suite and electronic records for FDA-grade review, and keep signatures and records aligned with 21 CFR Part 11 where it applies. We do not claim a classification on your behalf.
How do you evaluate clinical AI quality?
A three-layer eval suite. A clinician-reviewed reference dataset of 100 to 500 representative cases, kept inside the BAA boundary and scored on the metric that matters for the feature. An adversarial set covering prompt injection, malformed FHIR resources, and edge-case ICD-10 codes. And a regression set where every production incident becomes a permanent eval entry. The suite runs on every deploy and on a schedule behind feature flags.
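The three layers can be sketched as one runner that scores each layer separately and gates on all of them, plus a helper that turns a production incident into a permanent regression case. Exact-match scoring, the 0.9 reference floor, the zero-tolerance rule for the adversarial and regression layers, and all names here are illustrative assumptions; a real feature would score on its own clinical metric.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Layer = Literal["reference", "adversarial", "regression"]


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    layer: Layer
    prompt: str
    expected: str


def run_suite(
    model: Callable[[str], str],
    cases: list[EvalCase],
    reference_floor: float = 0.9,
) -> dict:
    """Score each layer; the suite passes only if every layer meets its bar."""
    outcomes: dict[str, list[bool]] = {"reference": [], "adversarial": [], "regression": []}
    for case in cases:
        # Exact match stands in for the feature-specific metric.
        outcomes[case.layer].append(model(case.prompt) == case.expected)
    # An empty layer counts as fully passing.
    rates = {layer: (sum(o) / len(o) if o else 1.0) for layer, o in outcomes.items()}
    passed = (
        rates["reference"] >= reference_floor
        and rates["adversarial"] == 1.0
        and rates["regression"] == 1.0
    )
    return {"rates": rates, "passed": passed}


def add_incident(cases: list[EvalCase], incident_id: str, prompt: str, expected: str) -> list[EvalCase]:
    """Return a new case list with the incident pinned as a regression case."""
    return cases + [EvalCase(f"incident-{incident_id}", "regression", prompt, expected)]
```

Scoring layers separately keeps a strong reference score from masking a single adversarial or regression failure.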
What about prompt injection from patient or document content?
We treat ingested notes, documents, and messages as untrusted and run them through a four-layer governance stack: model guardrails (Guardrails.ai and Presidio), validation pipelines (schema validation on structured output), auto-retraining (incidents become regression evals), and real-time observability (LangSmith, Evidently AI, Weights & Biases, Prometheus and Grafana). Ingested content must pass validation before it can influence an action.
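A minimal, stdlib-only sketch of the validation-pipeline layer: model output must parse as JSON, carry exactly the expected keys, and name an action from an allow-list before anything downstream acts on it. The action names, the field set, and the ICD-10 shape check (a rough format pattern, not a lookup against the code set) are hypothetical; a real stack might express the same rules as a JSON Schema or a Guardrails.ai validator.

```python
import json
import re

# Hypothetical allow-list: the only actions downstream code will execute.
ALLOWED_ACTIONS = {"flag_for_review", "no_action"}
EXPECTED_KEYS = {"action", "icd10", "rationale"}
# Rough ICD-10-CM shape (letter, digit, alphanumeric, optional .xxxx); not a code-set lookup.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](?:\.[0-9A-Z]{1,4})?$")


class ValidationError(ValueError):
    """Raised when model output fails validation; nothing downstream runs."""


def validate_output(raw: str) -> dict:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationError("expected a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValidationError(f"unexpected or missing keys: {sorted(set(data) ^ EXPECTED_KEYS)}")
    action = data["action"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValidationError(f"action not permitted: {action!r}")
    code = data["icd10"]
    if not isinstance(code, str) or not ICD10_SHAPE.match(code):
        raise ValidationError(f"malformed ICD-10 code: {code!r}")
    rationale = data["rationale"]
    if not isinstance(rationale, str) or not rationale.strip():
        raise ValidationError("rationale must be a non-empty string")
    return data
```

In this sketch, injected text in a note can change field values but never the set of permitted actions, so an output asking to "message the patient directly" is rejected rather than executed.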
What happens to ownership of the AI feature after delivery?
We design for hand-off from week one. By the end of the engagement, your in-house team owns the model selection, the eval suite, the observability dashboards, and the run-book. We document the deployment constraint set, the eval methodology, the fallback strategy, the security checklist built to your HIPAA boundary, and the cost model, and we run paired on-call with your team before close. A meaningful share of our AI work is recovery on systems where this hand-off was never engineered.