How long from kickoff to a legal AI feature live in production?
The median is 90 days for a single well-scoped workflow with clear deployment constraints (p95 latency, cost-per-call, accuracy floor); pilots can prove a workflow in 6 to 8 weeks on anonymized matter data. The longest pole is rarely the model; it is per-matter data plumbing, the citation-verification harness, and integration with your CLM, DMS, and e-discovery platforms. We do not ship a legal AI feature without evals running in CI.
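To illustrate how the three deployment constraints could gate a release in CI, here is a minimal Python sketch. The names `Constraints`, `EvalResult`, and `gate`, and the choice of a nearest-rank p95, are assumptions made for this example rather than a description of a specific toolchain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Hypothetical deployment constraints agreed at kickoff."""
    p95_latency_s: float
    max_cost_per_call_usd: float
    accuracy_floor: float


@dataclass(frozen=True)
class EvalResult:
    """One eval case: how long it took, what it cost, whether it passed."""
    latency_s: float
    cost_usd: float
    correct: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def gate(results: list[EvalResult], limits: Constraints) -> list[str]:
    """Return a list of violated constraints; an empty list means the release may proceed."""
    if not results:
        return ["no eval results"]  # fail closed when the eval suite did not run
    failures = []
    latency = p95([r.latency_s for r in results])
    mean_cost = sum(r.cost_usd for r in results) / len(results)
    accuracy = sum(r.correct for r in results) / len(results)
    if latency > limits.p95_latency_s:
        failures.append(f"p95 latency {latency:.2f}s exceeds {limits.p95_latency_s:.2f}s")
    if mean_cost > limits.max_cost_per_call_usd:
        failures.append(f"cost per call ${mean_cost:.4f} exceeds ${limits.max_cost_per_call_usd:.4f}")
    if accuracy < limits.accuracy_floor:
        failures.append(f"accuracy {accuracy:.3f} is below floor {limits.accuracy_floor:.3f}")
    return failures
```

A CI job could call `gate` on the latest eval run and exit non-zero whenever the returned list is non-empty, which blocks the release.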
How do you stop the Mata v. Avianca hallucinated-citation failure mode?
Every research, drafting, and memo workflow runs cited cases, statutes, and regulations through a citation-verification harness against the firm's licensed databases (Westlaw, Lexis) and open sources (CourtListener), via the firm's authorized access, before output reaches a lawyer. Unverified citations are stripped or flagged. The harness sits in the eval suite and runs in CI.
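A minimal Python sketch of the flagging step, under stated assumptions: the regular expression covers only a few federal reporters, and `lookup` is a hypothetical callable standing in for whatever authorized database check the firm uses. No real Westlaw, Lexis, or CourtListener API is shown here.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Deliberately narrow, illustrative pattern: volume, reporter, page (e.g. "573 U.S. 208").
# A production harness would need a proper citation parser.
CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|F\.\s?Supp\.\s?[23]d|F\.[234]d|S\.\s?Ct\.)\s+\d{1,5}\b"
)


@dataclass(frozen=True)
class CitationCheck:
    citation: str
    verified: bool


def extract_citations(text: str) -> list[str]:
    """Return the distinct citations found in the text, in order of first appearance."""
    seen: list[str] = []
    for match in CITATION_RE.finditer(text):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


def verify_draft(text: str, lookup: Callable[[str], bool]) -> tuple[str, list[CitationCheck]]:
    """Check every citation with `lookup` and mark the ones that fail in the returned text."""
    checks = [CitationCheck(c, lookup(c)) for c in extract_citations(text)]
    flagged = text
    for check in checks:
        if not check.verified:
            flagged = flagged.replace(check.citation, f"[UNVERIFIED: {check.citation}]")
    return flagged, checks
```

The sketch flags rather than strips, so the reviewing lawyer can see what the model attempted to cite. The list of `CitationCheck` results can also feed the eval suite as a per-draft verification rate.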
How do you preserve attorney-client privilege when AI is in the loop?
Per-matter data isolation, zero-retention DPAs, dedicated or VPC-isolated inference endpoints where matter sensitivity requires it, prompt and retrieval logs scoped per matter, and lawyer-in-the-loop checkpoints. Built to fit ABA Model Rule 1.6 and state-bar opinions on generative AI (Florida 24-1, California, New York, DC).
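As a rough illustration of per-matter scoping, the sketch below keeps documents and retrieval logs keyed by matter, so a query for one matter cannot return or log against another. `MatterScopedIndex` is a hypothetical in-memory stand-in that uses substring matching; a real deployment would more likely use separate indexes or namespaces in its vector store.

```python
from collections import defaultdict


class MatterScopedIndex:
    """Toy retrieval index in which every read and write requires a matter ID."""

    def __init__(self) -> None:
        self._docs: dict[str, list[str]] = defaultdict(list)
        # Retrieval logs are kept per matter so they can be retained or destroyed per matter.
        self.retrieval_log: dict[str, list[str]] = defaultdict(list)

    def add(self, matter_id: str, text: str) -> None:
        self._docs[matter_id].append(text)

    def search(self, matter_id: str, query: str) -> list[str]:
        """Return only documents belonging to `matter_id`; there is no cross-matter path."""
        self.retrieval_log[matter_id].append(query)
        needle = query.lower()
        return [doc for doc in self._docs.get(matter_id, []) if needle in doc.lower()]
```

Because the matter ID is a required argument rather than an optional filter, forgetting to scope a query is a call error, not a silent cross-matter leak.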
Are AI prompts and outputs discoverable?
Potentially yes. Prompt registries, retrieval logs, and eval datasets are treated as work product where applicable, with per-matter retention and destruction holds that propagate into vector stores, fine-tune datasets, and eval corpora. Auto-retraining respects hold flags or it does not run.
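One way to read "respects hold flags or it does not run" is as a fail-closed check, sketched below in Python. The flag values, the `Record` type, and the rule that only matters flagged `ACTIVE` are eligible for retraining are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class HoldFlag(Enum):
    ACTIVE = "active"                  # no hold; normal processing
    RETENTION_HOLD = "retention_hold"  # preserve as-is
    DESTROY = "destroy"                # must be removed from every store


@dataclass(frozen=True)
class Record:
    record_id: str
    matter_id: str


class HoldStateUnknown(RuntimeError):
    """Raised when a matter has no resolvable hold flag."""


def propagate_destruction(
    stores: dict[str, list[Record]], flags: dict[str, HoldFlag]
) -> dict[str, list[Record]]:
    """Drop records of matters flagged DESTROY from every store (vector, fine-tune, eval)."""
    return {
        name: [r for r in records if flags.get(r.matter_id) is not HoldFlag.DESTROY]
        for name, records in stores.items()
    }


def select_for_retraining(records: list[Record], flags: dict[str, HoldFlag]) -> list[Record]:
    """Refuse to run if any matter lacks a flag; otherwise keep only ACTIVE matters."""
    missing = sorted({r.matter_id for r in records if r.matter_id not in flags})
    if missing:
        raise HoldStateUnknown(f"no hold flag for matters: {missing}")
    return [r for r in records if flags[r.matter_id] is HoldFlag.ACTIVE]
```

The design choice in this sketch is that an unknown hold state aborts the whole retraining job rather than skipping the affected records, so a gap in the flag data surfaces as a visible failure.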
Will our matter data train a shared model?
No, by default and by contract. Opt-out endpoints, zero-retention DPAs, and VPC-isolated or on-prem inference where required. Audit logs prove the data path, and privileged content never reaches a shared index or a shared fine-tune.
What about prompt injection from ingested case files or client uploads?
Filings, emails, and exhibits can contain adversarial instructions targeting the model, so we treat ingested content as untrusted. A four-layer governance stack handles it: model guardrails (Guardrails AI), validation pipelines, auto-retraining where incidents become regression evals, and real-time observability (LangSmith, Weights & Biases, Evidently AI, Prometheus, Grafana). Content passes validation before it can influence an action.
What happens to ownership of the legal AI system after delivery?
Hand-off is designed from week one. Your in-house team owns model selection, the eval suite, the citation-verification harness, the observability dashboards, and the runbook. We document the constraint set, the eval methodology, the fallback strategy, and the cost model. A meaningful share of our legal AI work is recovery on systems where this hand-off was never engineered.
Is legal AI accurate and safe enough for client work?
It can be, but only when accuracy is engineered, not assumed. A raw model will invent citations, which is exactly how the Mata v. Avianca sanctions happened. We make legal AI software defensible by grounding every answer in retrieval over the matter file, re-checking each cited case and statute against licensed databases before it reaches a lawyer, and gating releases on a golden-set and LLM-judge eval suite. A lawyer still reviews the output. The accuracy floor is a number in the contract, measured on a reference set, not a marketing claim.
How much does legal AI development cost?
It depends on the workflow and the isolation it requires, so we scope it before quoting. A pilot proving one workflow runs 6 to 8 weeks; a production build of a full workflow with evals, the citation harness, and observability runs 12 to 16 weeks. We model gross margin per feature first, so a feature that prices into negative contribution at expected volume gets re-scoped rather than built. The largest cost driver is rarely the model; it is per-matter data plumbing and integration with your CLM, DMS, and e-discovery platforms.
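The margin check can be sketched as simple arithmetic: monthly contribution is expected calls times (price per call minus cost per call), minus monthly fixed cost. The Python below is an illustrative model only; the field names and the cost split are assumptions, and no real pricing is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureEconomics:
    """Hypothetical per-feature inputs, all in USD."""
    price_per_call: float
    cost_per_call: float       # model, retrieval, and verification cost per call
    monthly_fixed_cost: float  # hosting, observability, support
    expected_monthly_calls: int


def gross_margin_per_call(f: FeatureEconomics) -> float:
    """Fraction of the per-call price left after per-call cost."""
    return (f.price_per_call - f.cost_per_call) / f.price_per_call


def monthly_contribution(f: FeatureEconomics) -> float:
    """Contribution at expected volume: calls * (price - cost) - fixed cost."""
    return f.expected_monthly_calls * (f.price_per_call - f.cost_per_call) - f.monthly_fixed_cost


def should_rescope(f: FeatureEconomics) -> bool:
    """Flag a feature whose contribution is negative at expected volume."""
    return monthly_contribution(f) < 0
```

In this model a feature can show a healthy per-call margin and still need re-scoping, because fixed costs spread over a low expected volume push the contribution below zero.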