Build vs buy AI: when to build the custom layer and when to buy the model
The build vs buy AI question is usually posed as a binary, and that framing is the first mistake. Almost nobody trains a frontier model anymore, and almost nobody gets a durable edge from a thin wrapper over someone else’s product. The real decision is which layer you buy and which layer you build. This guide draws that line with the evidence behind each axis.

The short version
- Build vs buy AI is a layering decision, not a binary. Buy the commodity layer (the model, the generic capability), build the differentiated layer (your product, your data loop, your orchestration).
- The model layer is commoditizing fast, so buy the model. Inference cost for equivalent quality falls roughly 10x a year, which makes renting per token the near-universal default over owning weights.
- Bought solutions reach successful deployment far more often than internal builds: vendor and partnership efforts succeed about 67% of the time, internal builds about one-third as often (MIT NANDA 2025).
- But off-the-shelf agents rarely unlock strategic advantage (McKinsey 2025). The value ceiling lives in what you build, so build only where you hold a real edge at the volume to justify it.
- Buying defers the build risk; it does not remove the risk surface or the need for vendor diligence. The OWASP LLM Top 10 and NIST AI RMF responsibilities stay with you either way, and Gartner judges only about 130 of the thousands of agentic vendors to be real.
Build vs buy AI is a layering decision, not a binary
Posing build vs buy AI as a single yes-or-no is the first mistake. The market evidence points in two directions at once, and the honest answer is the synthesis: buy the commodity layer (the foundation model, the infrastructure, the generic horizontal capability) and build the differentiated layer (your product surface, your data loop, your orchestration, your workflow). The real question is not whether to build or buy, but which layer goes in which column.
Two facts have to be held together. On the buy side, internal-build AI projects fail far more often than purchased ones, and enterprises have visibly shifted toward buying off-the-shelf apps as the ecosystem matured.[1] On the build side, off-the-shelf agents and horizontal copilots "rarely unlock strategic advantage," and the organizations capturing the most value show a strong preference for customized, bespoke solutions tied to their own processes and data.[2] These are not contradictory. They resolve into one rule: buy where the capability is generic, build where it is yours.
There is hype to strip out before any of this is useful. Vendor "agent washing" inflates the buy option: Gartner estimates that only about 130 of the thousands of self-described agentic-AI vendors are real, with much of the rest being rebadged chatbots and RPA.[4] So buying is not automatically the safe, fast path, and building is not automatically the brave one. This is a genuine engineering decision that deserves a real framework rather than a procurement reflex; our production-first AI cornerstone takes the longer-horizon view.
The build vs buy AI decision matrix
Six axes decide it: differentiation, data, control and risk, speed to value, total cost of ownership, and lock-in. For each one, ask what pulls toward building the custom layer and what pulls toward buying the API or off-the-shelf product. The plain decision rule that falls out: build where you have an unfair advantage (proprietary data, a workflow moat, or a hard control or compliance requirement) and the volume to justify owning it, and buy everywhere else, especially the model itself.
| Axis | Favors build (custom) | Favors buy (API / off-the-shelf) |
|---|---|---|
| Differentiation | The capability is the product, or a moat tied to a process rivals cannot copy | The capability is table stakes: a generic copilot, summarization, generic support |
| Data | You hold proprietary, hard-to-replicate data and a workflow that generates more of it | The problem is solvable with public or general knowledge, no proprietary data edge |
| Control & risk | You must own security, residency, auditability, latency and model behavior | The vendor controls, SLAs and roadmap are acceptable to you |
| Speed to value | You can fund a dedicated team and tolerate a longer path to production | You need a working capability now, and the vendor app ships faster |
| TCO | Volume is high enough that vendor per-seat or per-call pricing exceeds owning it | Volume is low or spiky, and you avoid hiring, MLOps, eval and on-call |
| Lock-in | You want portability across models and protection from price or roadmap shifts | You accept dependence in exchange for someone carrying the upgrade burden |
The matrix is not a scorecard you tally to a number. It is a way to find the one or two axes that actually decide your case. A regulated workload may be settled by control and risk alone. A high-volume internal tool may turn entirely on TCO, where the arithmetic belongs in our AI TCO calculator rather than a universal crossover point. The failure evidence in the next section is overwhelmingly about teams that built where they had no edge on any of these axes, which is exactly the case the rule tells you to buy.
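
The plain decision rule can be written down as a small function. This is a minimal sketch, not a scoring tool: it encodes only the rule as stated above (buy by default, build only with a named edge and the volume to own it). The `Layer` class, its field names, and the `decide` function are hypothetical names introduced for illustration, and `volume_justifies_owning` is assumed to come from your own TCO model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One layer of an AI system, described by the edges the rule names."""
    name: str
    proprietary_data: bool = False          # hard-to-replicate data and a loop that grows it
    workflow_moat: bool = False             # a process rivals cannot copy
    hard_control_requirement: bool = False  # residency, auditability, compliance
    volume_justifies_owning: bool = False   # assumption: decided by your own TCO model


def decide(layer: Layer) -> tuple[str, str]:
    """Return ("build" or "buy", reason). Buy is the default; build must name an edge."""
    edges = [
        label
        for label, present in (
            ("proprietary data", layer.proprietary_data),
            ("workflow moat", layer.workflow_moat),
            ("hard control requirement", layer.hard_control_requirement),
        )
        if present
    ]
    if not edges:
        return "buy", "no named edge"
    if not layer.volume_justifies_owning:
        return "buy", "edge exists but volume does not justify owning it"
    return "build", "edge: " + ", ".join(edges)


# Illustrative layers only; the flags are made-up inputs, not findings.
for layer in (
    Layer("foundation model"),
    Layer("claims triage workflow", proprietary_data=True,
          workflow_moat=True, volume_justifies_owning=True),
):
    print(layer.name, decide(layer))
```

The function returns a reason alongside the call on purpose: if no edge can be named, the answer is buy, which mirrors the rule rather than tallying the axes into a score.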
Why internal AI builds fail more often than bought ones
Internal builds fail more often because teams build where they have no edge. MIT NANDA’s 2025 study found vendor and partnership efforts reach successful deployment about 67% of the time, while internal builds succeed only about one-third as often.[1] The pattern behind the gap is consistent: a from-scratch build needs proprietary data and deployment infrastructure that teams routinely underestimate.
| Path | Reaches successful deployment |
|---|---|
| Buy via vendor or partnership | about 67% |
| Internal build (derived) | about 22% (one-third as often) |
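
The "derived" figure in the table is simple arithmetic on the two reported numbers. A short sketch of that derivation, with variable names chosen here for illustration:

```python
vendor_success_rate = 0.67      # reported: about 67%
internal_relative_rate = 1 / 3  # reported: "one-third as often"

# Derived, not separately reported: one-third of the vendor rate.
internal_success_rate = vendor_success_rate * internal_relative_rate

print(f"internal build: about {internal_success_rate:.0%}")
```

Because both inputs are approximate, the result is a rounded estimate and should be read with the same caution as the inputs.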
The wider failure context says where the risk concentrates. RAND found more than 80% of AI projects fail, about twice the rate of non-AI IT work, driven by stakeholder misalignment on the problem, poor data, chasing technology over the problem, and inadequate deployment infrastructure.[5] Gartner expects at least 30% of generative-AI projects to be abandoned after proof of concept by end of 2025,[6] and over 40% of agentic-AI projects to be canceled by end of 2027, on cost, unclear value and weak risk controls.[4] MIT NANDA reports that about 95% of organizations see no measurable P&L return from their GenAI pilots.[1] S&P Global recorded AI-initiative abandonment jumping from 17% in 2024 to 42% in 2025.[7] Informatica’s top three obstacles (data quality and readiness at 43%, technical maturity at 43%, and a skills shortage at 35%) all make a from-scratch build harder than estimated.[8] None of this says do not build. It says do not build where you have no data edge, no infrastructure, and no clear problem.
Build where you have an unfair advantage and the volume to own it. Buy everything else, especially the model.
Buy the model, build the product around it
The hybrid resolves the two halves. Buy the model because the model layer is commoditizing and getting cheaper fast, so renting per token beats owning weights for almost everyone. Build the orchestration, the product surface and the data flywheel, because that is where the durable advantage lives. Almost every real system is a hybrid, and the skill is drawing the line in the right place for your specific edge.
The case for buying the model is the cost curve. Inference cost for equivalent performance has fallen roughly 10x a year, on the order of 1,000x over three years for GPT-3-class quality, which is why training or owning a frontier model is almost never the build decision.[3] The case for building is where the value sits. a16z’s enterprise work finds the moat in the orchestration across models and the domain workflow rather than in the model itself: apps now combine orchestration of cutting-edge models with domain-specific interfaces and a feature surface that has become cheap to build.[11] The buildable moat is the orchestration and the data loop, never the weights.
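
The two cost figures are the same claim compounded: a 10x annual decline sustained for three years is a 1,000x decline. A minimal sketch of that arithmetic follows; `relative_cost` is a hypothetical helper, and a steady decline is an assumption about a trend that the source reports only historically.

```python
def relative_cost(years: float, annual_decline_factor: float = 10.0) -> float:
    """Cost relative to today for equivalent quality after `years` of steady decline."""
    return 1.0 / (annual_decline_factor ** years)


for year in range(4):
    # Year 3 lands at one-thousandth of the starting cost.
    print(year, relative_cost(year))
```

The practical reading is that any investment in owned weights is amortized against a rental price that keeps falling, which is why the default is to rent per token.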
The mature posture is an explicit portfolio rather than a one-time choice. McKinsey names the target architecture an "agentic AI mesh" capable of integrating both custom-built and off-the-shelf agents: off-the-shelf for routine and horizontal work, custom for the high-impact, proprietary processes, composed together.[2] Designing that hybrid for production from day one is the discipline our production-first AI guide argues for, and it is the work our AI application development team does on the custom layer once the line is drawn.
| Layer | Default call | Why |
|---|---|---|
| Foundation model | Buy (rent per token) | Commodity, falling about 10x a year; owning weights rarely pays off |
| Infrastructure / hosting | Buy or rent | Generic, undifferentiated, heavy to operate |
| Orchestration / routing | Build | Multi-model routing is the differentiator and hedges lock-in |
| Product surface / workflow | Build | Domain UI and workflow fit are cheap to build and hard to copy |
| Data loop / flywheel | Build | Proprietary data and the loop that grows it are the moat |
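
To make the orchestration row concrete, here is a minimal sketch of a provider-agnostic router with fallback. It assumes each rented model is wrapped as a plain callable from prompt to completion; `ModelRouter`, `primary_model`, and `secondary_model` are hypothetical names, and the two model functions are stand-ins rather than real vendor SDK calls.

```python
from typing import Callable, Sequence

# A provider is any callable that maps a prompt to a completion.
# Assumption: real vendor SDK calls would be wrapped to fit this shape.
Provider = Callable[[str], str]


class ModelRouter:
    """Try providers in priority order and fall back when one fails."""

    def __init__(self, providers: Sequence[tuple[str, Provider]]):
        if not providers:
            raise ValueError("at least one provider is required")
        self._providers = list(providers)

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider_name, completion) from the first provider that succeeds."""
        errors: list[str] = []
        for name, call in self._providers:
            try:
                return name, call(prompt)
            except Exception as exc:  # broad on purpose: any failure triggers fallback
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for two rented models.
def primary_model(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def secondary_model(prompt: str) -> str:
    return f"summary of: {prompt}"


router = ModelRouter([("primary", primary_model), ("secondary", secondary_model)])
print(router.complete("quarterly report"))
```

Keeping the provider interface this narrow is the design choice that matters: the product code depends on the router, not on any one vendor, so swapping or adding a model is a configuration change. It reduces single-vendor exposure but does not remove the re-prompting and re-evaluation work a switch still requires.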
The risks build vs buy advice gets wrong
Five things this decision must not over-claim. The 67% and 22% split is a sampled success-rate observation, never a per-project law. Buying defers the build risk; it does not delete the risk surface. Lock-in is a cost to price, never an automatic disqualifier. TCO depends on volume and time horizon, so there is no universal crossover. And vendor vetting on the buy side is real diligence, not a shortcut. Hold all five or the framework reads as a sales pitch for one column.
- The success-rate split reflects MIT NANDA’s sampled 2025 GenAI pilots and conflates many use cases. It does not mean your build will fail at that rate; treat it as evidence about where risk concentrates rather than as a probability for your project.
- Whether you build or buy, the running system carries the full risk surface. The OWASP LLM Top 10 failure modes, including prompt injection, sensitive-information disclosure, supply chain, excessive agency and unbounded consumption, apply to vendor products too.[9] The NIST AI RMF governance functions (govern, map, measure and manage) stay with you either way.[10] Buying does not outsource accountability.
- Lock-in trades control for maintenance relief. The honest move is to price the switching cost (re-prompting, re-evaluation, integration rework) instead of treating dependence as fatal. Multi-model orchestration reduces single-vendor exposure without eliminating it.
- TCO is scenario-dependent. Buy looks cheap at low or spiky volume and expensive at scale; build is the reverse, plus a large fixed team cost. Any TCO claim depends on volume and time horizon, so model it rather than assume a universal crossover.
The buy-side blind spot is worth stating once more. With Gartner judging only about 130 of the thousands of agentic vendors to be real, buying can mean buying agent-washed RPA, so the buy column is not the low-diligence option it looks like.[4]
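
The TCO point above can be modeled rather than assumed. The sketch below is a deliberately simple linear model: buying costs a flat price per call, and building costs an upfront amount plus a fixed monthly team cost plus a marginal cost per call. Every function name, parameter, and number here is a hypothetical placeholder; real pricing is rarely this linear, and the figures are not benchmarks.

```python
from typing import Optional


def buy_cost(calls_per_month: float, months: int, price_per_call: float) -> float:
    """Total vendor spend over the horizon at a flat per-call price."""
    return calls_per_month * months * price_per_call


def build_cost(calls_per_month: float, months: int, fixed_cost_per_month: float,
               marginal_cost_per_call: float, upfront_cost: float = 0.0) -> float:
    """Upfront build cost plus monthly team cost plus per-call running cost."""
    monthly = fixed_cost_per_month + calls_per_month * marginal_cost_per_call
    return upfront_cost + months * monthly


def break_even_calls_per_month(months: int, price_per_call: float,
                               fixed_cost_per_month: float,
                               marginal_cost_per_call: float,
                               upfront_cost: float = 0.0) -> Optional[float]:
    """Monthly volume above which building is cheaper; None if it never is."""
    saving_per_call = price_per_call - marginal_cost_per_call
    if saving_per_call <= 0:
        return None  # owning costs at least as much per call as renting
    return (upfront_cost / months + fixed_cost_per_month) / saving_per_call


# Hypothetical scenario: every number below is a placeholder to replace with your own.
scenario = dict(months=24, price_per_call=0.02, fixed_cost_per_month=60_000,
                marginal_cost_per_call=0.005, upfront_cost=120_000)

threshold = break_even_calls_per_month(**scenario)
print(f"break-even volume: {threshold:,.0f} calls per month")
```

Changing the horizon or the team cost moves the threshold substantially, which is the reason to model the crossover at your own volume instead of borrowing someone else's.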
How to decide for your case
Run your workload through the six axes and find the one or two that decide it. Start by assuming you buy the model and the generic capability, then justify each thing you choose to build with a specific edge: a data moat, a workflow rivals cannot copy, or a hard control requirement. If you cannot name the edge, that part should be bought.
In practice the sequence is short. Separate the layers first, model the TCO at your real volume and time horizon, and pressure-test every build candidate against the differentiation and data axes. Then design the hybrid for production rather than as two disconnected procurement tracks. Drawing that line is precisely what our AI consulting engagements open with, and building the custom layer once it is drawn is what our AI application development team takes on. The goal is one coherent system where the bought commodity and the built moat fit together, instead of a binary you regret in either direction.
Sources
1. MIT Project NANDA, The GenAI Divide: State of AI in Business 2025 (2025). Buy via vendor or partnership succeeds about 67% of the time versus one-third as often for internal builds; about 95% of organizations see no measurable P&L return.
2. McKinsey QuantumBlack, Seizing the agentic AI advantage (2025). Off-the-shelf agents "rarely unlock strategic advantage"; the "agentic AI mesh" integrates custom-built and off-the-shelf agents.
3. Guido Appenzeller / a16z, Welcome to LLMflation (2024). Inference cost for equivalent quality falls roughly 10x a year, on the order of 1,000x over three years for GPT-3-class models.
4. Gartner, Over 40% of Agentic AI Projects Will Be Canceled by End of 2027 (2025). Flags "agent washing" and estimates only about 130 of thousands of agentic-AI vendors are real.
5. Ryseff, De Bruhl and Newberry / RAND Corporation, The Root Causes of Failure for AI Projects and How They Can Succeed (2024). More than 80% of AI projects fail, about twice the non-AI IT rate, with five root causes.
6. Gartner, 30% of Generative AI Projects Will Be Abandoned After Proof of Concept by End of 2025 (2024). Abandonment driven by poor data quality, weak risk controls, escalating cost and unclear value.
7. S&P Global Market Intelligence, Voice of the Enterprise: AI & Machine Learning (2025). AI-initiative abandonment rose from 17% in 2024 to 42% in 2025; about 46% of proofs of concept scrapped before production.
8. Informatica, CDO Insights 2025 (2025). Top obstacles to AI success: data quality and readiness 43%, technical maturity 43%, skills shortage 35%.
9. OWASP, Top 10 for LLM Applications 2025 (2024). Failure modes from prompt injection (LLM01) through unbounded consumption (LLM10) apply to built and bought systems alike.
10. NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1 (2023). Four governance functions: govern, map, measure and manage.
11. a16z (Sarah Wang, Shangda Xu, Justin Kahl, Tugce Erten), How 100 Enterprise CIOs Are Building and Buying Gen AI in 2025 (2025), and Notes on AI Apps in 2026. Differentiation is the orchestration across models plus the domain workflow, never the model itself; model differentiation by use case is the main reason enterprises buy from multiple vendors.
