How much does AI cost? An AI TCO calculator for the real price of owning a feature
How much does AI cost? Not the per-token price you screenshot, which is the smallest line on the bill, but the fully loaded yearly cost of owning an AI feature in production. This AI TCO calculator models all of it: inference, standing infrastructure, the one-time build, and the ongoing engineering everyone forgets. Edit the numbers, compare build against buy, and read the breakeven.

The short version
- AI TCO is the fully loaded yearly cost of owning an AI feature, well past the price of one API call. It has five parts: inference, standing infrastructure, one-time build, ongoing engineering, and governance.
- The token price is the smallest line. Output is the expensive half (about 5x input on Claude Sonnet 4.6), and the costs that do not fall with token price (build labor, FTE upkeep, infra floors) dominate the bill.
- In the worked default below, a build-it feature runs about $115k in year one and roughly $59k a year at steady state once the one-time build drops off.
- Per-token cost falls roughly 10x a year (a16z LLMflation), so keep the rate editable in your model. Cheaper tokens invite more usage, which is why total spend keeps climbing.
- Cost is a named project killer: RAND found over 80% of AI projects fail, and Gartner flags escalating cost as a driver behind abandoned GenAI and agentic projects. An uncapped AI feature is a budget with no ceiling.
How much does AI cost? What goes into AI TCO
How much does AI cost in practice? The honest answer is its total cost of ownership, not a per-token rate. AI TCO is the fully loaded yearly cost of building and running an AI feature in production, well beyond the sticker price of one API call. It has five parts. Inference is input tokens plus output tokens, each priced at its own rate, multiplied by volume and by every call one user action triggers. Infrastructure is the standing bill (vector database, hosting, observability). Build is the one-time engineering to ship it. Ongoing engineering is the upkeep to keep it in production. Governance, the evaluation and monitoring that risk frameworks such as NIST's now treat as ongoing practice, lives inside that upkeep line.
A single per-token figure misleads because it ignores every part except inference, and it under-prices even that one. Output tokens are the expensive half: on Claude Sonnet 4.6 the rate is $3 per million input tokens against $15 per million output tokens, a 5:1 ratio.[1] One user action can also trigger several calls once you add retries and agent turns. And the costs that matter most over a feature's life (the build, the standing infra floor, and the engineer-hours to maintain it) appear on no pricing page.
Two facts justify modeling all five parts rather than estimating tokens. Most AI projects fail: RAND found over 80% of AI projects fail, roughly twice the rate of non-AI IT projects.[4] And cost is a named cause of death: Gartner predicts that at least 30% of generative AI projects will be abandoned after proof of concept, and that over 40% of agentic AI projects will be canceled by the end of 2027, citing escalating cost in both.[5] A feature that pencils out on tokens but blows up on infra and maintenance is exactly the one that gets killed. This is the cost model our AI deployment team builds before writing any code, and it is one of the five numbers that gate a launch in our production-first method.
The calculator
Enter your numbers and the AI TCO calculator recomputes live: monthly inference cost, annual run cost (inference plus infra), build-path year-one TCO, build-path year-two steady state, buy-path year-one TCO, and the build vs buy breakeven month. The defaults are illustrative example values, not benchmarks. Token rates, the base wage, and the vector-database floor are externally sourced (see Sources); build effort, maintenance percentage, request volume, and vendor prices are placeholders you must replace with your own.
How the math works
Six formulas drive the whole model. Price input and output separately, because output is the expensive half and a single blended rate misleads. Then add the standing infra floor, the one-time build, and the ongoing engineering, and compare that against a buy path over the same horizon.
Cost per request is calls x (input tokens x input rate + output tokens x output rate), with rates in dollars per token. On the Sonnet 4.6 default (3,000 in, 700 out, one call) that is 3000 x $3/1M plus 700 x $15/1M, which is $0.009 plus $0.0105, or about $0.0195 a request.[1] Monthly inference is that figure times request volume: at 100,000 requests, about $1,950 a month. Annual run cost adds the infra floor and annualizes: ($1,950 plus $200) x 12, about $25,800 a year. Build-path year-one TCO is the one-time build plus the annual run plus the ongoing engineering: $56,000 plus $25,800 plus $33,600, about $115,400. Year two onward drops the build, leaving about $59,400 a year.
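The build-path arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's actual source: the `BuildInputs` fields and the function names are hypothetical, and every default simply mirrors the worked example (Sonnet 4.6 rates, 100,000 requests a month, a $200 infra floor, a $56,000 build, $33,600 of yearly upkeep).

```python
from dataclasses import dataclass


@dataclass
class BuildInputs:
    """Hypothetical input set; defaults mirror the worked example."""
    input_tokens: int = 3_000               # prompt tokens per call
    output_tokens: int = 700                # generated tokens per call
    calls_per_request: int = 1              # retries and agent turns raise this
    input_rate_per_m: float = 3.0           # $ per 1M input tokens (Sonnet 4.6)
    output_rate_per_m: float = 15.0         # $ per 1M output tokens (Sonnet 4.6)
    requests_per_month: int = 100_000       # placeholder volume
    infra_per_month: float = 200.0          # standing infra floor (placeholder)
    build_once: float = 56_000.0            # one-time build (placeholder)
    engineering_per_year: float = 33_600.0  # ongoing engineering (placeholder)


def cost_per_request(p: BuildInputs) -> float:
    # Price input and output separately; rates are quoted per million tokens.
    per_call = (p.input_tokens * p.input_rate_per_m
                + p.output_tokens * p.output_rate_per_m) / 1_000_000
    return p.calls_per_request * per_call


def monthly_inference(p: BuildInputs) -> float:
    return cost_per_request(p) * p.requests_per_month


def annual_run(p: BuildInputs) -> float:
    # Inference plus the standing infra floor, annualized.
    return (monthly_inference(p) + p.infra_per_month) * 12


def build_year_one(p: BuildInputs) -> float:
    return p.build_once + annual_run(p) + p.engineering_per_year


def build_steady_state(p: BuildInputs) -> float:
    # Year two onward: the one-time build drops off.
    return annual_run(p) + p.engineering_per_year


if __name__ == "__main__":
    p = BuildInputs()
    print(f"per request:  ${cost_per_request(p):.4f}")     # $0.0195
    print(f"monthly:      ${monthly_inference(p):,.0f}")   # $1,950
    print(f"annual run:   ${annual_run(p):,.0f}")          # $25,800
    print(f"year one:     ${build_year_one(p):,.0f}")      # $115,400
    print(f"steady state: ${build_steady_state(p):,.0f}")  # $59,400
```

Setting `calls_per_request=3` (say, two retries or agent turns per user action) triples the inference line to about $5,850 a month while leaving infra, build and upkeep unchanged, which is why calls per action belongs in the model and not only the token rate.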
The buy path is the vendor fee over the same horizon. Per seat it is seat price x seats x 12; per resolution it is the per-resolution price x volume, which is the line that explodes at scale: 1.2M resolutions a year at $1 each is $1.2M before setup. Breakeven solves for the month where cumulative build cost drops below cumulative buy cost: (build_once minus buy_setup) / (buy_monthly minus build_monthly_run). For a fair comparison, build_monthly_run should carry the ongoing engineering as well as inference and infra, because the buy fee already covers vendor-run upkeep. When the denominator is zero or negative, buy stays cheaper and there is no crossover at all: on the worked default, build runs about $4,950 a month ($59,400 / 12) against the $1,500 a month ($18,000 a year) per-seat fee in the table below. Always read year one and year-two steady state side by side, because the one-time build distorts any single-year comparison; that distortion is a common TCO mistake.
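A sketch of the buy-path formulas and the breakeven solve, under stated assumptions: the function names are hypothetical; the per-seat deal ($50 a seat for 30 seats, with a $10,000 setup) is invented to reproduce an $18,000-a-year fee; and the build side's monthly cost is taken as the worked default's year-two figure divided by 12 ($59,400 / 12 = $4,950), which includes ongoing engineering. 1.2M resolutions a year is 100,000 a month, the same volume as the build default, so the per-resolution case compares like with like if each resolution costs one model request.

```python
from typing import Optional


def buy_annual_per_seat(seat_price_per_month: float, seats: int) -> float:
    return seat_price_per_month * seats * 12


def buy_annual_per_resolution(price_per_resolution: float,
                              resolutions_per_year: int) -> float:
    # Scales straight up with volume.
    return price_per_resolution * resolutions_per_year


def breakeven_month(build_once: float, buy_setup: float,
                    buy_monthly: float, build_monthly: float) -> Optional[float]:
    """Month at which cumulative build cost drops below cumulative buy cost.

    cumulative build = build_once + m * build_monthly
    cumulative buy   = buy_setup  + m * buy_monthly
    Assumes build costs more up front (build_once >= buy_setup).
    """
    denominator = buy_monthly - build_monthly
    if denominator <= 0:
        return None  # build never catches up: buy stays cheaper
    return (build_once - buy_setup) / denominator


# Worked default: build runs (25,800 + 33,600) / 12 = 4,950 a month.
build_monthly = (25_800 + 33_600) / 12

# Hypothetical per-seat deal: $50 x 30 seats = $18,000 a year, $10,000 setup.
seat_monthly = buy_annual_per_seat(50, 30) / 12
print(breakeven_month(56_000, 10_000, seat_monthly, build_monthly))  # None

# Per-resolution at $1 on 1.2M resolutions a year, same hypothetical setup.
res_monthly = buy_annual_per_resolution(1.0, 1_200_000) / 12
print(round(breakeven_month(56_000, 10_000, res_monthly, build_monthly), 2))  # 0.48
```

The same build feature that never breaks even against a fixed per-seat fee pays back in under a month against a $1 per-resolution fee at this volume, which is the pricing-model sensitivity the breakeven line exists to expose.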
Reading build vs buy
Model both paths as TCO over the same horizon. Build carries a large one-time cost and a low monthly run-rate; buy carries little setup but a recurring fee that, at volume, can exceed build's steady state. Per-resolution pricing in particular scales straight up with usage. Compare year one against year-two steady state, find the breakeven month, and decide on the durable number rather than the first-year sticker.
| Cost line | Build path | Buy path (per seat) |
|---|---|---|
| One-time (build or vendor setup) | $56,000 | $10,000 |
| Inference / yr | $23,400 | included |
| Infrastructure / yr | $2,400 | included |
| Ongoing engineering / yr | $33,600 | vendor-run |
| Recurring vendor fee / yr | n/a | $18,000 |
| Year-1 TCO | $115,400 | $28,000 |
| Year-2+ steady state / yr | $59,400 | $18,000 |
The table makes the trap visible. In year one, buy looks far cheaper ($28k against $115k) because build front-loads a one-time cost. At steady state the gap narrows to $18k against $59k, and from there the outcome depends on the pricing model and on volume: a per-seat fee is fixed, while a per-resolution fee scales straight up with usage and can pass build's run-rate quickly. The right comparison is the steady-state line plus a breakeven month, and the real swing factor is volume, which is why the calculator lets you vary it. This is the number the build vs buy decision turns on.
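The volume swing in the per-resolution case can be made concrete. This sketch compares build's steady-state month (volume-scaled inference plus the fixed infra floor and upkeep) with a per-resolution fee at the $1 example price, then solves for the monthly volume where the fee overtakes build. It deliberately ignores one-time costs, since it is a steady-state comparison, and assumes one model request per resolution; the constants are the worked defaults and the function names are hypothetical.

```python
from typing import Optional

# Worked defaults; all are editable placeholders.
COST_PER_REQUEST = 0.0195        # $ per request on Sonnet 4.6, one call
INFRA_PER_MONTH = 200.0          # standing infra floor
ENGINEERING_PER_YEAR = 33_600.0  # ongoing engineering


def build_monthly_steady(volume: int) -> float:
    # Inference scales with volume; infra and upkeep do not.
    return volume * COST_PER_REQUEST + INFRA_PER_MONTH + ENGINEERING_PER_YEAR / 12


def per_resolution_monthly(volume: int, price: float = 1.0) -> float:
    return volume * price


def crossover_volume(price: float = 1.0) -> Optional[float]:
    """Monthly volume above which a per-resolution fee exceeds build's run-rate."""
    margin = price - COST_PER_REQUEST  # how much each extra resolution widens the gap
    if margin <= 0:
        return None                    # the fee never overtakes build
    return (INFRA_PER_MONTH + ENGINEERING_PER_YEAR / 12) / margin


for volume in (1_000, 10_000, 100_000):
    print(f"{volume:>7,}/mo  build ${build_monthly_steady(volume):>8,.0f}"
          f"  per-resolution ${per_resolution_monthly(volume):>8,.0f}")
print(f"crossover: about {crossover_volume():,.0f} resolutions a month")
```

With these defaults the $1 fee passes build's steady state at roughly 3,060 resolutions a month; at the 100,000-a-month default it costs about twenty times build's run-rate. A per-seat fee has no such crossover by volume, because it does not move with usage.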
Why token price is not your cost
Per-token cost for an equivalent-performance model falls roughly 10x a year, a trend a16z calls LLMflation.[3] Architecting your cost model around this month's price is therefore a trap. The parts of TCO that do not fall (build labor, FTE upkeep, and the infra floor) increasingly dominate the bill. A TCO model stays useful precisely because it isolates the durable costs from the one collapsing line.
| Date | $ / 1M tokens |
|---|---|
| Nov 2021 (GPT-3 class) | $60.00 |
| Late 2024 (cheapest equivalent) | $0.06 |
| Trend | about 10x cheaper / yr |
Two consequences follow. Make the rate field editable and re-run it, because the inference line you compute today over-states next year's. And remember that a falling unit price does not mean a falling bill: cheaper tokens invite more usage and more agent calls, so request volume and calls per request, not the headline rate, are the swing factors. Pulling those levers down without dropping below your quality bar is the subject of our AI cost optimization guide.
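To apply a decline assumption rather than freeze today's rate, the inference line can be projected forward with two editable factors. This is a minimal sketch: `price_decline` and `usage_growth` are hypothetical parameters, the 10x default comes from the a16z trend (which tracks the cheapest model at equivalent capability, so a team that stays on one model may see less), and the 20x usage growth in the second run is purely illustrative of cheaper tokens inviting more calls. The first line recovers the roughly 10x-a-year figure from the table's $60.00 to $0.06 over about three years.

```python
def annualized_decline(start_price: float, end_price: float, years: float) -> float:
    # Average factor by which the price falls each year.
    return (start_price / end_price) ** (1 / years)


def project_inference(annual_inference: float, years: int,
                      price_decline: float, usage_growth: float = 1.0) -> list[float]:
    """Inference line for year 0 .. years-1.

    price_decline: yearly factor the per-token rate falls by (assumption).
    usage_growth:  yearly factor that volume x calls grows by (assumption).
    """
    yearly_factor = usage_growth / price_decline
    return [annual_inference * yearly_factor ** y for y in range(years)]


print(f"{annualized_decline(60.00, 0.06, 3):.1f}x a year")  # 10.0x a year

# Today's inference line from the worked default: $1,950 x 12 = $23,400.
flat = project_inference(23_400, 3, price_decline=10)
busy = project_inference(23_400, 3, price_decline=10, usage_growth=20)
print([round(x) for x in flat])  # [23400, 2340, 234]
print([round(x) for x in busy])  # [23400, 46800, 93600]
```

The first projection is the inference line that today's calculation over-states; the second shows how a falling rate can still produce a rising bill once usage grows faster than the price falls.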
Limitations and cost risk
This calculator is a model, and its output is only as good as your inputs. Most of the defaults are illustrative placeholders, the token line trends down while labor and infra do not, vector-database cost rises faster than linearly at scale, and an uncapped feature carries a real "denial of wallet" risk. Treat the result as a starting frame, then replace every placeholder with your own measured numbers.
- Defaults are not benchmarks. Only the token rates, the base wage, and the infra floor are externally sourced and dated. Build effort, FTE percentage, vendor prices, and request volume are placeholders. Replace them before you trust the output.
- The token line trends down, labor and infra do not. LLMflation means a build TCO computed today over-states year-two inference.[3] Apply an annual decline assumption, or at least caveat the inference line.
- Infra cost is superlinear and bursty. A vector-database minimum sits near $50/mo before any usage, but production bills scale with vectors and concurrent load.[2] Re-run the model at your real corpus size.
- "Denial of wallet" is a budget risk, not only a security one. OWASP lists Unbounded Consumption (LLM10:2025): uncontrolled inference, retries and abuse can generate "unsustainable financial costs."[6] Pair the calculator with rate limits and budget guards.
- Governance is part of ownership cost. NIST's voluntary AI Risk Management Framework and its Generative AI Profile treat evaluation, monitoring and incident response as ongoing risk-management practices.[7] Those hours live in the maintenance line, which is why omitting ongoing engineering understates TCO.
- Quality is the missing axis. TCO says nothing about whether the cheapest model clears your eval bar. The right model is the cheapest one that passes, a question addressed in our AI cost optimization and build vs buy guides.
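The budget guard recommended in the denial-of-wallet point above can be sketched as a spend cap that reserves each call's worst-case cost before the model runs and releases the unused part afterwards. `BudgetGuard` and its methods are hypothetical, not a real library API; the rates are the Sonnet 4.6 figures, and the counter lives in memory. A production guard would need a counter shared across processes and per-user rate limits alongside it, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BudgetGuard:
    """Hypothetical in-memory monthly spend cap; not a real library API."""
    monthly_cap_usd: float
    input_rate_per_m: float = 3.0    # $ per 1M input tokens (Sonnet 4.6)
    output_rate_per_m: float = 15.0  # $ per 1M output tokens (Sonnet 4.6)
    spent_usd: float = 0.0           # reserved plus settled spend this month

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_rate_per_m
                + output_tokens * self.output_rate_per_m) / 1_000_000

    def reserve(self, input_tokens: int, max_output_tokens: int) -> Optional[float]:
        """Reserve the worst-case cost before calling the model; None means refuse."""
        worst = self.cost(input_tokens, max_output_tokens)
        if self.spent_usd + worst > self.monthly_cap_usd:
            return None
        self.spent_usd += worst
        return worst

    def settle(self, reserved: float, input_tokens: int, output_tokens: int) -> None:
        """Release the unused part of a reservation once actual usage is known."""
        self.spent_usd -= reserved - self.cost(input_tokens, output_tokens)


# A deliberately tiny cap so the refusal is visible after two calls.
guard = BudgetGuard(monthly_cap_usd=0.05)
first = guard.reserve(3_000, 1_000)   # worst case: $0.009 + $0.015 = $0.024
second = guard.reserve(3_000, 1_000)
third = guard.reserve(3_000, 1_000)   # $0.072 total would exceed the cap
print(first is not None, second is not None, third is None)  # True True True
guard.settle(first, 3_000, 700)       # the call actually produced 700 tokens
print(round(guard.spent_usd, 4))      # 0.0435
```

Reserving the worst case (the maximum output tokens allowed) rather than the expected cost is the conservative choice: retries and runaway generations are refused before they are billed, at the price of briefly holding budget that settlement later returns.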
AI TCO questions
What is the total cost of ownership (TCO) of an AI feature?
How do you calculate the cost of an LLM feature per request?
Should I build or buy an AI feature?
Why is my AI feature more expensive than the token price suggested?
Will falling AI prices make my cost model obsolete?
Sources
1. Anthropic, Claude API Pricing (2026). Per-token rates ($3 input / $15 output per million tokens on Sonnet 4.6) and the 5:1 output ratio.
2. Pinecone, Pricing (2026). Standard tier $50/mo floor and per-vector storage; the basis for the infrastructure default and the superlinear-at-scale caveat.
3. Guido Appenzeller / a16z, Welcome to LLMflation (2024). Roughly 10x per year; $60.00 to $0.06 per million tokens for a GPT-3-class model in three years.
4. Ryseff, De Bruhl and Newberry, The Root Causes of Failure for Artificial Intelligence Projects, RAND Corporation (2024). Over 80% of AI projects fail, about twice the rate of non-AI IT projects.
5. Gartner, 30% of generative AI projects abandoned after proof of concept (2024); and over 40% of agentic AI projects canceled by end of 2027 (2025). Escalating cost named in both.
6. OWASP GenAI Security Project, LLM10:2025 Unbounded Consumption (2025). Uncontrolled inference can generate unsustainable financial costs, the denial-of-wallet risk.
7. NIST, AI Risk Management Framework (AI RMF 1.0) and the Generative AI Profile (NIST AI 600-1, 2024). Evaluation, monitoring and incident response as ongoing risk-management practices inside the maintenance line.
8. U.S. Bureau of Labor Statistics, Occupational Outlook Handbook: Software Developers, QA Analysts, and Testers (median wage $133,080, May 2024). Base for the blended engineer rate, grossed up about 1.25x for loaded cost (the gross-up is standard practice and is not a BLS figure).
