SaaS pricing models for AI: why inference breaks flat-rate margins and how to price it
The SaaS pricing models that work for traditional software break down the moment you add AI features. Classic SaaS made one copy and sold it a million times at near-zero marginal cost. AI inference does not work that way: every model call burns real GPU compute, so a flat per-seat fee quietly bleeds margin on the users who lean on the feature most. This guide covers the four SaaS pricing models for AI, why margins compress, what drives the cost, and a floor-and-ceiling method for setting a price that holds.

The short version
- AI breaks the economics of flat-rate SaaS. Every model call re-runs inference and burns real compute, so AI-native gross margins run roughly 50% to 60% per a16z, against the 70% to 90% of classic software. A flat per-seat fee collects the same revenue whether a user makes 5 calls or 5,000.
- The cost is concentrated and measurable. ICONIQ Growth data points to roughly $230,000 of inference cost per $1M of AI revenue (about a 23% inference-to-revenue ratio), and Kyle Poyar found the top 5% of users drove about 75% of usage cost under flat pricing.
- You can pull the cost down hard. Prompt caching cuts the price of cached input tokens by about 90% (Anthropic and OpenAI), and model routing has cut compute roughly 70% at quality parity per Red Hat. Token cost is also falling about 10x per year (a16z LLMflation), so a thin margin today widens over time.
- Pure per-seat pricing is the one model the experts agree is dangerous for AI, because seat count is a poor proxy for value and recovers none of the variable cost. The market is moving to usage, credits, and hybrid.
- The default that works is hybrid: a platform fee plus metered credits. Kyle Poyar says hybrid is "where most of the smart money is going," most often structured as a platform fee plus credits, and Salesforce Agentforce, Intercom, and ServiceNow all ship hybrid components.
Why AI features break flat-rate SaaS margins
AI features compress SaaS gross margins because every model call has a real variable cost, where classic software had almost none. Once traditional SaaS exists, one more user costs close to nothing to serve, which is why B2B SaaS gross margins sat at 70% to 90%. AI breaks that: each transaction re-runs inference and consumes GPU compute, so AI-native gross margins run roughly 50% to 60% per a16z [1]. A flat per-seat subscription collects the same revenue whether a user makes 5 calls or 5,000, so the heaviest users silently erode margin.
The a16z framing is the canonical anchor: there was "a kind of business gravity that pulled all SaaS toward 70 to 80 percent gross margins," and AI breaks that gravity because every transaction carries a cost [1]. The data corroborates it from several angles. Bessemer's State of AI 2025 puts LLM-native company gross margins around 65% [2]. ICONIQ Growth data points to roughly $230,000 of inference cost leaving for every $1M of AI revenue, an inference-to-revenue ratio near 23% [3]. And the cost is concentrated: Kyle Poyar found that under flat-fee AI pricing the top 5% of users drove about 75% of usage cost while contributing only about 5% of revenue, so that whole segment was unprofitable [4].
| Business type or metric | Typical value | Source |
|---|---|---|
| Traditional B2B SaaS gross margin | 70% to 90% (midpoint ~80%) | a16z and multiple sources |
| LLM-native company gross margin | ~65% | Bessemer, State of AI 2025 |
| AI-native SaaS gross margin | 50% to 60% (midpoint ~55%) | a16z, The New Business of AI |
| Inference cost as a share of AI revenue | ~23% (about $230K per $1M) | ICONIQ Growth (2026) |
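The flat-fee problem described above can be sketched in a few lines of Python. The seat price and per-call cost are hypothetical placeholders, not figures from any cited source, and inference is treated as the only cost of goods sold.

```python
# Hypothetical inputs for illustration only.
SEAT_PRICE = 30.00     # flat monthly fee per seat (assumed)
COST_PER_CALL = 0.02   # blended inference cost per model call (assumed)


def seat_gross_margin(calls_per_month: int) -> float:
    """Gross margin on one seat, counting only inference cost as COGS."""
    inference_cost = calls_per_month * COST_PER_CALL
    return (SEAT_PRICE - inference_cost) / SEAT_PRICE


# Revenue is identical for every user; only the cost side moves.
for calls in (5, 500, 5_000):
    print(f"{calls:>5} calls/month -> margin {seat_gross_margin(calls):.1%}")
```

With these assumed inputs the 5-call user is almost pure margin, while the 5,000-call user costs more to serve than the seat brings in.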
What drives AI cost, and how to control it
AI COGS is driven mostly by inference volume, which is input plus output tokens multiplied by calls per user, then by model choice, RAG retrieval infrastructure, context bloat, and agentic loops that fan out into many calls per action. The good news is each lever is also a control: prompt caching cuts the price of cached input tokens by about 90%, model routing has cut compute roughly 70% at quality parity, and response caching commonly removes 30% to 50% of token spend.
Five things move the bill. Token volume is the single biggest lever. Model choice is the most common leak, because routing every query to a frontier model costs multiples of what a small model would. RAG adds vector hosting, embedding generation, and re-embedding on data changes. Context bloat from long system prompts and large retrieved context inflates input tokens on every call. And agentic loops multiply calls per user action in ways that are hard to predict. The controls map onto each: prompt caching bills cached reads at roughly 10% of the standard input rate, per Anthropic and OpenAI; model routing cut compute about 70% while holding output quality steady in Red Hat's documented enterprise deployments; response and semantic caching typically remove 30% to 50% of token spend; and an AI gateway centralizes routing, caching, and rate limiting [5].
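The levers above can be put into a rough per-call cost model. This is a sketch under stated assumptions: the token prices, token counts, cache-hit share, and routing share are invented for illustration, the names `ModelPrice`, `call_cost`, and `blended_cost` are hypothetical, and cache-write surcharges are ignored. The only figure taken from the text is that cached reads bill at roughly 10% of the standard input rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens. The prices used below are placeholders."""
    input_per_m: float
    output_per_m: float
    cached_read_rate: float = 0.10  # cached input billed at ~10% of standard


def call_cost(price, input_tokens, output_tokens, cached_fraction=0.0):
    """Cost of one call; cached_fraction is the share of input read from cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable_input = fresh + cached * price.cached_read_rate
    return (billable_input * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def blended_cost(frontier, small, small_share, **call):
    """Average cost per call when small_share of traffic goes to the small model."""
    return (small_share * call_cost(small, **call)
            + (1 - small_share) * call_cost(frontier, **call))


frontier = ModelPrice(input_per_m=3.00, output_per_m=15.00)  # assumed prices
small = ModelPrice(input_per_m=0.30, output_per_m=1.50)      # assumed prices
call = dict(input_tokens=8_000, output_tokens=500)           # assumed call shape

baseline = call_cost(frontier, **call)
with_cache = call_cost(frontier, cached_fraction=0.75, **call)
with_routing = blended_cost(frontier, small, small_share=0.70,
                            cached_fraction=0.75, **call)

for label, cost in [("frontier only", baseline),
                    ("plus prompt caching", with_cache),
                    ("plus routing", with_routing)]:
    print(f"{label:<20} ${cost:.6f} per call")
```

The savings this prints depend entirely on the assumed inputs; the point is that each control is a separate term you can measure and tune.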
One tailwind reframes the whole exercise. Per a16z, the cost of an LLM of equivalent performance is falling about 10x per year, a trend Guido Appenzeller calls LLMflation [1]. So model COGS is a depreciating input: a margin that looks thin today tends to widen on its own, and pricing purely around current token cost is a trap. Building the cost-controlled inference architecture behind all of this, the routing, caching, and RAG retrieval, is the work our AI application development team does.
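As a rough illustration of that tailwind, the arithmetic looks like this in Python. It assumes the price stays fixed, inference is the only cost of goods sold, and the product keeps using a model of equivalent performance instead of upgrading; `margin_after` is a hypothetical helper, and real margins will move less cleanly.

```python
def margin_after(years: float, margin_today: float,
                 annual_cost_drop: float = 10.0) -> float:
    """Gross margin after `years` if unit cost falls by `annual_cost_drop`x per year.

    Assumes a constant price and that inference is the only COGS.
    """
    cost_share_today = 1.0 - margin_today
    cost_share_later = cost_share_today / (annual_cost_drop ** years)
    return 1.0 - cost_share_later


for years in (0, 0.5, 1):
    print(f"after {years} year(s): margin {margin_after(years, 0.55):.1%}")
```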
The four SaaS pricing models for AI features
There are four pricing models for AI features: a flat add-on, usage or credits, hybrid, and outcome-based. Flat is simplest to sell but decouples price from cost. Usage aligns revenue with COGS but feels unpredictable to buyers. Hybrid (a platform fee plus credits) gives a revenue floor and margin protection at once, and is the default the smart money is choosing. Outcome-based charges only on a delivered result, which maximizes willingness to pay but is hard to attribute.
The comparison below is the heart of the decision. Each row trades predictability against margin protection differently, and the named vendors show the pattern is already live in the market.
| Model | How it works | The trade-off | Who uses it |
|---|---|---|---|
| Flat add-on | One fixed price; AI bundled or sold as a flat upcharge on the seat | Predictable and familiar to sell, but price is decoupled from cost so heavy users destroy margin | Legacy SaaS bolting AI onto seat plans (the early Copilot model) |
| Usage / credits | Charge per token, API call, action, or a prepaid credit pool | Aligns revenue with COGS and scales with value, but revenue is less predictable and needs metering and budget caps | Most consumption AI tools; the OpenAI and Anthropic APIs |
| Hybrid | A fixed platform fee for predictability plus metered credits or usage on top | A revenue floor and margin protection together; the cost is explaining and packaging it well | Salesforce Agentforce, Intercom, ServiceNow Now Assist |
| Outcome-based | Charge only when the AI delivers a defined result | Maximizes willingness to pay and buyer trust, but attribution and revenue forecasting are hard | Intercom Fin ($0.99 per resolution), Chargeflow (25% of recovered funds) |
Two vendor examples make the table concrete. Intercom Fin charges $0.99 per resolution with no platform fee, billing only when the AI actually handles the issue [6]. Salesforce Agentforce started at $2 per conversation and by early 2026 moved to a hybrid of usage credits ($500 per 100,000 Flex Credits) plus per-user licensing [7]. The large majority of AI software companies now run a mixed or hybrid structure, so the table is less a menu of equals than a map of where the market has already landed.
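To see how the four models in the table diverge, here is a minimal Python sketch that bills the same two accounts under each one. Every price and allowance is a hypothetical default except the $0.99 per resolution, which mirrors the Intercom Fin figure cited above; the function names are illustrative, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    seats: int
    credits_used: int  # metered units consumed this month
    resolutions: int   # outcomes the AI delivered this month


def flat_addon(u: Usage, price_per_seat: float = 20.0) -> float:
    """Fixed upcharge per seat, independent of usage."""
    return u.seats * price_per_seat


def usage_based(u: Usage, price_per_credit: float = 0.05) -> float:
    """Pure metering: pay for every credit consumed."""
    return u.credits_used * price_per_credit


def hybrid(u: Usage, platform_fee: float = 500.0,
           included_credits: int = 5_000, overage: float = 0.05) -> float:
    """Platform fee with an included allowance, then metered overage."""
    extra = max(0, u.credits_used - included_credits)
    return platform_fee + extra * overage


def outcome_based(u: Usage, price_per_resolution: float = 0.99) -> float:
    """Charge only for delivered results."""
    return u.resolutions * price_per_resolution


light = Usage(seats=25, credits_used=2_000, resolutions=300)
heavy = Usage(seats=25, credits_used=60_000, resolutions=4_000)

for name, model in [("flat add-on", flat_addon), ("usage", usage_based),
                    ("hybrid", hybrid), ("outcome", outcome_based)]:
    print(f"{name:<12} light ${model(light):>8.2f}   heavy ${model(heavy):>8.2f}")
```

Under these assumptions the flat add-on bills both accounts the same, while the hybrid keeps a revenue floor on the light account and still recovers usage from the heavy one.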
The shift from per-seat to usage and outcome
The industry is moving off pure per-seat pricing for AI because seat count is a poor proxy for value and recovers none of the variable cost. An AI agent does the work a seat used to do, so value can rise while fewer humans log in, which means seat-based pricing caps revenue exactly as AI raises value while also failing to cover compute. Both ends break, which is why usage, credits, and hybrid are taking over.
Kyle Poyar's data captures the speed of it: hybrid models are surging, and his line is that "hybrid is where most of the smart money is going," with the most common structure being a platform fee plus credits [4]. Incumbents are publicly making the move, with Atlassian and HubSpot among those shifting AI off flat pricing toward usage and outcome components, per The Information [8]. The 2026 wrapper for all this is credit systems, which abstract raw tokens into a familiar unit buyers can budget against. The net narrative is not that subscriptions are dead; it is that usage and outcome get layered on top of a base fee so price tracks both delivered value and protected cost. For SaaS buyers weighing where this lands inside a product, our SaaS engineering work is where the pricing model meets the build.
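A credit system of the kind described above can be sketched as a small wallet that converts tokens to credits, raises an alert near the included allowance, and refuses charges past a hard cap. This is an illustrative Python sketch only: the exchange rate, thresholds, and the `CreditWallet` class are hypothetical, and a production meter would also need persistence and concurrency control.

```python
import math
from dataclasses import dataclass, field

TOKENS_PER_CREDIT = 1_000  # hypothetical exchange rate


def tokens_to_credits(tokens: int) -> int:
    """Round up so that any partial block of tokens costs a whole credit."""
    return math.ceil(tokens / TOKENS_PER_CREDIT)


@dataclass
class CreditWallet:
    included: int            # credits bundled with the platform fee
    hard_cap: int            # budget cap the customer has set
    alert_at: float = 0.8    # alert when this share of `included` is used
    used: int = 0
    alerts: list = field(default_factory=list)

    def charge(self, credits: int) -> bool:
        """Record usage; return False and charge nothing if the cap would be exceeded."""
        if self.used + credits > self.hard_cap:
            return False
        before = self.used
        self.used += credits
        threshold = self.alert_at * self.included
        if before < threshold <= self.used:  # fire once, when the line is crossed
            self.alerts.append(f"{self.used} of {self.included} included credits used")
        return True


wallet = CreditWallet(included=1_000, hard_cap=1_500)
results = [wallet.charge(c) for c in (700, 200, 700)]
print(results, wallet.used, wallet.alerts)
```

The third charge is rejected because it would pass the cap, which is the behavior that lets a buyer budget against credits without fearing a runaway bill.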
Setting a price that protects margin
Set an AI feature price by bracketing it between a cost-plus floor and a value-based ceiling, then choosing a hybrid structure in between. The floor is your fully loaded per-unit cost at a target gross margin, the minimum that keeps the feature profitable. The ceiling is a fraction of the measurable outcome the feature delivers. The price lives between them: a platform fee that covers fixed cost plus metered credits that recover variable inference cost from heavy users.
Work the floor first. Add up the fully loaded cost per unit (LLM input and output tokens, embeddings, vector search, orchestration calls, plus monitoring and overhead), then divide by one minus the target gross margin (say 70% to 80%) to get the minimum price. Treat that as a guardrail, never the answer: Simon-Kucher and Monetizely both warn that cost-plus alone is a trap for AI, because marginal cost says nothing about value and token cost is falling about 10x a year, so a cost-anchored price leaves money on the table and erodes as costs drop [9].
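The floor calculation is simple enough to write down. In this Python sketch the per-unit cost breakdown is invented for illustration and `floor_price` is a hypothetical helper; the formula itself is the standard one, price = cost / (1 - target margin).

```python
def floor_price(unit_costs: dict[str, float], target_gross_margin: float) -> float:
    """Minimum price per unit that still hits the target gross margin."""
    if not 0 <= target_gross_margin < 1:
        raise ValueError("target_gross_margin must be in [0, 1)")
    return sum(unit_costs.values()) / (1 - target_gross_margin)


# Hypothetical fully loaded cost of one AI action, in USD.
unit_costs = {
    "llm_tokens": 0.0120,
    "embeddings": 0.0005,
    "vector_search": 0.0010,
    "orchestration": 0.0015,
    "monitoring_and_overhead": 0.0050,
}

for margin in (0.70, 0.75, 0.80):
    print(f"target margin {margin:.0%}: floor ${floor_price(unit_costs, margin):.4f} per action")
```

Note that a 75% margin means pricing at four times cost, not at cost plus 75%.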
Then set the ceiling by value. Anchor to the business outcome the feature delivers: hours saved times loaded labor cost, tickets deflected times cost per ticket, or revenue recovered. Capture a fraction of it so the customer keeps clear ROI; a defensible share often sits well under the value created [10]. Between floor and ceiling, pick a hybrid structure and bill in a metric customers understand (credits, resolutions, actions) instead of raw tokens. Add caps and alerts to kill meter anxiety, and re-price on a cadence as model COGS falls. Two metrics tell you if it is working: keep the inference-to-revenue ratio well under the roughly 23% ICONIQ benchmark, and watch the cost concentration in your top 5% of users.
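The ceiling and the two health metrics can be sketched the same way. All inputs in this Python example are hypothetical, including the 20% capture share, which is an assumption and not a figure from the sources; the function names are illustrative.

```python
def value_ceiling(outcome_units: float, value_per_unit: float,
                  capture_share: float) -> float:
    """Upper bound on price: a fraction of the measurable value delivered."""
    return outcome_units * value_per_unit * capture_share


def inference_to_revenue(inference_cost: float, ai_revenue: float) -> float:
    """Share of AI revenue consumed by inference."""
    return inference_cost / ai_revenue


def top_share_of_cost(cost_by_user: list[float], top_fraction: float = 0.05) -> float:
    """Share of total inference cost driven by the heaviest users."""
    ranked = sorted(cost_by_user, reverse=True)
    n = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n]) / sum(ranked)


# Assumed: 400 tickets deflected per month at $6 each, vendor captures 20%.
ceiling = value_ceiling(outcome_units=400, value_per_unit=6.0, capture_share=0.20)

# Assumed monthly figures for one product line.
ratio = inference_to_revenue(inference_cost=230_000, ai_revenue=1_000_000)

# Assumed per-user inference cost: one heavy user among twenty.
concentration = top_share_of_cost([57.0] + [1.0] * 19)

print(f"monthly price ceiling: ${ceiling:.2f}")
print(f"inference-to-revenue:  {ratio:.0%}")
print(f"top 5% share of cost:  {concentration:.0%}")
```

A price between the floor and this ceiling, tracked against both metrics each billing cycle, is the practical form of the method described above.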
Sources
1. a16z (Martin Casado and Matt Bornstein), The New Business of AI and How It’s Different From Traditional Software (2020); LLMflation attributed to Guido Appenzeller.
2. Bessemer Venture Partners, State of AI 2025, gross-margin figure as reported by SaaS Mag, AI COGS and SaaS gross-margin compression (2026).
3. ICONIQ Growth (2026) inference-to-revenue data, as reported by SaaS Mag (2026).
4. Kyle Poyar, The State of AI Pricing in 2025 (Schematic, 2025).
5. Prompt-caching discount per Anthropic and OpenAI pricing; model-routing and caching figures via CloudZero, AI Cost Optimization and Red Hat AI infrastructure, as reported by SaaS Mag (2026).
6. Intercom, Fin AI pricing ($0.99 per resolution) (2026).
7. SaaStr, Salesforce now has 3 pricing models for Agentforce (2026).
8. The Information, Atlassian, HubSpot Join Shift Away From AI Flat Fees (2025, paywalled).
9. Simon-Kucher, Price model shifts in the age of AI; and Monetizely, How to Price AI Services in 2025.
10. Monetizely, How to Price AI Services in 2025: Models, Examples and Strategy (2025). Value-share rule of thumb is directional.
