SaaS pricing models for AI: why inference breaks flat-rate margins and how to price it

The SaaS pricing models that work for traditional software break down the moment you add AI features. Classic SaaS made one copy and sold it a million times at near-zero marginal cost. AI inference does not work that way: every model call burns real GPU compute, so a flat per-seat fee quietly bleeds margin on the users who lean on the feature most. This guide covers the four SaaS pricing models for AI, why margins compress, what drives the cost, and a floor-and-ceiling method for setting a price that holds.

By Kanika Mathur, Head of Service Delivery

Reviewed by Resourcifi engineering | Published Jan 23, 2026 | Updated Jan 23, 2026 | 12 min read


Key takeaways

The short version

AI breaks the economics of flat-rate SaaS. Every model call re-runs inference and burns real compute, so AI-native gross margins run roughly 50% to 60% per a16z, against the 70% to 90% of classic software. A flat per-seat fee collects the same revenue whether a user makes 5 calls or 5,000.
The cost is concentrated and measurable. ICONIQ Growth data points to roughly $230,000 of inference cost per $1M of AI revenue (about a 23% inference-to-revenue ratio), and Kyle Poyar found the top 5% of users drove about 75% of usage cost under flat pricing.
You can pull the cost down hard. Prompt caching cuts the cost of cached input tokens by about 90% (Anthropic and OpenAI), and model routing has cut compute roughly 70% at quality parity per Red Hat. Token cost is also falling about 10x per year (a16z LLMflation), so a thin margin today widens over time.
Pure per-seat pricing is the one model the experts agree is dangerous for AI, because seat count is a poor proxy for value and recovers none of the variable cost. The market is moving to usage, credits, and hybrid.
The default that works is hybrid: a platform fee plus metered credits. Kyle Poyar describes "platform fee plus credits" as where most of the smart money is going, and Salesforce Agentforce, Intercom, and ServiceNow all ship hybrid components.

Why AI features break flat-rate SaaS margins

AI features compress SaaS gross margins because every model call has a real variable cost, where classic software had almost none. Once traditional SaaS exists, one more user costs close to nothing to serve, which is why B2B SaaS gross margins sat at 70% to 90%. AI breaks that: each transaction re-runs inference and consumes GPU compute, so AI-native gross margins run roughly 50% to 60% per a16z.¹ A flat per-seat subscription collects the same revenue whether a user makes 5 calls or 5,000, so the heaviest users silently erode margin.

The a16z framing is the canonical anchor: there was "a kind of business gravity that pulled all SaaS toward 70 to 80 percent gross margins," and AI breaks that gravity because every transaction carries a cost.¹ The data corroborates it from several angles. Bessemer's State of AI 2025 puts LLM-native company gross margins around 65%.² ICONIQ Growth data points to roughly $230,000 of inference cost for every $1M of AI revenue, an inference-to-revenue ratio near 23%.³ And the cost is concentrated: Kyle Poyar found that under flat-fee AI pricing the top 5% of users drove about 75% of usage cost while representing only about 5% of revenue, all of it unprofitable.⁴
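To make the 5-versus-5,000-calls point concrete, here is a minimal sketch of per-user gross margin under a flat fee. The $30 seat price and $0.01 inference cost per call are illustrative assumptions, not figures from any vendor or from the sources above.

```python
def flat_fee_margin(monthly_fee: float, calls: int, cost_per_call: float) -> float:
    """Gross margin, as a fraction of revenue, for one user on a flat fee."""
    inference_cost = calls * cost_per_call
    return (monthly_fee - inference_cost) / monthly_fee


def break_even_calls(monthly_fee: float, cost_per_call: float) -> float:
    """Number of calls at which inference cost consumes the whole fee."""
    return monthly_fee / cost_per_call


# Hypothetical inputs: a $30 per-seat fee and $0.01 of inference per call.
FEE = 30.0
COST_PER_CALL = 0.01

for calls in (5, 500, 5000):
    margin = flat_fee_margin(FEE, calls, COST_PER_CALL)
    print(f"{calls:>5} calls -> gross margin {margin:.1%}")

print(f"break-even at about {break_even_calls(FEE, COST_PER_CALL):.0f} calls")
```

Under these assumed numbers the light user is almost pure margin while the 5,000-call user costs more to serve than the seat brings in, which is the concentration effect the Poyar data describes.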

AI gross margins sit well below classic SaaS

Typical gross margin by business type. The gap is the inference cost that a flat per-seat fee never recovers. Mixed sources, so read the bars as directional ranges rather than one firm's series.

Data behind this chart
Business type	Typical gross margin	Source
Traditional B2B SaaS	70% to 90% (shown ~80%)	a16z and others
LLM-native company	~65%	Bessemer, State of AI 2025
AI-native SaaS	50% to 60% (shown ~55%)	a16z, The New Business of AI
Inference cost as a share of AI revenue (a cost ratio, not a margin)	~23% (about $230K per $1M)	ICONIQ Growth (2026)

Sources: a16z, The New Business of AI; Bessemer State of AI 2025; ICONIQ Growth (2026). Figures come from different firms and years, so treat them as directional.

What drives AI cost, and how to control it

AI COGS is driven mostly by inference volume (input plus output tokens per call, multiplied by calls per user), then by model choice, RAG retrieval infrastructure, context bloat, and agentic loops that fan out into many calls per action. The good news is that each lever is also a control: prompt caching cuts the cost of cached input tokens by about 90%, model routing has cut compute roughly 70% at quality parity, and response caching commonly removes 30% to 50% of token spend.

Five things move the bill. Token volume is the single biggest lever. Model choice is the most common leak, because routing every query to a frontier model costs multiples of what a small model would. RAG adds vector hosting, embedding generation, and re-embedding on data changes. Context bloat from long system prompts and large retrieved context inflates input tokens on every call. And agentic loops multiply calls per user action in ways that are hard to predict. The controls map onto each: prompt caching bills cached reads at roughly 10% of the standard input rate, per Anthropic and OpenAI; model routing cut compute about 70% while holding output quality steady in Red Hat's documented enterprise deployments; response and semantic caching typically removes 30% to 50% of token spend; and an AI gateway centralizes routing, caching, and rate limiting.⁵
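The drivers and controls above can be sketched as a small per-user cost model. This is a rough illustration, not a billing implementation: the two `ModelPrice` rate cards, the token counts, the routing share, and the cache hit fraction are all hypothetical placeholders, and the model ignores any cache-write surcharge, RAG hosting, and response caching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical rate card in USD per million tokens."""
    input_per_mtok: float
    output_per_mtok: float
    cached_read_ratio: float = 0.10  # cached reads billed at ~10% of the input rate


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 cached_fraction: float = 0.0) -> float:
    """Cost of one call; cached_fraction is the share of input tokens read from the prompt cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    billable_input = fresh + cached * price.cached_read_ratio
    return (billable_input * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def monthly_cost_per_user(calls: int, small_share: float,
                          frontier: ModelPrice, small: ModelPrice,
                          input_tokens: int, output_tokens: int,
                          cached_fraction: float = 0.0) -> float:
    """Blend two models: small_share of calls are routed to the cheaper one."""
    per_call = (
        small_share * request_cost(small, input_tokens, output_tokens, cached_fraction)
        + (1 - small_share) * request_cost(frontier, input_tokens, output_tokens, cached_fraction)
    )
    return calls * per_call


# Placeholder rate cards, not real vendor prices.
FRONTIER = ModelPrice(input_per_mtok=3.00, output_per_mtok=15.00)
SMALL = ModelPrice(input_per_mtok=0.25, output_per_mtok=1.25)

# Everything on the frontier model, no caching.
baseline = monthly_cost_per_user(1000, 0.0, FRONTIER, SMALL, 4000, 500)
# 70% of calls routed to the small model, 80% of input tokens cached.
controlled = monthly_cost_per_user(1000, 0.7, FRONTIER, SMALL, 4000, 500,
                                   cached_fraction=0.8)

print(f"baseline:   ${baseline:.2f} per user per month")
print(f"controlled: ${controlled:.2f} per user per month")
```

Because output tokens are never cached, caching alone helps most on input-heavy calls with long system prompts or large retrieved context; routing is what moves the output side of the bill.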

One tailwind reframes the whole exercise. Per a16z, the cost of an LLM of equivalent performance is falling about 10x per year, a trend Guido Appenzeller calls LLMflation.¹ So model COGS is a depreciating input: a margin that looks thin today tends to widen on its own, and pricing purely around current token cost is a trap. Building the cost-controlled inference architecture behind all of this, the routing, caching, and RAG retrieval, is the work our AI application development team does.
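A quick way to see why pricing purely around current token cost is a trap is to hold price fixed and let the inference cost decay. The sketch below assumes the roughly 10x-per-year decline applies directly to your unit cost, which only holds if usage per unit and the model tier stay constant; the $1.00 price and $0.45 unit cost are made-up inputs.

```python
def projected_margin(price: float, unit_cost_today: float, years: float,
                     annual_decline: float = 10.0) -> float:
    """Gross margin after `years`, if unit cost falls by `annual_decline`x per year."""
    future_cost = unit_cost_today / (annual_decline ** years)
    return (price - future_cost) / price


# Hypothetical: a unit priced at $1.00 that costs $0.45 in inference today.
for years in (0, 0.5, 1, 2):
    margin = projected_margin(1.00, 0.45, years)
    print(f"after {years} year(s): gross margin {margin:.1%}")
```

In practice part of that saving tends to get spent on longer context, more agentic steps, or a newer model, so treat the projection as an upper bound on how far margin widens on its own.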

The four SaaS pricing models for AI features

There are four pricing models for AI features: a flat add-on, usage or credits, hybrid, and outcome-based. Flat is simplest to sell but decouples price from cost. Usage aligns revenue with COGS but feels unpredictable to buyers. Hybrid (a platform fee plus credits) gives a revenue floor and margin protection at once, and is the default the smart money is choosing. Outcome-based charges only on a delivered result, which maximizes willingness to pay but is hard to attribute.

The comparison below is the heart of the decision. Each row trades predictability against margin protection differently, and the named vendors show the pattern is already live in the market.

AI pricing models compared

Four structures, read against two questions: does price track the inference cost, and does the buyer find it predictable. Vendor examples are real, public pricing.

How the four AI pricing models compare
Model	How it works	The trade-off	Who uses it
Flat add-on	One fixed price; AI bundled or sold as a flat upcharge on the seat	Predictable and familiar to sell, but price is decoupled from cost so heavy users destroy margin	Legacy SaaS bolting AI onto seat plans (the early Copilot model)
Usage / credits	Charge per token, API call, action, or a prepaid credit pool	Aligns revenue with COGS and scales with value, but revenue is less predictable and needs metering and budget caps	Most consumption AI tools; the OpenAI and Anthropic APIs
Hybrid	A fixed platform fee for predictability plus metered credits or usage on top	A revenue floor and margin protection together; the cost is explaining and packaging it well	Salesforce Agentforce, Intercom, ServiceNow Now Assist
Outcome-based	Charge only when the AI delivers a defined result	Maximizes willingness to pay and buyer trust, but attribution and revenue forecasting are hard	Intercom Fin ($0.99 per resolution), Chargeflow (25% of recovered funds)

Sources: Kyle Poyar / Schematic, The State of AI Pricing (2025); Intercom Fin and Salesforce Agentforce public pricing; SaaStr on Agentforce pricing.

Two vendor anchors are worth keeping concrete. Intercom Fin charges $0.99 per resolution with no platform fee, billing only when the AI actually handles the issue.⁶ Salesforce Agentforce started at $2 per conversation and by early 2026 moved to a hybrid of usage credits ($500 per 100,000 Flex Credits) plus per-user licensing.⁷ The large majority of AI software companies now run a mixed or hybrid structure, so the table is less a menu of equals than a map of where the market has already landed.
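As a rough illustration of how the four structures bill the same account, the sketch below computes a monthly invoice under each. The $0.99 per resolution and $500 per 100,000 credits rates come from the vendor figures above; the seat price, platform fee, included credit allowance, and the two sample accounts are hypothetical, and the included-allowance mechanic is one assumed way to package a hybrid plan.

```python
def flat_addon(seats: int, price_per_seat: float) -> float:
    """Fixed upcharge per seat, independent of usage."""
    return seats * price_per_seat


def usage_credits(credits_used: int, price_per_credit: float) -> float:
    """Pure metered billing on credits consumed."""
    return credits_used * price_per_credit


def hybrid(platform_fee: float, credits_used: int, included_credits: int,
           price_per_credit: float) -> float:
    """Platform fee plus metered credits beyond an included allowance (assumed packaging)."""
    overage = max(0, credits_used - included_credits)
    return platform_fee + overage * price_per_credit


def outcome_based(resolutions: int, price_per_resolution: float) -> float:
    """Bill only for delivered results."""
    return resolutions * price_per_resolution


PRICE_PER_CREDIT = 500 / 100_000   # from the $500 per 100,000 credits figure
PRICE_PER_RESOLUTION = 0.99        # from the per-resolution figure
SEAT_PRICE = 30.0                  # hypothetical
PLATFORM_FEE = 500.0               # hypothetical
INCLUDED_CREDITS = 50_000          # hypothetical

# Two made-up accounts with the same seat count but very different usage.
accounts = {
    "light": {"seats": 20, "credits": 10_000, "resolutions": 100},
    "heavy": {"seats": 20, "credits": 400_000, "resolutions": 4_000},
}

for name, a in accounts.items():
    print(name,
          f"flat=${flat_addon(a['seats'], SEAT_PRICE):,.2f}",
          f"usage=${usage_credits(a['credits'], PRICE_PER_CREDIT):,.2f}",
          f"hybrid=${hybrid(PLATFORM_FEE, a['credits'], INCLUDED_CREDITS, PRICE_PER_CREDIT):,.2f}",
          f"outcome=${outcome_based(a['resolutions'], PRICE_PER_RESOLUTION):,.2f}")
```

The flat invoice is identical for both accounts, while the other three scale with activity; only the hybrid keeps a non-zero floor for the light account.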

The shift from per-seat to usage and outcome

The industry is moving off pure per-seat pricing for AI because seat count is a poor proxy for value and recovers none of the variable cost. An AI agent does the work a seat used to do, so value can rise while fewer humans log in, which means seat-based pricing caps revenue exactly as AI raises value while also failing to cover compute. Both ends break, which is why usage, credits, and hybrid are taking over.

Kyle Poyar's data captures the speed of it: hybrid models are surging, and his line is that "hybrid is where most of the smart money is going," with the most common structure being a platform fee plus credits.⁴ Incumbents are publicly making the move, with Atlassian and HubSpot among those shifting AI off flat pricing toward usage and outcome components, per The Information.⁸ The 2026 wrapper for all this is credit systems, which abstract raw tokens into a familiar unit buyers can budget against. The net narrative is not that subscriptions are dead; it is that usage and outcome get layered on top of a base fee so price tracks both delivered value and protected cost. For SaaS buyers weighing where this lands inside a product, our SaaS engineering work is where the pricing model meets the build.

Setting a price that protects margin

Set an AI feature price by bracketing it between a cost-plus floor and a value-based ceiling, then choosing a hybrid structure in between. The floor is your fully loaded per-unit cost at a target gross margin, the minimum that keeps the feature profitable. The ceiling is a fraction of the measurable outcome the feature delivers. The price lives between them: a platform fee that covers fixed cost plus metered credits that recover variable inference cost from heavy users.

Work the floor first. Add up the fully loaded cost per unit (LLM input and output tokens, embeddings, vector search, orchestration calls, plus monitoring and overhead), then apply a target gross margin (say 70% to 80%) to get the minimum price. Treat that as a guardrail, never the answer: Simon-Kucher and Monetizely both warn that cost-plus alone is a trap for AI, because marginal cost says nothing about value and token cost is falling about 10x a year, so a cost-anchored price leaves money on the table and erodes as costs drop.⁹
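The floor arithmetic is simply cost divided by one minus the target margin. A minimal sketch, with every cost component below a hypothetical per-unit figure (for example, per resolution):

```python
def floor_price(unit_costs: dict[str, float], target_margin: float) -> float:
    """Minimum price per unit that achieves target_margin on fully loaded cost."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    total_cost = sum(unit_costs.values())
    return total_cost / (1 - target_margin)


# Hypothetical fully loaded cost per unit, in USD.
costs = {
    "llm_tokens": 0.020,
    "embeddings": 0.002,
    "vector_search": 0.003,
    "orchestration": 0.002,
    "monitoring_overhead": 0.003,
}

for margin in (0.70, 0.80):
    print(f"target margin {margin:.0%}: floor ${floor_price(costs, margin):.3f} per unit")
```

Note that the divisor is one minus the margin, not one plus a markup: a 75% gross margin means the price is four times the cost, not 1.75 times.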

Then set the ceiling by value. Anchor to the business outcome the feature delivers (hours saved times loaded labor cost, tickets deflected times cost per ticket, or revenue recovered) and capture a fraction of it so the customer keeps clear ROI; a defensible share often sits well under the value created.¹⁰ Between floor and ceiling, pick a hybrid structure, bill in a metric customers understand (credits, resolutions, actions) instead of raw tokens, add caps and alerts to kill meter anxiety, and re-price on a cadence as model COGS falls. Two metrics tell you if it is working: keep the inference-to-revenue ratio well under the roughly 23% ICONIQ benchmark, and watch the cost concentration in your top 5% of users.
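The ceiling, the bracket check, and the two health metrics can be sketched in a few lines. The $8 cost per ticket, the 20% capture share, the $0.12 floor, and the sample cost and revenue data are all assumptions chosen for illustration; only the 23% benchmark comes from the ICONIQ figure cited above.

```python
def value_ceiling(value_per_unit: float, capture_share: float) -> float:
    """Maximum defensible price: a fraction of the value one unit delivers."""
    if not 0 < capture_share <= 1:
        raise ValueError("capture_share must be in (0, 1]")
    return value_per_unit * capture_share


def is_bracketed(price: float, floor: float, ceiling: float) -> bool:
    """True when the price sits between the cost-plus floor and the value ceiling."""
    return floor <= price <= ceiling


def inference_to_revenue(inference_cost: float, ai_revenue: float) -> float:
    """Share of AI revenue consumed by inference."""
    return inference_cost / ai_revenue


def top_user_cost_share(costs_by_user: list[float], top_fraction: float = 0.05) -> float:
    """Share of total inference cost driven by the heaviest top_fraction of users."""
    ranked = sorted(costs_by_user, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical: a deflected ticket is worth $8 and we capture 20% of that value.
ceiling = value_ceiling(8.00, 0.20)
floor = 0.12  # assumed output of a cost-plus floor calculation
print(f"floor ${floor:.2f}, ceiling ${ceiling:.2f}, "
      f"$0.99 bracketed: {is_bracketed(0.99, floor, ceiling)}")

BENCHMARK = 0.23  # roughly the ICONIQ inference-to-revenue ratio
ratio = inference_to_revenue(inference_cost=18_000, ai_revenue=120_000)
print(f"inference-to-revenue {ratio:.1%}, under benchmark: {ratio < BENCHMARK}")

# Made-up per-user monthly inference cost: one heavy user among twenty.
costs = [75.0] + [1.0] * 19
print(f"top 5% of users drive {top_user_cost_share(costs):.1%} of cost")
```

If the floor ever exceeds the ceiling, no price works for that unit of value, which usually signals the cost side needs the routing and caching levers before the pricing does.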

Frequently asked

SaaS AI pricing questions

How should I price an AI feature in my SaaS product?

Bracket the price between a cost-plus floor and a value-based ceiling. The floor is your fully loaded per-unit inference cost at a 70 to 80 percent target margin; the ceiling is a share of the measurable outcome the feature delivers. Then use a hybrid structure, a platform fee plus credits, so a base fee covers fixed cost and metered usage recovers variable inference cost from heavy users.

Why do AI features lower SaaS gross margins?

Because every model call has a real variable cost from GPU inference, unlike traditional software whose marginal cost is near zero. AI-native gross margins run roughly 50 to 65 percent against the 70 to 90 percent of classic SaaS, per a16z and Bessemer State of AI 2025, and a flat subscription collects the same revenue regardless of how much compute a user burns.

Should AI pricing be usage-based or seat-based?

Pure seat-based is the riskiest model for AI, because seat count is a poor proxy for value and recovers none of the compute cost. The market is moving to usage, credit, and hybrid models, with hybrid, a platform fee plus credits, now the default smart-money choice per Kyle Poyar. Most products layer usage or outcome on top of a base fee rather than abandoning subscriptions outright.

What is outcome-based pricing for AI and who uses it?

You charge only when the AI delivers a defined result. Intercom Fin charges $0.99 per resolution and Chargeflow takes 25 percent of recovered chargeback funds, both billing only on a delivered outcome. It maximizes willingness to pay and buyer trust, but attribution and revenue forecasting are harder, so it suits features with a clean, measurable result.

How do I reduce AI inference costs?

Use prompt caching, which discounts cached input tokens by about 90 percent; model routing, which sends simple queries to cheaper models and has a documented compute cut near 70 percent at quality parity; and response or semantic caching, which removes 30 to 50 percent of token spend. Shorter context, prompt compression, and an AI gateway help further. Remember that model costs fall about 10x a year, so margins tend to widen over time.

Kanika Mathur

Head of Service Delivery, Resourcifi

Kanika Mathur runs service delivery at Resourcifi, where her engineering pods build the inference architecture, model routing, caching, and RAG retrieval that sit behind AI features inside SaaS products. She has scoped the per-request cost models and contribution-margin reviews that decide whether an AI feature ships profitable or slowly erodes the unit economics, and that is the lens this guide is written from.
