Why can a cheaper-per-token AI model cost more per task?

Because you pay per token, not per question, and each model burns a different number of tokens to solve the same problem. A model that is cheap per token can spend thousands of reasoning tokens or take dozens of steps where another finishes in a few. The list price measures the input, the bill measures the behavior, and they rarely match.

What is the price reversal phenomenon in AI models?

It is when the model that is cheaper per token turns out more expensive per finished job. A study from Stanford, Berkeley, CMU and Microsoft measured it across eight reasoning models and twelve tasks: in nearly a third of the matchups the lower-list-price model cost more to run, up to 28 times more in the worst case.

How do you actually pick the cheapest AI model for a task?

By measuring cost per finished task in your own workflow, not the per-token price on the table. You run each candidate model over your real tasks, measure how many tokens and steps it consumes to completion, and route each type of work to the model that solves it cheapest end to end. The list rate is only the starting point.

Why is charging a flat AI fee on a variable per-token cost risky?

Because token consumption is variable and partly random: the same model on the same query can vary in cost by up to 9.7x between runs. If you charge a flat fee on top of that, your heaviest users quietly go unprofitable, and you hand your margin to a random number generator instead of pricing the real cost.

www.iqsource.ai

AI price per token lies. Measure cost per job.

Ricardo Argüello

Ricardo Argüello — June 21, 2026

CEO & Founder

June 21, 2026 · Business Strategy · 5 min read

The price per token is a marketing number. The bill is a behavior number. And they are rarely in the same order.

That is the thesis of this post, and it has a direct consequence for anyone building on AI or budgeting its spend: picking a model by the price on the table is picking by the wrong number. The one that is cheaper per token can cost you more per finished job, sometimes by a lot. The competence that matters is not finding the cheapest model in the list. It is measuring what each task actually costs and routing the work to the right model. That is what we build when we build on models, and the rest of this post explains why.

The number that lies, with data

Serge Herkül, who advises SaaS companies on pricing, laid it out with a case that stings: Gemini 3 Flash is listed 80% cheaper than GPT-5.4. Run across twelve real tasks, it costs 38% more.

It is not a fluke. Herkül cites a study from Stanford, Berkeley, CMU and Microsoft, titled “The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More,” that ran eight reasoning models across twelve tasks and compared the list price with the actual bill. In nearly a third of the matchups, the “cheaper” model cost more. In the worst case, 28 times more.

The details explain it. One model spent 60,000 reasoning tokens on a problem another solved in 25. On an agent task, one took 57 steps where another took 7. And the part that hurts most when you are trying to budget: the same model, on the same query, varied in cost by up to 9.7x between runs.

Strip the AI out of it and you are left with a pricing lesson as old as commerce: unit price is not total cost.

Why the cheap one runs expensive

The mechanics are simple once you see them. You do not pay per question. You pay per token. And every model burns a different number of tokens to reach the same answer.

A model with a low list price can be a model that overthinks. It reasons out loud for thousands of tokens before answering, or it spirals into extra steps when acting as an agent, or it rereads the same context again and again. Every one of those tokens costs money, even if each individual token is cheap. The expensive-per-token model sometimes cuts straight to the point, spends a fraction of the tokens, and ends up cheaper per job.
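To make the arithmetic concrete, here is a minimal sketch in Python. The prices and token counts are invented for illustration and are not figures from the study. The names `ModelRun` and `cost_per_task` are hypothetical, and the sketch counts only output tokens to keep things simple.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    """One model's consumption on a single finished task (illustrative numbers)."""
    name: str
    price_per_million_tokens: float  # USD, hypothetical list price
    tokens_used: int                 # tokens burned to finish the task


def cost_per_task(run: ModelRun) -> float:
    """Cost in USD: list price times the tokens actually consumed."""
    return run.price_per_million_tokens * run.tokens_used / 1_000_000


# Hypothetical: the "cheap" model is 80% cheaper per token but overthinks.
cheap = ModelRun("cheap-model", price_per_million_tokens=2.00, tokens_used=40_000)
pricey = ModelRun("pricey-model", price_per_million_tokens=10.00, tokens_used=5_000)

for run in (cheap, pricey):
    print(f"{run.name}: ${cost_per_task(run):.4f} per task")
```

With these made-up numbers the model that is five times cheaper per token burns eight times the tokens, so it ends up costing more per finished task.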

Then there is the variance. A model that swings nearly tenfold in cost on the same query between two runs means you cannot even assume a stable average. Cost per task is not a point, it is a distribution, and the tail of that distribution is where the money goes.

What this breaks in your business

If you build a product on LLMs, this hits you in two ways, and it pays to see both clearly.

The first is your cost to operate. Your cost of goods sold is not the list price. It is the list price times consumption, and consumption is variable, model-specific and partly random. If you modeled your margin on the number from the table, you modeled the wrong number. I wrote about this from another angle in the post on timing as the hidden cost lever in enterprise AI, where batching, caching and scheduling move the bill as much as the model does.

The second is how you charge. If you put a flat fee on top of a variable cost, your heaviest users go underwater without you noticing. You handed your margin to a random number generator. This connects straight to something I already argued: in AI, you are what you charge for. Charging for the outcome only works if you know what producing that outcome costs you, and this is the part almost nobody measures.
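The flat-fee problem is easy to see with a few lines of arithmetic. This is a sketch under stated assumptions: the fee, the usage levels and the cost per task are all hypothetical, and `monthly_margin` is a name introduced here for illustration.

```python
FLAT_FEE_USD = 50.00  # hypothetical monthly price per user

# Hypothetical monthly usage: tasks run and measured average cost per task (USD).
users = {
    "light": {"tasks": 100, "cost_per_task": 0.05},
    "medium": {"tasks": 500, "cost_per_task": 0.05},
    "heavy": {"tasks": 2_000, "cost_per_task": 0.05},
}


def monthly_margin(tasks: int, cost_per_task: float, fee: float = FLAT_FEE_USD) -> float:
    """Flat fee minus the variable model cost the user generated."""
    return fee - tasks * cost_per_task


for name, usage in users.items():
    margin = monthly_margin(usage["tasks"], usage["cost_per_task"])
    status = "underwater" if margin < 0 else "profitable"
    print(f"{name:<6} margin ${margin:>8.2f}  {status}")
```

Every user pays the same fee, but only the heavy one costs more than that fee to serve. If the cost per task also varies between runs, the break-even point moves with it.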

And no, capping the spend does not fix it. I covered that in “A $1,500 cap does not cure your AI bill”: the cap treats the symptom. The cause is not knowing which task runs on which model at what real cost.

What IQ Source does about it

The way out is not to pick the cheapest model or the most expensive one. It is to stop choosing by the price table and start choosing by cost per finished task in your own workflow.

That demands a discipline almost nobody has set up. You have to run each candidate model over your real tasks, not over a generic benchmark, measure how many tokens and how many steps it consumes to completion, look at the tail of the distribution and not just the average, and route each type of work to the model that solves it cheapest end to end. Sometimes the expensive frontier model is the most economical for the hard task, and an efficient model is enough for the routine one. The only way to know is to measure it in your context.
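As a rough sketch of that measurement loop, the snippet below turns a log of runs into a routing table by cost per finished task. The task types, model names and costs are invented for illustration, and `build_routing_table` is a hypothetical helper, not a description of any particular product. Failed runs still count toward cost, which is one reason a model can look cheap per run and expensive per finished job.

```python
from collections import defaultdict

# Hypothetical measurements: (task_type, model, cost_usd, finished)
runs = [
    ("routine_summary", "efficient-model", 0.004, True),
    ("routine_summary", "efficient-model", 0.005, True),
    ("routine_summary", "efficient-model", 0.004, True),
    ("routine_summary", "frontier-model", 0.020, True),
    ("routine_summary", "frontier-model", 0.022, True),
    ("routine_summary", "frontier-model", 0.021, True),
    ("hard_agent_task", "efficient-model", 0.30, False),
    ("hard_agent_task", "efficient-model", 0.45, True),
    ("hard_agent_task", "efficient-model", 0.40, False),
    ("hard_agent_task", "frontier-model", 0.25, True),
    ("hard_agent_task", "frontier-model", 0.30, True),
    ("hard_agent_task", "frontier-model", 0.28, True),
]


def build_routing_table(runs):
    """Map each task type to (model, cost per finished task) with the lowest cost."""
    totals = defaultdict(lambda: {"cost": 0.0, "finished": 0})
    for task_type, model, cost_usd, finished in runs:
        entry = totals[(task_type, model)]
        entry["cost"] += cost_usd  # failed runs still cost money
        entry["finished"] += int(finished)

    table = {}
    for (task_type, model), entry in totals.items():
        if entry["finished"] == 0:
            continue  # never finished this task type, so it cannot be routed to
        cost_per_finished = entry["cost"] / entry["finished"]
        best = table.get(task_type)
        if best is None or cost_per_finished < best[1]:
            table[task_type] = (model, cost_per_finished)
    return table


for task_type, (model, cost) in build_routing_table(runs).items():
    print(f"{task_type}: route to {model} at ${cost:.4f} per finished task")
```

With these invented numbers the efficient model wins the routine task and the frontier model wins the hard one. A fuller version would also compare tail percentiles, not just totals.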

At IQ Source, that is part of what we build when a company puts us to work standing up AI on top of their operation. We do not hand over “use this model.” We hand over a routing table built on your tasks, with cost per job measured, not estimated. It is the difference between buying by the label and buying by the bill.

The next time someone on your team proposes switching models “because it is cheaper,” ask one concrete question: cheaper per token, or cheaper per finished task? If the answer is “per token,” you still do not know what it will cost. You will find out on the bill, which is the only number you actually pay.
