AI price per token lies. Measure cost per job.

Gemini 3 Flash is listed 80% cheaper than GPT-5.4 and costs 38% more to run. The list price is marketing. The bill depends on how many tokens each model burns.

Ricardo Argüello

CEO & Founder

Business Strategy · 5 min read

The price per token is a marketing number. The bill is a behavior number. And they are rarely in the same order.

That is the thesis of this post, and it has a direct consequence for anyone building on AI or budgeting its spend: picking a model by the price table is picking by the wrong number. The model that is cheaper per token can cost you more per finished job, sometimes by a lot. The competence that matters is not finding the cheapest model on the list. It is measuring what each task actually costs and routing the work to the right model. That is what we build when we build on models, and the rest of this post explains why.

The number that lies, with data

Serge Herkül, who advises SaaS companies on pricing, laid it out with a case that stings: Gemini 3 Flash is listed 80% cheaper than GPT-5.4. Run across twelve real tasks, it costs 38% more.

It is not a fluke. Herkül cites a study from Stanford, Berkeley, CMU and Microsoft, titled “The Price Reversal Phenomenon: When Cheaper Reasoning Models Cost More,” that ran eight reasoning models across twelve tasks and compared the list price with the actual bill. In nearly a third of the matchups, the “cheaper” model cost more. In the worst case, 28 times more.

The details explain it. One model spent 60,000 reasoning tokens on a problem another solved in 25. On an agent task, one took 57 steps where another took 7. And the part that hurts most when you are trying to budget: the same model, on the same query, varied in cost by up to 9.7x between runs.

Strip the AI out of it and you are left with a pricing lesson as old as commerce: unit price is not total cost.

Why the cheap one runs expensive

The mechanics are simple once you see them. You do not pay per question. You pay per token. And every model burns a different number of tokens to reach the same answer.

A model with a low list price can be a model that overthinks. It reasons out loud for thousands of tokens before answering, or it spirals into extra steps when acting as an agent, or it rereads the same context again and again. Every one of those tokens costs money, even if each individual token is cheap. The expensive-per-token model sometimes cuts straight to the point, spends a fraction of the tokens, and ends up cheaper per job.
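To make the arithmetic concrete, here is a minimal sketch in Python. Every price and token count in it is hypothetical, chosen only to illustrate the mechanism, and the model names are placeholders. It also assumes reasoning tokens are billed at the output rate, which you should check against your provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    name: str
    input_usd_per_mtok: float   # list price per million input tokens
    output_usd_per_mtok: float  # list price per million output tokens


def cost_per_job(pricing: ModelPricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one finished job: tokens consumed times the per-token price."""
    total = (input_tokens * pricing.input_usd_per_mtok
             + output_tokens * pricing.output_usd_per_mtok)
    return total / 1_000_000


# Hypothetical list prices: "cheap" is 80% cheaper per token than "pricey".
cheap = ModelPricing("cheap-per-token", 0.50, 2.00)
pricey = ModelPricing("pricey-per-token", 2.50, 10.00)

# Hypothetical consumption on the same job: the cheap model overthinks.
cheap_cost = cost_per_job(cheap, input_tokens=4_000, output_tokens=60_000)
pricey_cost = cost_per_job(pricey, input_tokens=4_000, output_tokens=3_000)

print(f"{cheap.name}: ${cheap_cost:.3f} per job")
print(f"{pricey.name}: ${pricey_cost:.3f} per job")
```

With these made-up numbers, the model that is 80% cheaper on the price table ends up roughly three times more expensive per job, purely because of how many tokens it burns.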

Then there is the variance. When the same model swings nearly tenfold in cost on the same query between two runs, you cannot even assume a stable average. Cost per task is not a point. It is a distribution, and the tail of that distribution is where the money goes.
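One way to look at the distribution rather than the point is to run the same query many times and summarize the spread. The sketch below is an illustration under assumptions: the twenty run costs are invented, and the nearest-rank 95th percentile is just one reasonable choice of tail metric.

```python
import statistics


def summarize_costs(costs_usd):
    """Summarize repeated-run costs for one model on one query."""
    ordered = sorted(costs_usd)
    n = len(ordered)
    # Nearest-rank 95th percentile, with integer math to avoid float rounding.
    rank = (95 * n + 99) // 100
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max_over_min": ordered[-1] / ordered[0],
    }


# Hypothetical costs in USD for 20 runs of the same query on the same model.
runs = [0.02] * 14 + [0.03] * 4 + [0.15, 0.19]
summary = summarize_costs(runs)

for key, value in summary.items():
    print(f"{key}: {value:.3f}")
```

In this invented sample the median run costs $0.02, but the mean is almost double that, because two of the twenty runs carry close to half of the total spend. A budget built on the median would miss it.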

What this breaks in your business

If you build a product on LLMs, this hits you in two ways, and it pays to see both clearly.

The first is your cost to operate. Your cost of goods sold is not the list price. It is the list price times consumption, and consumption is variable, model-specific and partly random. If you modeled your margin on the number from the price table, you modeled the wrong number. I wrote about this from another angle in the post “The hidden cost lever in enterprise AI: timing,” where batching, caching and scheduling move the bill as much as the model does.

The second is how you charge. If you put a flat fee on top of a variable cost, your heaviest users go underwater without you noticing. You handed your margin to a random number generator. This connects straight to something I already argued: in AI, you are what you charge for. Charging for the outcome only works if you know what producing that outcome costs you, and this is the part almost nobody measures.
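A quick sketch of the flat-fee problem, again in Python. The fee, the cost per job and the usage levels are all hypothetical, and a real cost per job would itself be a distribution rather than a constant.

```python
def monthly_margin(flat_fee_usd: float, jobs_per_month: int, cost_per_job_usd: float) -> float:
    """Margin on one customer: the flat fee minus what their usage costs you."""
    return flat_fee_usd - jobs_per_month * cost_per_job_usd


FLAT_FEE_USD = 49.0      # hypothetical subscription price
COST_PER_JOB_USD = 0.04  # hypothetical measured cost per finished job

# Hypothetical customers and their jobs per month.
usage = {"light": 40, "typical": 300, "heavy": 2_500}

margins = {
    name: monthly_margin(FLAT_FEE_USD, jobs, COST_PER_JOB_USD)
    for name, jobs in usage.items()
}
underwater = [name for name, margin in margins.items() if margin < 0]

# Usage level at which a customer stops being profitable.
breakeven_jobs = FLAT_FEE_USD / COST_PER_JOB_USD

for name, margin in margins.items():
    print(f"{name}: margin ${margin:.2f}")
print("underwater:", underwater)
print(f"breakeven at {breakeven_jobs:.0f} jobs per month")
```

Under these assumptions the heavy user loses you money every month, and you only see it if you track cost per job per customer.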

And no, capping the spend does not fix it. I covered that in “A $1,500 cap does not cure your AI bill”: the cap treats the symptom. The cause is not knowing which task runs on which model at what real cost.

What IQ Source does about it

The way out is not to pick the cheapest model or the most expensive one. It is to stop choosing by the price table and start choosing by cost per finished task in your own workflow.

That demands a discipline almost nobody has set up. Run each candidate model over your real tasks, not over a generic benchmark. Measure how many tokens and how many steps it consumes to completion. Look at the tail of the distribution, not just the average. Then route each type of work to the model that solves it cheapest end to end. Sometimes the expensive frontier model is the most economical for the hard task, and an efficient model is enough for the routine one. The only way to know is to measure it in your context.
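As a rough illustration of what that measurement can produce, here is a minimal routing-table sketch in Python. It is not our production method. The task types, model names and run costs are hypothetical, and it assumes one simple definition of cost per finished task: total spend, including failed runs, divided by the number of runs that completed.

```python
from collections import defaultdict


def build_routing_table(runs):
    """Pick, per task type, the model with the lowest cost per finished task.

    runs: iterable of (task_type, model, cost_usd, succeeded) tuples.
    """
    spend = defaultdict(float)
    finished = defaultdict(int)
    for task_type, model, cost_usd, succeeded in runs:
        spend[(task_type, model)] += cost_usd  # failed runs still cost money
        finished[(task_type, model)] += int(succeeded)

    table = {}
    for (task_type, model), total in spend.items():
        done = finished[(task_type, model)]
        if done == 0:
            continue  # a model that never finishes the task is not a candidate
        per_finished = total / done
        best = table.get(task_type)
        if best is None or per_finished < best[1]:
            table[task_type] = (model, per_finished)
    return table


# Hypothetical measurements from running two models over two task types.
measurements = [
    ("routine", "efficient", 0.01, True),
    ("routine", "efficient", 0.01, True),
    ("routine", "frontier", 0.04, True),
    ("routine", "frontier", 0.04, True),
    ("hard", "efficient", 0.12, False),
    ("hard", "efficient", 0.15, True),
    ("hard", "efficient", 0.20, False),
    ("hard", "frontier", 0.09, True),
    ("hard", "frontier", 0.09, True),
    ("hard", "frontier", 0.09, True),
]

routing = build_routing_table(measurements)
for task_type, (model, cost) in routing.items():
    print(f"{task_type}: route to {model} at ${cost:.2f} per finished task")
```

In this invented data set the efficient model wins the routine work and the frontier model wins the hard work, because the efficient model's failed attempts on hard tasks still show up on the bill. A fuller version would also compare the tail of each cost distribution, not only the average.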

At IQ Source, that is part of what we build when a company puts us to work standing up AI on top of their operation. We do not hand over “use this model.” We hand over a routing table built on your tasks, with cost per job measured, not estimated. It is the difference between buying by the label and buying by the bill.

The next time someone on your team proposes switching models “because it is cheaper,” ask one concrete question: cheaper per token, or cheaper per finished task? If the answer is “per token,” you still do not know what it will cost. You will find out on the bill, which is the only number you actually pay.
