A $1,500 cap on AI treats the symptom, not the cause
Ricardo Argüello — June 6, 2026
CEO & Founder
General summary
In a single week, Uber capped AI tool spending at $1,500 per person after burning its entire 2026 budget in four months, and it came out that one company spent over $500M on Claude in thirty days because it set no limits at all. Everyone's reaction is the same: cap it. But the cap treats the symptom. The disease is turning agents loose on work nobody sized: no one defined how much return each process should produce per token spent. A cap cannot tell which tokens were working. The fix is not to budget after the fact. It is to scope before you build.
- Uber burned its full 2026 AI budget in four months and reacted with a $1,500 per-person cap on each tool; before that, individual engineers were generating bills of $500 to $2,000 a month.
- One company spent over $500M on Claude in a single month through unrestricted employee access, what one consultant called tokenmaxxing: maximized usage that spirals out of control with no oversight.
- A cap is a blind instrument: it cuts spend but cannot tell you which tokens produced value and which were wasted. It treats the symptom, not the cause.
- Arvind Jain, CEO of Glean, says it plainly: token spend is an architecture problem, not just a model problem. AI spend now looks like cloud spend, where you win on token yield, not on a cap.
- AI Maestro from IQ Source defines the scope and expected return of each process before an agent is turned loose, so you never need a panic cap after the first surprise bill.
Imagine you give every employee a fuel card with no limit, and at month's end an impossible bill arrives. The obvious move is to cap the card. But the cap cannot tell you who drove to close deals and who drove in circles: it just cuts everyone the same. AI works the same way. The cap slows the bill, but it cannot separate the token that solved a problem from the token that was wasted. What actually fixes the spend is deciding, before you start, what the car is for.
AI-generated summary
Two things happened in the same week. Uber capped spending at $1,500 per person on each AI tool, after burning its full 2026 budget in four months. And it came out that one company spent over $500M on Claude in a single month, because it set no limits at all.
Half the internet reacted the same way: cap it. As an emergency brake, fair enough. As a strategy, it treats the symptom.
Here is the thesis in one line. The cap is not the cure. The disease is not the token price; it is turning agents loose on work nobody sized. No one defined how much return each process should produce per token spent, so when the bill lands, all that is left is to cut blindly. The fix is not to budget after the fact. It is to scope before you build.
The month that cost $500 million
The half-billion-dollar story is the scary one, so start there. Per an Axios report, one company burned more than $500M on Claude in thirty days. Not through a signed contract, but by giving employees unrestricted access: no budgets, no quotas, no monitoring. A consultant called it tokenmaxxing, the maximized usage that spirals once people generate prompts with no oversight.
Uber’s case is less dramatic but more instructive, because it is a disciplined company that still crashed. It burned its entire 2026 AI budget in the first four months. Before the cap, individual engineers were generating bills of $500 to $2,000 a month in token consumption. The answer was a $1,500 cap per person on each agentic coding tool, a usage dashboard for every employee, and a process to request more when needed.
It is a sound containment move. Uber did the right thing to stop the bleeding. But look at what a cap cannot do.
A cap can’t tell which tokens were working
A cap is a blind instrument. It cuts spend, but it cannot separate the token that closed a problem from the token wasted running in circles. It sets the same ceiling for the engineer who resolved three incidents and the one who left an agent looping all weekend.
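A minimal sketch of that blindness, in Python. Everything here is hypothetical except the $1,500 figure from the Uber case: the names `Usage`, `apply_cap`, and `cost_per_incident` are made up for illustration, the spend numbers are invented, and "incidents resolved" is only an assumed stand-in for useful work.

```python
from dataclasses import dataclass
from typing import Optional

CAP_USD = 1500.0  # per-person, per-tool cap cited in the article


@dataclass
class Usage:
    engineer: str            # hypothetical label
    spend_usd: float         # illustrative monthly token bill
    incidents_resolved: int  # assumed proxy for useful work


def apply_cap(usage: Usage, cap: float = CAP_USD) -> float:
    """Spend allowed under a flat cap. Note that outcomes never enter the rule."""
    return min(usage.spend_usd, cap)


def cost_per_incident(usage: Usage) -> Optional[float]:
    """Dollars per resolved incident, or None when nothing was resolved."""
    if usage.incidents_resolved == 0:
        return None
    return usage.spend_usd / usage.incidents_resolved


# Same bill, very different results.
productive = Usage("engineer_a", 1800.0, 3)
looping = Usage("engineer_b", 1800.0, 0)

capped_a = apply_cap(productive)  # 1500.0
capped_b = apply_cap(looping)     # 1500.0, the cap sees no difference
```

Both engineers land on the same ceiling. Only the second function, which needs an outcome to divide by, can tell them apart.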
Someone in this week’s conversation put it better than any consultant: nobody caps spend on something they can measure. The $1,500 cap is not a verdict on AI’s value. It is an admission that the company could not see which tokens were working.
And here is the part almost nobody wants to say out loud. The real problem is not the price per token, which has in fact been falling for years. The problem is that most companies turned agents loose on processes they never sized. They never asked, before starting, how much useful work this process should produce per dollar of tokens. Without that number, there is no way to know whether an $1,800 bill is a robbery or a bargain. All that is left is the reflex to cut.
Token spend now looks like cloud spend
Arvind Jain, CEO of Glean, keeps making a point worth hearing: token spend is an architecture problem, not just a model problem. His company reports clients whose annual AI budget runs out in one or two months, and his thesis is that the right question is not how many tokens a system consumes, but how much useful work it produces per token. He calls it token yield, and it depends on everything around the model: how context is retrieved, how models are routed, how prior work is reused.
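To make the idea concrete, here is a rough sketch of token yield as a ratio. The function name, the "per million tokens" normalization, the unit of useful work (resolved tickets), and every number are assumptions for illustration; the article only gives the definition in words, not a formula.

```python
def token_yield(useful_units: float, tokens: int) -> float:
    """Useful work produced per million tokens consumed.

    `useful_units` is whatever the process is meant to deliver
    (tickets resolved, invoices matched); choosing that unit is
    the scoping decision, and it is an assumption here.
    """
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return useful_units / (tokens / 1_000_000)


# Illustrative figures: the same 40 resolved tickets, two architectures.
naive_yield = token_yield(useful_units=40, tokens=80_000_000)  # full context on every call
tuned_yield = token_yield(useful_units=40, tokens=20_000_000)  # retrieval, routing, reuse
```

In this made-up comparison the output is identical and the yield is four times higher in the second case, which is the sense in which the spend depends on the architecture and not only on the model.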
It is the same transition the cloud went through fifteen years ago. Once agents can retry, loop, browse, and spawn sub-agents, spend stops looking like a fixed monthly license and starts looking like a cloud bill: variable, with no natural ceiling, and dangerous if nobody sets budgets and traces per workflow. The difference is that with the cloud we learned to measure cost per service before letting it run. With AI, too many companies skipped that step.
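A budget and a trace per workflow could look something like the sketch below. The class `WorkflowBudget`, its method names, the workflow name, and the dollar amounts are all hypothetical; this is one simple way to do it, not a description of any particular product.

```python
from collections import defaultdict


class WorkflowBudget:
    """Tracks spend per workflow and refuses calls that would exceed its limit."""

    def __init__(self, limits_usd):
        self.limits = dict(limits_usd)   # workflow name -> budget in USD
        self.spent = defaultdict(float)  # workflow name -> spend so far
        self.trace = []                  # (workflow, cost_usd, status) records

    def record(self, workflow, cost_usd):
        """Record one agent call. Returns False if it would break the budget."""
        if workflow not in self.limits:
            raise KeyError(f"no budget defined for workflow {workflow!r}")
        if self.spent[workflow] + cost_usd > self.limits[workflow]:
            self.trace.append((workflow, cost_usd, "blocked"))
            return False
        self.spent[workflow] += cost_usd
        self.trace.append((workflow, cost_usd, "ok"))
        return True


# Illustrative run: a 100 USD budget for one workflow.
budget = WorkflowBudget({"code_review": 100.0})
first = budget.record("code_review", 60.0)   # fits within the budget
second = budget.record("code_review", 60.0)  # would reach 120 USD, so it is blocked
```

The difference from a per-person cap is where the ceiling sits: on the workflow, next to a trace that says what each dollar was spent on.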
I have written before that cost was the guardrail you didn’t know you had, and that tokens per shipped feature is the KPI that matters. What changed this week is not the thesis. It is that the market finally reacted, and it reached for the wrong instrument. The cap comes after the spend. The scope comes before it.
What we do about it at IQ Source
When a company asks us to put AI into a process, we do not start with which model or how much budget. We start by asking how much useful work that process should produce, and what we would be willing to pay in tokens for that result. That number, defined before building, is what makes the panic cap unnecessary.
AI Maestro is the discovery where that math gets done. It takes two months of mapping the real operation to decide which processes are worth automating and what token yield to expect from each. Out of it comes an AI Opportunity Score and a Go/No-Go gate that, more than once, recommends not turning the agent loose yet, precisely because the math does not add up. That is not austerity. It is sizing the spend before you sign for it.
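As a toy illustration of what a gate like that decides, here is a sketch. It is not the AI Opportunity Score or the actual AI Maestro method, which this article does not spell out; the `ProcessEstimate` type, the `go_no_go` function, the 2.0 threshold, the process names, and the figures are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ProcessEstimate:
    name: str
    expected_return_usd: float      # assumed monthly value of the useful work
    expected_token_cost_usd: float  # assumed monthly token spend to get it


def go_no_go(estimate: ProcessEstimate, min_ratio: float = 2.0) -> str:
    """Return "go" when expected return covers token cost by at least min_ratio."""
    if estimate.expected_token_cost_usd <= 0:
        raise ValueError("expected token cost must be positive")
    ratio = estimate.expected_return_usd / estimate.expected_token_cost_usd
    return "go" if ratio >= min_ratio else "no-go"


# Hypothetical processes sized before anything is built.
invoice_matching = ProcessEstimate("invoice_matching", 12_000.0, 3_000.0)  # ratio 4.0
email_triage = ProcessEstimate("email_triage", 2_500.0, 2_000.0)           # ratio 1.25

decision_a = go_no_go(invoice_matching)
decision_b = go_no_go(email_triage)
```

With these invented numbers the first process clears the bar and the second does not. The point is the order of operations: the ratio exists before the agent does.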
A cap tells you how much you can spend. It does not tell you whether it is worth spending. Those are two different questions, and the second one gets answered before you build, not when the bill lands. The next time someone proposes capping the team’s AI spend, ask the other question first: do we know, per process, how much useful work each dollar of tokens buys us? If the answer is no, the cap will not save you. It will just hide the problem for one more month.
Size your AI spend before you sign for it
Frequently Asked Questions
Uber burned its entire 2026 AI budget in just four months because of token consumption from agentic coding tools like Claude Code. Before the cap, individual engineers were generating bills of $500 to $2,000 a month. The $1,500 per-person, per-tool cap is meant to contain the spend, though it does not address the underlying cause.
Tokenmaxxing is maximized AI usage that spirals out of control when employees generate prompts with no oversight or limits. Per an Axios report, one company spent over $500M on Claude in thirty days by granting unrestricted access with no budgets or monitoring. The problem was not the token price; it was the absence of scope and control.
A spending cap slows the bill, but it is a blind instrument: it cannot separate tokens that create value from tokens that are wasted, and it cuts everyone the same. It works as an emergency brake, not a strategy. Controlling AI costs requires defining the scope and expected return of each process before turning an agent loose.
AI Maestro from IQ Source defines, in a two-month discovery, which processes are worth automating and how much return each should produce per token spent. By measuring token yield before building, the company sizes the spend per process and never needs a panic cap after receiving an impossible bill.