Your AI bill comes from places you aren't looking

Three token-bill incidents in 14 days. The pattern isn't runaway usage. It's surface area: dormant credentials, silent vendor changes, context noise.

Ricardo Argüello

CEO & Founder

Business Strategy 9 min read

The Register reported this morning that a Google Cloud customer woke up to $18,392 in charges against a budget configured at $10 AUD. Nine safety features off by default. Zero notification. Direct quote from the customer: “no prior notification… not a lot of help to resolve the matter with any sense of urgency.”

That makes three public token-bill incidents in 14 days.

The trap is in how they’re being read. Most coverage frames the three as “the team lost control of its agents” or “AI is getting expensive.” Neither is the story. The pattern underneath all three is the same: the bill is coming from surfaces nobody is watching. What the CFO is actually measuring isn’t runaway usage. It’s exposure.

Yesterday’s post argued the CFO’s new KPI is tokens per shipped feature, not tokens per month. This post completes the other half of the same calculation. Tokens per shipped feature measures return. Exposure surface measures risk. They’re two columns of one budget, and most teams are looking at only one of them.

The pattern underneath the three incidents

Token economics isn’t a tuning problem. It’s a surface-area problem. Three public failure modes in the last 90 days make this hard to dispute.

One: credentials you forgot about. AIza keys created years ago by Firebase, embedded in Android apps, that the vendor re-activated without warning. Two: silent vendor changes. Pricing, tokenizers, and credit pools that shift between versions without notice to customers with a decade of history. Three: context noise. MCP servers consuming a quarter of the window before the first prompt, tools dumping full JSON into agent context on every call, sessions accumulating tens of thousands of tokens per turn that nobody is auditing.

You can’t tune your way out of any of this. You have to inventory the surface first. The distinction matters because most public conversation is on the wrong side: optimization before inventory is a false economy.

The credentials you forgot about

On February 25, 2026 Joe Leon at Truffle Security disclosed the technical fault that started this whole sequence. When Google enabled the Gemini API on existing Google Cloud projects, every AIza key already created in those projects, including Maps and Firebase keys that Google’s own documentation marked as safe to embed in client code, silently gained Gemini access. Truffle counted 2,863 live keys in its 90-day disclosure window.

On April 7 CloudSEK published the quantified Android version: 32 keys embedded across 22 popular apps with 500M+ combined installs. The list included OYO, Google Pay for Business, Taobao, ELSA Speak, and The Hindu. ELSA Speak had confirmed audio data exposure.

The documented bill incidents have real names and numbers:

  • $82,000 in 48 hours. A three-person team in Mexico, RatonVaquero on r/googlecloud, February 11–12. The popular framing called this a “solo developer.” It wasn’t. The team-of-three correction matters because the gap isn’t an individual-habits problem; it’s an inherited-surface problem.
  • $67,000 in 19 hours. Junghyun Choi, COO of Colavo Ground in Korea, April 28. A 2016 Firebase auto-provisioned key. 931 requests per second peak.
  • $12,000 in 24 hours. Ivan Iliev on LinkedIn, May 15. A Firebase auto-generated Android key from 2016, dormant for years.

Google’s official response, via The Register on March 3, verbatim: “We have already implemented proactive measures to detect and block leaked API keys that attempt to access the Gemini API.” Google has refunded individual cases The Register escalated, but there is no public blanket refund policy. Nor is there any public acknowledgment of the contradiction between Firebase documentation, which still says keys can be safely embedded in client code, and Gemini documentation, which says to treat the key like a password. Same key format, same company, both pages live.

The strongest counter-argument, made cleanly by Someone1234 on the Hacker News thread: “they’ve implemented hard-limits. So not offering hard-limits is a business decision, NOT a technical one.” That’s fair. The audit-your-keys advice is necessary, but the root cause is platform design, not developer hygiene. Both things are true at once.

I’ve been in computing for 36 years, since 1990, sitting in front of a Commodore 64 with 64KB you had to defend byte by byte. The pattern is the same one I watched play out with IAM audits in the cloud around 2012. Back then mid-size companies were finding out in month 18 that they had 47 active service accounts nobody could explain, each one carrying permissions inherited from a three-year-old experiment. The right answer was never “be more careful when you create new accounts.” It was inventory the surface and revoke anything without a current owner. The 2026 AI bill is asking for exactly the same work.

What the vendor changed this week without telling you

The second failure mode doesn’t require careless developers or forgotten keys. It requires being a paying customer in good faith.

On May 13 Anthropic announced the Agent SDK separation. Max 20x subscribers moved to a separate $200/month credit pool at list API pricing, effective June 15. Public community estimates put the effective price increase between 12x and 175x depending on workload. For a team that designed its flow assuming the Max plan covered the agent, next quarter’s bill multiplies without a single line of code changing.

Claude Code v2.1.100 introduced a silent tokenizer change. Public reporting measured up to 35% inflation in Opus 4.7 token counts for identical inputs. Prompt caching broke through the transition, forcing full reprocessing every turn. The interim workaround was downgrading to v2.1.34.

There’s a pattern that has no name in governance language yet, but everyone who has lived it recognizes it. Google Cloud customers with a decade of history and a top-tier billing plan were reset to Tier 1 ($250/month) during the AI Studio migration in April 2026. Features that depended on the higher plan stopped working. Official docs say tier promotion requires “a few months of history,” and these accounts already have 120 months of it. It’s absurd in the abstract and real in practice.

The operational conclusion is straightforward. Quota errors and 429 codes are no longer billing alerts. They’re first-order production signals at the same severity as 500s. Uber learned this in April; the rest of the market is learning it now.

The noise your agent is reading

The third mode is the most interesting because most teams don’t know it exists.

Sam McLeod published the baseline measurement: GitHub’s official MCP server registers 46,000 tokens across 91 tools. A quarter of Sonnet or Opus 4’s context window. Before the agent has seen the first line of your repository.

Jenny Ouyang traced a $1,600 Claude Code bill to MCP tools dumping full JSON into agent context on every call. Vantage published the aggregate analysis that says the same thing in numbers: agent sessions accumulate 25,000 to 35,000 tokens per request by turn 30. That’s not usage. That’s sediment.
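Why the sediment matters for the bill can be shown with a little arithmetic. The sketch below is a simplified model, not Vantage's method: it assumes every request resends the full history, that each turn adds a constant 1,000 tokens (a hypothetical figure chosen so that turn 30 lands inside the reported range), and that no prompt caching applies.

```python
def request_sizes(turns: int, tokens_added_per_turn: int, fixed_overhead: int = 0) -> list[int]:
    """Input tokens sent on each turn when the whole history is resent every time.

    Assumption: context grows linearly; real sessions grow unevenly.
    """
    return [fixed_overhead + tokens_added_per_turn * turn for turn in range(1, turns + 1)]


# Hypothetical session: 30 turns, 1,000 new tokens of tool output per turn.
sizes = request_sizes(turns=30, tokens_added_per_turn=1_000)

last_request = sizes[-1]      # size of the request at turn 30
session_total = sum(sizes)    # input tokens processed across the whole session

print(f"turn 30 request: {last_request:,} tokens")
print(f"session total:   {session_total:,} tokens")
```

Under these assumptions the final request carries 30,000 tokens, but the session as a whole processes 465,000 input tokens, because every earlier turn is read again on every later one. Prompt caching can lower the price of that repeated prefix; it does not reduce how much the agent is rereading.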

Thomas Giordmaina published RTK on LinkedIn last week, a Rust binary that intercepts shell output before the model reads it. It compresses noise from eslint, grep, diff, playwright: 88.3% reduction measured across 6,077 commands, 50.4 million tokens saved. The interesting question isn’t whether the number is real. It’s what the number means.

It means 88% of what your agent is reading isn’t signal. It’s shell filler. The argument that matters isn’t “we should compress.” It’s “nobody knows what their agent is reading.” Compression hides the problem. Inventory names it. Issue 1282 on the RTK repo, opened by @panwudi, documents the counterexample: silent compression can corrupt sub-agent input with summary headers that read like data. Compressing before you know what you’re reading just changes the failure mode.

Anthropic answered this exact problem on May 6 with MCP Tool Search: on-demand tool discovery, reported overhead reduction of 85%. It’s the right shape. Treat the context window as inventory, not as infinite space.

Three actions for Monday and one three-month decision

What a CTO can move this week, in three concrete actions:

One. List every AIza key in every Google Cloud project and every linked account. The command is gcloud alpha services api-keys list --project=<PROJECT> per project. Any key without application, IP, or API target restrictions, sitting in a project where the Gemini API is enabled, is a $100K-per-day liability. Any key you can’t tie to a specific, currently shipping application gets revoked. Not “audited later.” Revoked.
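The triage in action one can be sketched as a small filter over the exported key list. This is a minimal sketch, assuming the list was exported with gcloud's global --format=json flag and that each record follows the API Keys API Key resource shape (a restrictions object containing browserKeyRestrictions, serverKeyRestrictions, androidKeyRestrictions, iosKeyRestrictions, and apiTargets). The function names and the labels in the output are hypothetical; check the field names against your own export before relying on it.

```python
from typing import Any

# Restriction fields assumed from the API Keys API "Key" resource.
# IP restrictions live under serverKeyRestrictions.
APPLICATION_RESTRICTION_FIELDS = (
    "browserKeyRestrictions",
    "serverKeyRestrictions",
    "androidKeyRestrictions",
    "iosKeyRestrictions",
)


def missing_restrictions(key: dict[str, Any]) -> list[str]:
    """Return the restriction types that one key record does not have."""
    restrictions = key.get("restrictions") or {}
    missing = []
    if not any(restrictions.get(field) for field in APPLICATION_RESTRICTION_FIELDS):
        missing.append("application_or_ip")
    if not restrictions.get("apiTargets"):
        missing.append("api_targets")
    return missing


def flag_unrestricted(keys: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each under-restricted key (by display name or resource name) to what it lacks."""
    flagged = {}
    for key in keys:
        missing = missing_restrictions(key)
        if missing:
            label = key.get("displayName") or key.get("name", "<unnamed>")
            flagged[label] = missing
    return flagged
```

To use it, save the output of gcloud alpha services api-keys list --project=<PROJECT> --format=json to a file, load it with json.load, and pass the resulting list to flag_unrestricted. The sketch only reports missing restrictions; whether the Gemini API is enabled in the project, and whether the key still has an owner, has to be checked separately.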

Two. Move quota and 429 alerts to the same severity tier as 500 production errors. The Uber bill, the Google Tier 1 reset, the Anthropic Agent SDK separation: all three incidents had quota signals before the blowup. Nobody heard them because they were filtered as “billing alerts.”
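In alert-routing terms, action two is a one-line change in how status codes map to severity. The sketch below is illustrative only: the Severity levels and the severity_for function are hypothetical names, not part of any monitoring product, and vendor-specific quota errors that arrive under other status codes would need their own mapping.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    PAGE = 2  # wakes someone up, same tier as a production outage


# Assumption: HTTP 429 is the quota/rate-limit signal for the APIs in question.
QUOTA_STATUS_CODES = {429}


def severity_for(status_code: int) -> Severity:
    """Route quota responses to the same tier as server errors."""
    if status_code >= 500 or status_code in QUOTA_STATUS_CODES:
        return Severity.PAGE
    if status_code >= 400:
        return Severity.WARNING
    return Severity.INFO
```

The design choice is that a 429 is compared against the paging tier first, so it can never fall through to the generic client-error bucket where billing-style alerts tend to get filtered.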

Three. Measure tokens per MCP tool, not just per session. Each MCP tool is a fixed quota you pay at startup, not an on-demand cost. If your MCP server has 91 tools and you only use 7, the other 84 are eating your context window without doing any work.
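A rough per-tool measurement for action three might look like the following. It is a sketch under loud assumptions: it takes tool definitions shaped like an MCP tools/list result (name, description, inputSchema), and it estimates tokens at about four characters per token, which is a heuristic and not any vendor's tokenizer. The function names are hypothetical.

```python
import json

# Assumption: ~4 characters per token. Swap in a real tokenizer for billing-grade numbers.
CHARS_PER_TOKEN = 4


def estimate_tokens(tool: dict) -> int:
    """Approximate the context cost of one tool definition from its serialized size."""
    serialized = json.dumps(tool, separators=(",", ":"))
    return -(-len(serialized) // CHARS_PER_TOKEN)  # ceiling division


def idle_overhead(tools: list[dict], used_names: set[str]) -> dict:
    """Split the estimated startup cost into tools that were called and tools that were not."""
    per_tool = {tool["name"]: estimate_tokens(tool) for tool in tools}
    idle = {name: cost for name, cost in per_tool.items() if name not in used_names}
    return {
        "total_tokens": sum(per_tool.values()),
        "idle_tokens": sum(idle.values()),
        # Most expensive unused tools first: the candidates to disable.
        "idle_tools": sorted(idle, key=idle.get, reverse=True),
    }
```

Pass it the tool list your MCP server advertises and the set of tool names that actually appear in your session logs. The idle_tools ordering gives a first cut of which definitions are eating context without doing any work.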

All three are cheap and finishable before next Friday. The bigger decision, the one that takes three months, is to sequence the inventory ahead of the agent deployments. At IQ Source the first work of AI Maestro, the two-month discovery program that runs before any deployment, is exactly this: mapping the AI exposure surface a company already has, not proposing the next one. The map covers keys, quotas, active MCPs, sessions leaving token traces, and vendor contracts with silent-change clauses. Two months of mapping costs less than a single $82K bill from an AIza key a six-year-old Firebase project left alive.

The CFO question isn’t what the next agent will cost. It’s what surface you’re already leaking against. Your AI bill comes from places you aren’t looking, and the only way to start looking is to inventory before you optimize.
