Your AI bill comes from places you aren't looking
Ricardo Argüello — May 18, 2026
CEO & Founder
General summary
This morning The Register reported a Google Cloud customer waking up to $18,392 in charges against a budget set at $10 AUD, with nine safety features disabled by default. It's the third public AI-bill incident in 14 days. The pattern underneath isn't runaway agent loops. It's surface: dormant credentials the vendor re-activated, silent pricing changes, and context noise nobody is watching. The bill isn't coming from the model. It's coming from three places outside your field of view.
- On February 25 Joe Leon at Truffle Security disclosed that 2,863 live AIza keys gained Gemini access when Google enabled the API on existing projects. On April 7 CloudSEK named 32 keys embedded across 22 apps with 500M+ combined installs, including OYO, Google Pay for Business, and ELSA Speak. Documented victims include $82K in 48 hours against a three-person Mexican team (RatonVaquero on r/googlecloud, Feb 11), $67K in 19 hours against Colavo Ground in Korea (Junghyun Choi, Apr 28), and $12K in 24 hours against Ivan Iliev. Google's official response, verbatim: "We have already implemented proactive measures to detect and block leaked API keys that attempt to access the Gemini API." Google has refunded individual cases The Register escalated, but it has published no blanket refund policy.
- Anthropic separated its Agent SDK on May 13. Max 20x subscribers moved to a separate $200/month credit pool at list API pricing, effective June 15. Community estimates put the effective price increase between 12x and 175x depending on workload. Claude Code v2.1.100+ shipped a silent tokenizer change inflating Opus 4.7 token counts by up to 35%, with broken prompt caching forcing full reprocessing on every turn.
- Sam McLeod measured GitHub's MCP server at 46,000 tokens across 91 tools. That's a quarter of Sonnet or Opus 4's context window consumed before a line of code is written. Jenny Ouyang traced a $1,600 Claude Code bill to MCP tools dumping full JSON into the context window on every call. Vantage measured agent sessions accumulating 25,000 to 35,000 tokens per request by turn 30. RTK, the Rust binary that shrinks CLI output by 88% before it hits the LLM, exposes the problem more than it solves it.
- The right question isn't what the agent will cost. It's what surface area you're already leaking against. Two months of process discovery inside AI Maestro costs less than one $82,000 incident caused by a forgotten Firebase key. Yesterday's post on tokens-per-shipped-feature measured return per token; this one measures exposure surface. Same calculation from two sides.
- At IQ Source the first work in any enterprise AI adoption isn't model selection, isn't prompt engineering, isn't agent deployment. It's surface inventory: every active AIza key in every GCP project, 429 quota alerts at the same severity tier as 500s, token budgets per MCP tool rather than per session. The team that does this first doesn't pay the public learning curve the three incidents this week made everyone else pay.
Imagine your house has an electricity meter and a reasonable monthly bill. One day you open the meter box and find three outbound wires you don't remember: one feeding your neighbor's apartment from five years ago, one running to the garden shed you stopped using, and a third extending to the back room an old tenant wired in without permission. Your bill never spiked because nobody was drawing on those wires. Until someone did. The 2026 enterprise AI bill works the same way. The problem isn't the agent you can see. It's the surface attached to your account that you don't even know exists.
AI-generated summary
The Register reported this morning that a Google Cloud customer woke up to $18,392 in charges against a budget configured at $10 AUD. Nine safety features off by default. Zero notification. Direct quote from the customer: “no prior notification… not a lot of help to resolve the matter with any sense of urgency.”
That makes three public token-bill incidents in 14 days.
The trap is in how they’re being read. Most coverage frames the three as “the team lost control of its agents” or “AI is getting expensive.” Neither is the story. The pattern underneath all three is the same: the bill is coming from surfaces nobody is watching. What the CFO is actually measuring isn’t runaway usage. It’s exposure.
Yesterday’s post argued the CFO’s new KPI is tokens per shipped feature, not tokens per month. This post completes the other half of the same calculation. Tokens per shipped feature measures return. Exposure surface measures risk. They’re two columns of one budget, and most teams are looking at only one of them.
The pattern underneath the three incidents
Token economics isn’t a tuning problem. It’s a surface-area problem. Three public failure modes in the last 90 days make this hard to dispute.
One: credentials you forgot about. AIza keys created years ago by Firebase, embedded in Android apps, that the vendor re-activated without warning. Two: silent vendor changes. Pricing, tokenizers, and credit pools that shift between versions without notice to customers with a decade of history. Three: context noise. MCP servers consuming a quarter of the window before the first prompt, tools dumping full JSON into agent context on every call, sessions accumulating tens of thousands of tokens per turn that nobody is auditing.
You can’t tune your way out of any of this. You have to inventory the surface first. The distinction matters because most public conversation is on the wrong side: optimization before inventory is a false economy.
The credentials you forgot about
On February 25, 2026 Joe Leon at Truffle Security disclosed the technical fault that started this whole sequence. When Google enabled the Gemini API on existing Google Cloud projects, every AIza key already created in those projects, including Maps and Firebase keys that Google’s own documentation marked as safe to embed in client code, silently gained Gemini access. Truffle counted 2,863 live keys in its 90-day disclosure window.
On April 7 CloudSEK published the quantified Android version: 32 keys embedded across 22 popular apps with 500M+ combined installs. The list included OYO, Google Pay for Business, Taobao, ELSA Speak, and The Hindu. ELSA Speak had confirmed audio data exposure.
The documented bill incidents have real names and numbers:
- $82,000 in 48 hours. A three-person team in Mexico, RatonVaquero on r/googlecloud, February 11–12. The popular framing called this a “solo developer.” It wasn’t. The team-of-three correction matters because the gap isn’t an individual-habits problem; it’s an inherited-surface problem.
- $67,000 in 19 hours. Junghyun Choi, COO of Colavo Ground in Korea, April 28. A 2016 Firebase auto-provisioned key. 931 requests per second peak.
- $12,000 in 24 hours. Ivan Iliev on LinkedIn, May 15. A Firebase auto-generated Android key from 2016, dormant for years.
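To put the three incidents on a common scale, the arithmetic below converts each reported total into an hourly and a daily burn rate. The totals and durations come from the list above. The assumption, labeled in the code, is that spend was spread evenly across the reported window, which real incidents will not match.

```python
# Reported totals (in dollars, as published) and durations in hours.
# Assumption: spend is averaged evenly over the window; real traffic was bursty.
incidents = [
    ("Mexico team (RatonVaquero)", 82_000, 48),
    ("Colavo Ground", 67_000, 19),
    ("Ivan Iliev", 12_000, 24),
]


def burn_rates(total: float, hours: float) -> tuple[float, float]:
    """Return (dollars per hour, dollars per 24 hours) for one incident."""
    per_hour = total / hours
    return per_hour, per_hour * 24


for name, total, hours in incidents:
    per_hour, per_day = burn_rates(total, hours)
    print(f"{name}: ${per_hour:,.0f}/hour, ${per_day:,.0f}/day")
```

Under that averaging assumption the Colavo Ground incident works out to roughly $3,500 per hour, or about $84,600 per day. That is the rate a single dormant key sustained.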
Google’s official response, via The Register on March 3, verbatim: “We have already implemented proactive measures to detect and block leaked API keys that attempt to access the Gemini API.” Google has refunded individual cases The Register escalated, but it has published no blanket refund policy. Nor has it publicly acknowledged the contradiction between Firebase documentation, which still says keys can be safely embedded in client code, and Gemini documentation, which says to treat the key like a password. Same key format, same company, both pages live.
The strongest counter-argument, made cleanly by Someone1234 on the Hacker News thread: “they’ve implemented hard-limits. So not offering hard-limits is a business decision, NOT a technical one.” That’s fair. The audit-your-keys advice is necessary, but the root cause is platform design, not developer hygiene. Both things are true at once.
I’ve been in computing for 36 years, since 1990, when I sat in front of a Commodore 64 with 64KB you had to defend byte by byte. The pattern is the same one I watched play out with IAM audits in the cloud around 2012. Back then mid-size companies were finding out in month 18 that they had 47 active service accounts nobody could explain, each one carrying permissions inherited from a three-year-old experiment. The right answer was never “be more careful when you create new accounts.” It was “inventory the surface and revoke anything without a current owner.” The 2026 AI bill is asking for exactly the same work.
What the vendor changed this week without telling you
The second failure mode doesn’t require careless developers or forgotten keys. It requires being a paying customer in good faith.
On May 13 Anthropic announced the Agent SDK separation. Max 20x subscribers moved to a separate $200/month credit pool at list API pricing, effective June 15. Public community estimates put the effective price increase between 12x and 175x depending on workload. For a team that designed its flow assuming the Max plan covered the agent, next quarter’s bill multiplies without a single line of code changing.
Claude Code v2.1.100 introduced a silent tokenizer change. Public reporting measured up to 35% inflation in Opus 4.7 token counts for identical inputs. Prompt caching broke through the transition, forcing full reprocessing every turn. The interim workaround was downgrading to v2.1.34.
There’s a pattern that governance vocabulary has no name for yet, but everyone who has lived it recognizes it. During the AI Studio migration in April 2026, Google Cloud customers with a decade of history and a top-tier billing plan were reset to Tier 1 ($250/month). Features that depended on the higher plan stopped working. Official docs say tier promotion requires “a few months of history,” and the reset hit accounts that already had 120 months of it. It’s absurd in the abstract and real in practice.
The operational conclusion is straightforward. Quota errors and 429 codes are no longer billing alerts. They’re first-order production signals at the same severity as 500s. Uber learned this in April; the rest of the market is learning it now.
The noise your agent is reading
The third mode is the most interesting because most teams don’t know it exists.
Sam McLeod published the baseline measurement: GitHub’s official MCP server registers 46,000 tokens across 91 tools. A quarter of Sonnet or Opus 4’s context window. Before the agent has seen the first line of your repository.
Jenny Ouyang traced a $1,600 Claude Code bill to MCP tools dumping full JSON into agent context on every call. Vantage published the aggregate analysis that says the same thing in numbers: agent sessions accumulate 25,000 to 35,000 tokens per request by turn 30. That’s not usage. That’s sediment.
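A worked example makes the sediment visible. The sketch below assumes, hypothetically, that an agent resends its whole history on each request and that the context grows by a fixed 1,000 tokens per turn. That reaches 30,000 tokens per request by turn 30, inside the range Vantage reports. The growth rate and the linear shape are illustrative assumptions, not Vantage's data, and the sketch ignores any prompt-caching discount.

```python
def tokens_per_request(turn: int, growth_per_turn: int = 1_000) -> int:
    """Input tokens sent on a given turn, assuming linear context growth."""
    return turn * growth_per_turn


def cumulative_input_tokens(turns: int, growth_per_turn: int = 1_000) -> int:
    """Total input tokens across a session when the full history is resent each turn."""
    return sum(tokens_per_request(t, growth_per_turn) for t in range(1, turns + 1))


print(tokens_per_request(30))       # 30000
print(cumulative_input_tokens(30))  # 465000
```

The per-request figure grows linearly, but the session total grows roughly with the square of the turn count. That is why a 30-turn session costs far more than its last request suggests.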
Thomas Giordmaina published RTK on LinkedIn last week, a Rust binary that intercepts shell output before the model reads it. It compresses noise from eslint, grep, diff, playwright: 88.3% reduction measured across 6,077 commands, 50.4 million tokens saved. The interesting question isn’t whether the number is real. It’s what the number means.
It means 88% of what your agent is reading isn’t signal. It’s shell filler. The argument that matters isn’t “we should compress.” It’s “nobody knows what their agent is reading.” Compression hides the problem. Inventory names it. Issue 1282 on the RTK repo, opened by @panwudi, documents the counterexample: silent compression can corrupt sub-agent input with summary headers that read like data. Compressing before you know what you’re reading just changes the failure mode.
Anthropic answered this exact problem on May 6 with MCP Tool Search: on-demand tool discovery, reported overhead reduction of 85%. It’s the right shape. Treat the context window as inventory, not as infinite space.
Three actions for Monday and one three-month decision
What a CTO can move this week, in three concrete actions:
One. List every AIza key in every Google Cloud project and every linked account. The command is gcloud alpha services api-keys list --project=<PROJECT> per project. Any key without application, IP, or API target restrictions, sitting in a project where the Gemini API is enabled, is a $100K-per-day liability. Any key you can’t tie to a specific, currently shipping application gets revoked. Not “audited later.” Revoked.
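The restriction check can be sketched in a few lines of Python. The sketch reads the JSON you get by appending `--format=json` to the listing command and flags any key missing either kind of restriction. The field names are an assumption based on the API Keys API v2 `Key` resource, so check them against your own output. The sample data is hypothetical, and the sketch does not check whether the Gemini API is enabled in the project.

```python
import json

# Assumed field names, following the API Keys API v2 "Key" resource.
# Verify against your own `--format=json` output before relying on them.
CLIENT_RESTRICTION_FIELDS = (
    "browserKeyRestrictions",
    "serverKeyRestrictions",
    "androidKeyRestrictions",
    "iosKeyRestrictions",
)


def audit_key(key: dict) -> list[str]:
    """Return the restriction types that are missing on one key."""
    restrictions = key.get("restrictions") or {}
    findings = []
    if not any(restrictions.get(field) for field in CLIENT_RESTRICTION_FIELDS):
        findings.append("no application or IP restriction")
    if not restrictions.get("apiTargets"):
        findings.append("no API target restriction")
    return findings


def audit_keys(keys_json: str) -> dict[str, list[str]]:
    """Map each flagged key's display name (or resource name) to its findings."""
    report = {}
    for key in json.loads(keys_json):
        findings = audit_key(key)
        if findings:
            label = key.get("displayName") or key.get("name", "<unnamed>")
            report[label] = findings
    return report


# Hypothetical sample data, not output from a real project.
sample = json.dumps([
    {"displayName": "legacy-android", "createTime": "2016-03-01T00:00:00Z"},
    {"displayName": "maps-web", "restrictions": {
        "browserKeyRestrictions": {"allowedReferrers": ["example.com/*"]},
        "apiTargets": [{"service": "maps-backend.googleapis.com"}],
    }},
])
print(audit_keys(sample))
```

A key that shows up in this report and cannot be tied to a currently shipping application is a candidate for revocation. The script only surfaces candidates; the ownership question still needs a human.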
Two. Move quota and 429 alerts to the same severity tier as 500 production errors. The Uber bill, the Google Tier 1 reset, the Anthropic Agent SDK separation: all three incidents had quota signals before the blowup. Nobody heard them because they were filtered as “billing alerts.”
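In alert-routing terms, the change is small. The sketch below shows one way to express it, with hypothetical tier names (`PAGE`, `TICKET`, `IGNORE`) that stand in for whatever your alerting stack actually uses.

```python
from enum import Enum


class Severity(Enum):
    # Hypothetical tiers; map these to your own alerting levels.
    PAGE = "page"      # interrupt the on-call engineer
    TICKET = "ticket"  # handle during working hours
    IGNORE = "ignore"


def classify(status_code: int) -> Severity:
    """Route 429 alongside 5xx instead of into a low-priority billing bucket."""
    if status_code == 429 or 500 <= status_code <= 599:
        return Severity.PAGE
    if 400 <= status_code <= 499:
        return Severity.TICKET
    return Severity.IGNORE


print(classify(429), classify(500), classify(404))
```

The design choice is the single `or` in the first branch: a quota rejection is treated as a production failure, not as an accounting event.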
Three. Measure tokens per MCP tool, not just per session. Each MCP tool is a fixed cost you pay at session start, whether or not the agent ever calls it. If your MCP server has 91 tools and you only use 7, the other 84 are eating your context window without doing any work.
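A per-tool measurement can start as a rough estimate. The sketch below serializes each tool definition and approximates its size at four characters per token. That ratio is a heuristic assumption, not a real tokenizer, so use your model's tokenizer for billing-grade numbers. The two tool definitions are hypothetical examples in the `name` / `description` / `inputSchema` shape that MCP tool listings use.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); not a real tokenizer


def estimate_tokens(tool: dict) -> int:
    """Approximate the tokens one tool definition occupies once serialized."""
    return len(json.dumps(tool)) // CHARS_PER_TOKEN


def idle_overhead(tools: list[dict], used: set[str]) -> tuple[int, int]:
    """Return (tokens for all tool definitions, tokens for tools never invoked)."""
    total = sum(estimate_tokens(t) for t in tools)
    idle = sum(estimate_tokens(t) for t in tools if t["name"] not in used)
    return total, idle


# Hypothetical tool definitions for illustration only.
tools = [
    {"name": "search_code", "description": "Search code in a repository.",
     "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}}},
    {"name": "create_gist", "description": "Create a gist.",
     "inputSchema": {"type": "object", "properties": {"content": {"type": "string"}}}},
]

total, idle = idle_overhead(tools, used={"search_code"})
print(f"{idle} of {total} estimated tokens belong to tools the agent never called")
```

Applied to the numbers above, and assuming purely for illustration that the 46,000 tokens are spread evenly across the 91 tools, the 84 unused tools would account for roughly 42,000 tokens of fixed overhead per session.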
All three are cheap and finishable before next Friday. The bigger decision, the one that takes three months, is doing the inventory before deploying the agents. At IQ Source the first work of AI Maestro, the two-month discovery program that runs before any deployment, is exactly this: mapping the AI exposure surface a company already has, not proposing the next one. The map covers keys, quotas, active MCPs, sessions leaving token traces, and vendor contracts with silent-change clauses. Two months of mapping costs less than a single $82K bill from an AIza key a six-year-old Firebase project left alive.
The CFO question isn’t what the next agent will cost. It’s what surface you’re already leaking against. Your AI bill comes from places you aren’t looking, and the only way to start looking is to inventory before you optimize.
Frequently Asked Questions
When Google enabled the Gemini API on existing Google Cloud projects, every AIza key already created in those projects, including Maps and Firebase keys that Google's own documentation marked as safe to embed in client code, silently gained Gemini access. Truffle Security disclosed the flaw on February 25, 2026. A key embedded in an Android app five years ago can rack up $80K in charges over 48 hours without any change on the owner's side.
Anthropic moved the Agent SDK into a separate credit pool. Max 20x subscribers were placed in a $200/month pool at list API pricing, effective June 15, 2026. For intensive agent workloads, community estimates put the effective price increase between 12x and 175x depending on usage pattern. Next quarter's bill can multiply even if the team changes nothing in the code.
GitHub's official MCP server registers 46,000 tokens across its 91 tools, a quarter of Sonnet or Opus 4's context window before a line of code is written. Sam McLeod published the measurement in August 2025. Every MCP tool loads its full schema into agent context at session start, whether the agent invokes the tool or not. It's permanent noise, not on-demand consumption.
Three steps. One: list every AIza key in every Google Cloud project with gcloud alpha services api-keys list and verify application and API restrictions on each. Two: configure 429 quota alerts at the same severity tier as 500 production errors. Three: measure tokens per MCP tool, not just per session, to surface passive loads consuming context window before the first prompt fires.