Code is not cheap: AI productivity is a codebase property
Ricardo Argüello — April 27, 2026
CEO & Founder
General summary
Linas Beliūnas reported on April 21 that Google has assembled a strike team led by Sergey Brin to close the gap with Anthropic on coding. The number that triggered the move: Google writes around 50% of its code with AI, Anthropic close to 100%. Same API, same Claude. The difference is not the model. It's the codebase. Matt Pocock, who teaches Claude Code for Real Engineers, tested specs-to-code and reported that each iteration produced worse code. Pawel Huryn cut his Claude Code bill from $750 to $100/month with the same workflow, just by cleaning up the harness. AI productivity is a codebase property, and that is exactly what the IQ Source Tech Partner service delivers.
- Linas Beliūnas (1.7K reactions, April 21, 2026) reported that Google formed a strike team led by Sergey Brin to catch up to Anthropic, which writes ~100% of its internal code with AI versus Google's ~50%
- Matt Pocock, instructor of Claude Code for Real Engineers, gave a talk this week reporting that when he tested specs-to-code, each iteration produced worse code; the way out was Ousterhout, Evans, and Hunt & Thomas
- Pawel Huryn (130K subscribers, 261K views) cut his Claude Code bill from $750/mo to $100/mo with no model swap and no workflow change, just by fixing four harness levers: cache hit rate, context bloat, model routing, and input format
- Guillermo Rauch and Lee Robinson closed the picture on April 26: the coding agent is the foundation of all superintelligence, which makes the codebase the foundation of the foundation
- I have been in this for 36 years and watched the same pattern five times. Cheap CPUs, cheap bandwidth, cheap cloud, cheap SaaS, now cheap tokens. Each time, the layer that gets cheap is not the problem; the problem is what sits on top of cheap
Picture two restaurants that buy the same industrial stove at the same price. One has a kitchen designed by a chef with twenty years in the trade, with the mise en place organized, the flows clean, the team trained on a shared vocabulary. The other plugged the stove in and left the kitchen the way it was. The stove is the same. The first one ships dinner in five minutes per plate. The second one burns it. The expensive stove that just turned commodity this week is Claude Opus 4.7. The first kitchen is Anthropic's codebase. The second kitchen is most software teams' codebase, and the invoice is starting to show the difference.
AI-generated summary
On Tuesday, April 21, Linas Beliūnas posted on LinkedIn that Google has assembled a strike team led by Sergey Brin to close the gap with Anthropic on coding. The number that triggered the move is what made boards uncomfortable this week: at Google around 50% of new code is written by AI. At Anthropic, close to 100%. Brin stepped in personally because the executive committee had no other explanation for the gap.
The headline reads as an Anthropic win. But it opens a more interesting question. If the model is the same Claude Opus 4.7 that anyone can buy through the API, why can Anthropic run at 100% while Google needs a strike team to reach 50%?
The answer is not that Anthropic has a secret Claude. It is that Anthropic has the right codebase for Claude to run at 100%. Most software teams do not.
The inconvenient witness from the side that teaches
Matt Pocock teaches a course called Claude Code for Real Engineers. He makes a living selling AI to developers. Last week he gave a conference talk where he said something you do not hear in vendor keynotes. He tried the specs-to-code workflow that has been everywhere on X: write a markdown specification, let the AI generate code, never look at the code, only edit the spec when something breaks. His report: each iteration produced worse code than the previous one. More bugs, less coherence, more drift. The AI compiler never converged.
The line he delivered on stage: “Code is not cheap. In fact, bad code is the most expensive it’s ever been.” The reasoning, the kind any CTO with five years in the trade should keep on the wall: when AI can ship ten times more code in the same time, a codebase that is hard to change absorbs that speed and converts it into garbage. Speed without architecture is compounding debt.
Pocock’s way out was not a more expensive model. It was three books that have been on the shelf for years, two of them for over twenty. A Philosophy of Software Design by John Ousterhout, which defines complexity as anything in the structure of a system that makes it hard to understand or modify. Domain-Driven Design by Eric Evans, which introduces the ubiquitous language shared by developers, domain experts, and now agents. The Pragmatic Programmer by Hunt and Thomas, with its chapter on software entropy: every change made in isolation, without thinking about the system, degrades the codebase a little more.
What Pocock describes is what every software engineering book of the last twenty years has called good practice. The difference is that before AI, those practices were a preference. On a 15-engineer team, a poorly structured codebase cost planning meetings, long onboarding, and Friday-afternoon commits that broke production. The team still shipped. With a poorly structured codebase and an AI writing 80% of the code, the team stops shipping.
The bill from the side that measures
Pawel Huryn is a product manager with 130K subscribers on his product newsletter. On Saturday, April 25, he posted a piece that hit 261K views. The title: “Claude Code’s Limits Are Generous. The Problem Is Your Harness.”
The data point that opens the post is the one CFOs care about. A month ago Huryn was paying $750/month for Claude Code Max 20x plus extra usage. Today he pays $100. Same workflow. Same model. Same prompts. The difference: he cleaned up the harness.
The four levers Huryn names are specific and measurable.
The first one is cache discipline. Every time you change tools or model mid-session, Anthropic’s cached prefix is invalidated, and the next turn gets billed as fresh input tokens instead of as cache reads, which cost a tenth as much. A healthy hit rate sits at 90% on the five-minute cache, and the way you get there is by locking tools and model at session start.
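To make that measurable rather than anecdotal, here is a minimal sketch that estimates the hit rate from the usage objects the Anthropic Messages API returns when prompt caching is on. The field names are the API’s own; the session data and the 90% target are illustrative.

```python
# Minimal sketch: estimate prompt-cache hit rate from the usage
# objects the Anthropic Messages API returns when caching is enabled.
# The sample session below is invented; 90% is the target named above.

def cache_hit_rate(usages: list[dict]) -> float:
    read = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    created = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    fresh = sum(u.get("input_tokens", 0) for u in usages)
    total = read + created + fresh
    return read / total if total else 0.0

# usage objects collected across one session's turns
session = [
    {"input_tokens": 800, "cache_creation_input_tokens": 42_000, "cache_read_input_tokens": 0},
    {"input_tokens": 650, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 42_000},
]
print(f"hit rate: {cache_hit_rate(session):.0%}")
# Well below 90% means the prefix is being invalidated mid-session.
```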
Context budget is the second one. If you let the session grow to the 1M window, every turn pays for 1M tokens, even if you only need 50K of them. Compacting at 50%, separating unrelated work with /clear, and routing mechanical bulk to subagents that do not need the parent’s reasoning all cut the line.
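As a sketch of what “compacting at 50%” looks like when it is a rule rather than a habit, here is a hypothetical trigger; every constant and name below is invented for illustration:

```python
# Hypothetical compaction trigger. The point is only that the
# decision is made by a rule the harness enforces, not by feel.

WINDOW_TOKENS = 1_000_000   # the 1M window named above
COMPACT_AT = 0.50           # compact at 50%, per the lever above

def should_compact(tokens_in_context: int) -> bool:
    return tokens_in_context >= WINDOW_TOKENS * COMPACT_AT

# At 520K tokens the harness compacts instead of paying for
# the full window on every remaining turn of the session.
print(should_compact(520_000))  # True
```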
The third lever is model routing. Opus by default costs 5x more than Sonnet and 25x more than Haiku, and most teams pay for Opus on tasks Sonnet would handle just fine. Route every subtask to the cheapest model that can do it well. Mechanical work goes to Haiku, scoped exploration to Sonnet, real planning with tradeoffs to Opus.
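A routing table can be as small as a dictionary. A sketch, with placeholder model IDs rather than real ones:

```python
# Sketch of routing by task class. The model IDs are placeholders,
# not real identifiers; the tiers mirror the ratios named above.

ROUTES = {
    "mechanical": "haiku-placeholder",   # renames, boilerplate, bulk edits
    "scoped": "sonnet-placeholder",      # exploration with a clear boundary
    "planning": "opus-placeholder",      # real tradeoffs, architecture
}

def route(task_class: str) -> str:
    # Raises KeyError on an unknown class instead of silently
    # defaulting to the most expensive model.
    return ROUTES[task_class]

print(route("mechanical"))  # the cheapest model that can do it well
```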
Input format is the fourth, and the easiest to overlook. PDFs read as images cost five to ten times what the same content costs as pdftotext. Browsing with the accessibility tree instead of screenshots cuts ~90% of the tokens on web research. These two changes alone collapse the token count on document and web tasks.
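The cheap path for PDFs is one subprocess call. A sketch using pdftotext from poppler-utils (the file name is hypothetical):

```python
# Sketch: extract plain text with pdftotext (poppler-utils) so the
# document never reaches the model as page images.
import subprocess

def pdf_as_text(path: str) -> str:
    # "-" sends the extracted text to stdout instead of a file
    result = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# The same report read as page images would be billed as image tokens,
# roughly 5-10x the cost of this plain-text version per the post above.
text = pdf_as_text("quarterly_report.pdf")  # hypothetical file
```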
None of those four levers required a new model. None required paying more. What they required was treating the harness as a product, with its own measurement and improvement discipline. Which, again, is what a good engineer would call good practice.
The validation that closes the picture
On Sunday, April 26, Guillermo Rauch, CEO of Vercel, wrote that coding agents will be the foundation of all superintelligence. The reasoning: coding ability is indistinguishable from “proficiency with computers”. An agent that masters bash, filesystems, configuration, and program installation can also examine its own source, its state, its instructions, and propose changes to itself. Lee Robinson, also from the Vercel orbit, added the observation that was not obvious a year ago: it was not obvious that an excellent coding agent would also be the path to a general agent for all knowledge work. If the coding agent is the foundation, the codebase that agent runs on is the foundation of the foundation. That layer is not commoditizing soon, because it is the only one that captures the unique context of your business.
The pattern I have been watching since 1990
I have been in this for 36 years. I started in 1990, at 15, programming on a Commodore 64 and a Texas Instruments machine. Since then I have watched the same pattern at least five times: an infrastructure layer becomes cheap, everyone assumes that means it no longer matters, and five years later they discover that the layer sitting on top of the cheap one is what decides who wins.
Four quick reminders before we move on. Around 1995, when Intel made processors affordable to most companies, the people who had not learned Big O or algorithm design discovered fast that cheap CPUs do not compensate for badly written code. Fiber and DSL arrived in the early 2000s at prices that would have looked like science fiction a decade earlier, and frontends that did not optimize payloads kept feeling slow no matter how much bandwidth was available. AWS rewrote the cost of compute starting in 2010, and many teams found out that cheap cloud plus a broken architecture meant not scalability but a six-figure invoice before the first quarter closed. The SaaS wave of the second half of the decade compressed enterprise software from six-figure licenses to three-figure subscriptions, and badly designed workflows ended up paying for subscriptions that never moved the P&L.
Now the fifth cycle is here. AI tokens are dropping toward the $0.08-per-runtime-hour that Anthropic just set as the reference price, and tier-two models like GLM-5.1 deliver Opus-class performance at a tenth of the cost. The temptation is to repeat the mistake of the previous four cycles: mistaking the layer that just got cheap for a layer that no longer matters.
What gets cheap each cycle is not the problem. The problem is what sits on top. And what sits on top this time is the codebase. That is the moat of the next five years, and it is exactly the piece Anthropic already has and most teams do not.
The distinction that matters
There is a real difference between the codebase that gets redesigned every year inside a SaaS product and the codebase that is the product. For the first one, an imperfect architecture is technical debt that gets paid eventually; the team still bills subscriptions. For the second one, the codebase is the balance sheet. Every architectural decision is either an asset or a liability, depending on whether it rewards or punishes the agent that will live there for the next five years.
That second group is where this change lands hardest. Software companies whose product lives in the codebase from day one. Industrial automation. AR EdTech. Vertical fintech platforms in LatAm. Medical software. Specialized ERPs. Each one has a domain its team understands better than anyone else, and that is not worth re-learning. But most do not have anyone on the team who has spent the last six months thinking about how to design a codebase so an agent can maintain it without the per-turn cost defaulting to Opus.
This is exactly where last week’s fugazi conversation lands in practice. The one-shot is a demo. The real codebase, where paying customers run real transactions, is still work.
What we do at IQ Source about this
Tech Partner exists specifically for software companies whose product lives in the codebase from day one. It is not staff augmentation. It is not an agency that takes on whatever lands on its desk. It is a partner that joins the client’s team with a specific thesis: that the codebase, not the model, is the moat for the next five years, and that designing it so it rewards agents instead of punishing them takes craft of the kind you buy by the hour.
What we measure in the first map is not complicated, but it takes discipline. Are the modules deep or shallow? A deep module, as Ousterhout described it, has a lot of functionality behind a simple interface. A shallow module has a little functionality behind a complicated interface. Shallow codebases are the ones that drive AI crazy, because to understand something simple it has to walk through twenty different functions. Is there a shared vocabulary between the team and the agents, or does each function call the same thing by three different names? Is the test boundary at the module interface, or scattered across internal details? Does anyone on the team own the harness, measure the cache hit rate, know when to invalidate the prefix, and show the cost-per-feature curve to leadership?
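To make the deep-versus-shallow distinction concrete, here is a sketch in Python; the invoice-dispatch domain and every name in it are invented:

```python
# Invented domain: dispatching an invoice. Names are illustrative only.

# Shallow: three entry points, each hiding almost nothing. To ship one
# invoice, the agent must read all three and learn their call order.
def open_connection(host: str): ...
def serialize_invoice(invoice: dict) -> bytes: ...
def transmit(connection, payload: bytes) -> None: ...

# Deep: one entry point with a lot of functionality behind it.
# Connection handling, serialization, and retries live inside,
# invisible to every caller and to every agent reading call sites.
def send_invoice(invoice: dict, host: str = "billing.internal") -> None:
    """Serialize the invoice and deliver it, retrying transient failures."""
    ...
```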
None of those four questions need a new model. All four need someone who has lived in real codebases with real agents long enough to have made the mistakes and corrected the pattern. That is what Tech Partner sells: time already invested, mistakes already made, pattern already stable.
AI Maestro, the other line, applies when the question is not “how do I design my codebase for agents” but “where in my operation does AI fit and where does it not.” The two lines complement each other but answer different questions and get hired at different points in the company’s cycle.
Five questions before signing the next AI productivity check
If the executive committee is about to approve the next AI productivity invoice, there is a short test worth running before the signature.
Start with the most concrete question: does the codebase reward agents or punish them? A week of measurement reveals it. If the AI ships code that gets worse each iteration (the pattern Pocock reported on stage), the codebase is punishing. If generated code integrates cleanly, it is rewarding.
Move to the language. Is the shared vocabulary between the team and the agents captured anywhere? And here the wiki nobody updates does not count. What counts is a living document with domain terms, canonical entity names, and the invariants that must not break. Without one, the AI invents three different versions of the same concept across three different files, and nobody notices until the bug reaches production.
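One possible shape for that living document, sketched here as code that both humans and agents import rather than a wiki page; the domain, the canonical name, and the invariants are all invented:

```python
# Hypothetical sketch: the ubiquitous language made executable.
# Domain, names, and invariants below are invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Settlement:
    """Canonical term: Settlement. Never 'payout', 'transfer', or 'disbursement'."""
    amount_cents: int
    currency: str

    def __post_init__(self) -> None:
        # Invariants that must not break: a settlement is strictly
        # positive and only exists in a supported currency.
        if self.amount_cents <= 0:
            raise ValueError("settlement amount must be positive")
        if self.currency not in {"USD", "MXN", "BRL"}:
            raise ValueError(f"unsupported currency: {self.currency}")
```

Whether the glossary lives in markdown or in code matters less than that each concept has exactly one canonical name and one place where its invariants are written down.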
The third checks the tests. Ideally they live at the module interface, not scattered across internal details. When tests sit at the interface, the agent can rewrite everything inside without breaking the contract. When they are scattered, any refactor breaks twenty tests for the wrong reason.
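A sketch of what an interface-level test looks like, reusing the invented Settlement domain from above; the module layout and names are hypothetical:

```python
# Sketch of a test pinned to the public interface, not the internals.
from dataclasses import dataclass

# --- module under test; in a real repo this would live in billing.py ---
@dataclass(frozen=True)
class Settlement:
    amount_cents: int

def settle_invoice(invoice_id: str, amount_cents: int) -> Settlement:
    # Everything inside this function is free to change.
    return Settlement(amount_cents=amount_cents)

# --- the test pins only the contract at the interface ---
def test_settle_invoice_returns_the_invoiced_amount():
    result = settle_invoice("inv_123", 5_000)
    assert result.amount_cents == 5_000
    # No assertions about internal helpers: the agent can rewrite the
    # implementation completely without breaking this test.
```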
Four and five travel together. Does anyone measure the cache hit rate? Does anyone own the harness, with a real name attached? If the answer to the first one is no, you are paying 10x per turn. If the answer to the second is “the whole team” or “the AI lead”, the operational answer is nobody, and the harness will degrade at the pace of the next Opus.
If your team cannot answer the five with clarity, the next useful conversation is two hours long. We map one of your repos, answer the ones we can answer today, and write down what needs intervention. No quote attached. The address is the usual one: info@iqsource.ai.
Brin stepped in personally. Pocock stopped talking to the model and went back to reading Ousterhout. Huryn cleaned the harness and cut the bill by a factor of seven. The three stories sound different and end in the same place. Code is not cheap. The team that designs the codebase for the next five years wins the next five years. The team that assumes the model will do the work ends up paying the bill of the team that did the designing.
Frequently Asked Questions
Why can Anthropic run at ~100% AI-written code while Google needs a strike team to reach 50%?
The difference is not the model. Anthropic, Google, and any other company can buy the same Claude Opus 4.7 through the API. What changes is the codebase. Anthropic has redesigned its internal architecture around deep modules with simple interfaces, a shared vocabulary between humans and agents, tests at the module boundary, and harness discipline. Those four properties are what let the same model deliver at 100%.
What did Matt Pocock find when he tested the specs-to-code workflow?
Matt Pocock, instructor of Claude Code for Real Engineers, gave a talk in late April where he reported testing the specs-to-code workflow (write a markdown spec, let the AI generate code, never look at the code, only edit the spec) and each iteration produced worse code. His line: code is not cheap, bad code is the most expensive it's ever been. The fix was not a better model; it was returning to classic software fundamentals.
How did Pawel Huryn cut his Claude Code bill from $750 to $100 a month?
Pawel Huryn, a PM with 130K subscribers, posted on April 25, 2026 that he went from $750/mo to $100/mo with the same workflow and the same model by cleaning four harness levers. Cache hit rate (don't change tools or model mid-session). Context bloat (compact at 50%, route to subagents). Model routing (Sonnet or Haiku for mechanical work instead of defaulting to Opus). Input format (pdftotext instead of reading PDFs as images).
What makes a codebase reward agents instead of punishing them?
Four measurable properties. Deep modules with simple interfaces (Ousterhout), where most functionality lives behind few entry points. A shared vocabulary (DDD's ubiquitous language) that defines domain terms consistently across humans and agents. Tests located at the module boundary, not scattered across internal details. Harness discipline with a named owner who measures cache hit rate and cost per shipped feature.
Related Articles
Nine seconds: the agent confessed, but the failure wasn't its own
Cursor + Claude Opus 4.6 wiped PocketOS production data in 9 seconds. The AI confessed. But the real failure was three architectural sins, not the model.
The enablement layer IS the runtime: Miller's framing breaks
Adam Miller says Goose is neutral plumbing and the enablement layer is where engineering belongs. Hannah Stulberg's DoorDash Team OS proves exactly the opposite.