Anthropic measured its own AI. Can you prove yours?

Anthropic says Claude writes most of its merged code and its engineers ship 8x more per quarter. The new dividing line isn't who uses AI; it's who can prove what AI produced.

www.iqsource.ai

Ricardo Argüello — June 8, 2026

CEO & Founder

Business Strategy · 6 min read

General summary

Anthropic published numbers about itself this week: Claude writes most of the code merged inside the company, north of 80% by this week's coverage, engineers ship around 8x more code per quarter than they did in 2021-2025, and the company frames it as a possible path to recursive self-improvement. Strip the science fiction and the important part stays: a company put a falsifiable output number on the table about itself. Meanwhile most of the industry is still flexing inputs: a customer burning 100 billion tokens a month, Copilot seat counts, $30M rounds with a Cursor license as the entire AI strategy. The new dividing line isn't who uses AI. It's who can prove it produced anything.

AI-generated summary

This week Anthropic did something unusual for an AI company. It published numbers about itself.

Claude now writes the majority of the code merged to production inside the company, north of 80% by this week’s coverage. And its engineers, on average, ship roughly 8x more code per quarter than they did across 2021-2025. Anthropic frames this as a possible path to recursive self-improvement, AI accelerating the development of the next AI, and says it is happening faster than it expected.

The headline ran off with the science fiction, as headlines do. Some celebrated the apocalypse; others replied that Anthropic’s own staff must be depressed. And the company itself hit the brakes inside its own text: achieving recursive self-improvement, it wrote, does not by itself imply an immediate change in how industrial production or society is organized.

Strip all of that away and what’s left is the part that matters for your business. The radical thing isn’t the robot that improves itself. It’s that a company put an output number about itself on the table, one you can falsify, instead of flexing how much it consumes. That’s the thesis here: the new dividing line isn’t who uses AI. It’s who can prove it produced anything.

The number, not the robot, is the news

Set aside, for a second, whether AI will improve itself. That’s a dinner-table debate, and nobody on your board is going to act on it Monday morning.

The actionable thing is different. Anthropic said “8x more code per quarter” and “over 80% of merged code.” Those are output numbers. You can argue with them, audit them, even debunk them. Someone can ask “merged code, or code that reached the customer?” or “8x, measured how?”, and those questions land precisely because there is a number to argue against.

That’s the part almost nobody copied from Anthropic, and it’s the only part worth copying. Not the recursive self-improvement. The willingness to say a production number out loud, knowing someone will check it.

Because most of the companies that “adopted AI” this year don’t have a number like that. They have a feeling. They have an invoice. They have a dashboard full of tokens. What they don’t have is a single figure that proves AI got something to a customer faster or cheaper than last year.

Everyone else is flexing inputs

Mark Ajzenstadt, who runs a services company that embeds AI engineers inside product teams, put his finger on it this same week. His list is worth reading, because it’s the exact mirror image of what Anthropic did.

OpenAI’s CEO on stage flexing that a customer burns 100 billion tokens a month, with no mention of what it produced. Consulting firms billing millions for AI strategies written by people who never shipped a production agent. CTOs reporting “AI adoption” to their boards by counting Copilot seats, while nobody tracks what reaches production. Startups raising $30M rounds with “AI-native” in the deck and a Cursor license as the entire AI strategy.

Every item on that list is an input metric. Tokens consumed, seats bought, money spent, rounds raised. None of them says a word about what came out the other end.

The line Mark closes his thread with is the one that stuck with me: “I know our cost per merged PR.” One sentence, and it exposes the whole list above it. He doesn’t flex how many tokens he burns. He knows how many pull requests get merged for that spend. That’s the difference between knowing what you pay and knowing what you produce, and almost nobody on the other side of that list knows it.
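A minimal Python sketch of that difference, assuming you can pull monthly AI spend and a count of merged pull requests from your own billing and version-control systems. The `MonthlyReport` fields and every figure below are hypothetical, chosen only to illustrate the point: two months with identical token counts and identical spend can have very different costs per merged PR, and only the output metric tells them apart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MonthlyReport:
    """One month of AI engineering activity (hypothetical fields)."""
    tokens_consumed: int   # input metric: what the token dashboard shows
    ai_spend_usd: float    # input metric: total AI spend for the month
    merged_prs: int        # output metric: pull requests actually merged


def cost_per_merged_pr(report: MonthlyReport) -> Optional[float]:
    """Spend divided by merged PRs; None when nothing was merged."""
    if report.merged_prs == 0:
        return None  # spend with no output has no meaningful unit cost
    return report.ai_spend_usd / report.merged_prs


# Two made-up months with the same token count and the same spend.
busy = MonthlyReport(tokens_consumed=100_000_000_000, ai_spend_usd=42_000.0, merged_prs=350)
stalled = MonthlyReport(tokens_consumed=100_000_000_000, ai_spend_usd=42_000.0, merged_prs=20)

print(cost_per_merged_pr(busy))     # 120.0
print(cost_per_merged_pr(stalled))  # 2100.0
```

The `None` branch is a deliberate choice: a month with spend and zero merged PRs has no cost per unit to report, and returning zero or infinity would hide that.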

Measuring output is hard. That’s why almost nobody does it.

There’s an honest reason so many people stop at input metrics: they’re easy. Counting Copilot seats is a spreadsheet. Adding up the token bill is something the vendor does for you. Neither one requires you to define what “done” means.

Measuring output does. To say “we produced 8x more” you need two things most teams don’t have: a clear definition of what counts as “reached the customer,” and an honest baseline from last year to compare against. Without those two, there’s no number, only an anecdote.
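Those two requirements can be made concrete in a short Python sketch. It assumes a hypothetical `WorkItem` record exported from your own tracker, with a quarter label and flags for whether the change was merged and whether it was released to customers; none of these names come from any real tool. The definition of “done” is passed in as a function, so you can see how much the headline multiple depends on it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class WorkItem:
    """One unit of work exported from a tracker (hypothetical fields)."""
    quarter: str                 # e.g. "2025-Q2"
    merged: bool                 # the code was merged
    released_to_customers: bool  # the change actually reached users


def reached_customer(item: WorkItem) -> bool:
    # A stricter definition of "done": merged is not enough, it must be released.
    return item.merged and item.released_to_customers


def output_multiple(
    items: List[WorkItem],
    baseline_q: str,
    current_q: str,
    counts_as_done: Callable[[WorkItem], bool],
) -> Optional[float]:
    """Ratio of 'done' items in current_q versus baseline_q.

    Returns None when the baseline quarter has nothing to compare against.
    """
    baseline = sum(1 for i in items if i.quarter == baseline_q and counts_as_done(i))
    current = sum(1 for i in items if i.quarter == current_q and counts_as_done(i))
    if baseline == 0:
        return None  # no baseline: an anecdote, not a number
    return current / baseline


# Made-up data: a baseline quarter and a current quarter.
items = (
    [WorkItem("2025-Q2", True, True)] * 4
    + [WorkItem("2025-Q2", True, False)] * 2
    + [WorkItem("2026-Q2", True, True)] * 12
    + [WorkItem("2026-Q2", True, False)] * 20
)

merged_only = output_multiple(items, "2025-Q2", "2026-Q2", lambda i: i.merged)
shipped = output_multiple(items, "2025-Q2", "2026-Q2", reached_customer)
no_baseline = output_multiple(items, "2024-Q2", "2026-Q2", reached_customer)

print(round(merged_only, 2))  # 5.33
print(shipped)                # 3.0
print(no_baseline)            # None
```

With this made-up data, counting merges alone gives roughly 5.3x, while counting only what reached customers gives 3x; with no baseline quarter the function returns `None` rather than inventing a number.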

And here’s the trap that makes it slipperier: tests passing doesn’t mean something was worth doing. I wrote about that when Opus 4.8 shipped, about how a thousand agents can finish the wrong task with the whole test suite green. A green dashboard is an input metric wearing the costume of a result. It tells you the system ran, not that it produced something a customer needed.

That’s why neither panic nor euphoria helps. Both are ways of dodging the boring question. The boring question is: can you state, today, one honest number for what your AI got to a customer? If the answer is “let me check the token dashboard,” the answer is no.

What we do about it at IQ Source

When a company asks us to accelerate with AI, the first thing we ask for isn’t access to their tools. It’s their baseline. How much did you produce before AI, measured in something the business actually cares about? If it doesn’t exist, that’s the first job, before anyone touches the accelerator. Because accelerating with no baseline leaves you exactly where half the industry is: spending more, unable to prove anything changed.

AI Maestro is the discovery engagement where that baseline gets built: two months mapping the real processes of your operation, not the org-chart version, scoring each one with an AI Opportunity Score, and ending in a Go/No-Go gate for each process. The gate is decided on outcomes that reach the customer, not on seats bought. I covered the concrete metric we install in a separate post, so this isn’t theory: cost per shipped feature, not tokens per month. That’s the figure that separates the team scaling with margin from the team burning with dignity.

Anthropic measured itself this week and published the number. You don’t have to believe in recursive self-improvement to take the lesson. The lesson is simpler and more uncomfortable: next time someone at your company celebrates that AI now writes half the code, or that the marketing team uses five new tools, ask one thing before you clap. Show me the output number. If only the invoice shows up, you didn’t prove anything. You just spent with style.
