
Your Code Review Was Built for Humans. 41% of Code Isn't

41% of code shipped in 2025 was AI-generated, with a 1.7x higher defect rate. Your review process assumes the author understands the code. That's over.


Ricardo Argüello


CEO & Founder

Software Development · 8 min read

41% of code produced in 2025 was AI-generated or assisted. That’s not a projection — it’s the starting point. And that code carries 1.7x more defects per PR than human-written code, according to CodeRabbit’s analysis of 470 pull requests. Logic errors up 75%. Security vulnerabilities, 1.5x to 2x. Performance issues, nearly 8x.

There’s an additional data point that challenges the productivity narrative: METR’s controlled study found that experienced open-source developers were 19% slower using AI tools — while believing they were 20% faster. They rejected over 56% of suggestions.

As Aakash Gupta noted in a recent analysis, responding to Arvid Kahl’s observation that developers have always written bad code: yes, there was always bad code. But the quality infrastructure — code review, CI/CD, blame chains, ownership — was designed for a world where the author understood their own code. That world ended.

The problem isn’t that AI generates bad code. It’s that the entire review and governance system assumes something that’s no longer true: that whoever submits the code can explain it.

The System That Stopped Working

Software quality infrastructure was built on three pillars. All three depend on the same assumption, and all three are failing.

Code review assumes the author can explain their code

An effective code review is a conversation. The reviewer asks, “Why did you use this pattern here?” and the author explains their reasoning. If there’s a bug, the author knows where to look because they made the decisions that led to that point.

With AI-generated code, that conversation doesn’t exist. The PR submitter pasted output from Copilot, Cursor, or Claude. The reviewer sees code that nobody on the team wrote. There’s no reasoning to explain because there was no human reasoning — there was a statistical token prediction.

This connects directly to the organizational risks of vibe coding we’ve documented before: when an individual accepts code without understanding it, that’s a personal risk. When an entire team accepts code nobody understands through a formal review process, that’s systemic risk.

CI/CD tests for regressions, not correctness

Your CI/CD pipeline was designed to answer one question: “Does this change break something that already worked?” Tests verify regressions against known behavior. Linting verifies style. SAST looks for known vulnerability patterns.

None of them verify whether the new code is correct. None of them ask: “Does this logic do what it should in cases the author didn’t consider?” With human code, that gap was manageable because the author covered the edge cases they knew about. With AI-generated code, the uncovered edge cases are the ones the tool didn’t predict — and nobody else knows about them either.

“Tests pass” always meant “we didn’t break what was there.” It never meant “the new code is correct.” What’s changed is that the gap between the two is now much wider.
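The distinction is easy to make concrete. Below is a hypothetical shipping-fee function with the kind of boundary error an AI tool produces and a regression suite never probes (the function, the rule, and the numbers are all illustrative):

```python
# Hypothetical illustration. Business rule: orders of $100 or more ship free;
# below that, the fee is 10% of the subtotal. The comparison operator is the
# kind of off-by-one an AI tool generates without flagging it.
def shipping_fee(subtotal: float) -> float:
    if subtotal > 100:               # bug: the rule says >= 100
        return 0.0
    return round(subtotal * 0.10, 2)

# Regression-style checks: they confirm behavior that already worked. Both pass.
assert shipping_fee(50.0) == 5.0
assert shipping_fee(200.0) == 0.0

# Behavioral check: ask what the rule requires at the boundary itself.
print(shipping_fee(100.0))  # prints 10.0, but the business rule says 0.0
```

The regression suite stays green; only a test written from the business rule, not from the code, exposes the gap.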

git blame assumes the committer understood the code

git blame shows who made the commit. In the traditional model, that was equivalent to who wrote and understood the code. That equivalence is gone.

When an auditor asks, “Who reviewed this change and why was it approved?”, the honest answer is often: “Someone accepted the AI’s suggestion and someone else approved the PR without fully understanding the code.” That’s not an individual failure — it’s a process that no longer produces the outcome it’s supposed to produce.

Accountability, incident response, compliance audits — everything depends on a traceability chain that now has a gap. You don’t know who understood the code, only who accepted it.

What We See in Companies Already Facing This

In our experience at IQ Source working with development teams, these patterns show up consistently:

PRs that get approved because nobody wants to admit they don’t understand the code. The reviewer sees code that looks reasonable, tests pass, and approving is easier than asking, “Can you walk me through this logic?” when both know Copilot generated it. The social incentive pushes toward approval, not comprehension.

Test coverage that looks healthy but tests the wrong things. AI generates code and also generates tests. Those tests validate what the AI predicted the code would do — not what the system needs it to do. 85% coverage that doesn’t cover the business scenarios that matter.
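This failure mode is circular, and a small sketch makes it visible. The function and the business rule below are hypothetical:

```python
# Hypothetical illustration: the tool generates the code and a matching test.
def apply_discounts(price: float, discounts: list[float]) -> float:
    # Applies each discount sequentially (multiplicatively).
    for d in discounts:
        price *= (1 - d)
    return round(price, 2)

# AI-generated test: it asserts what the code already does, so it always
# passes and coverage for this function reads 100%.
assert apply_discounts(100.0, [0.20, 0.30]) == 56.0

# The business rule, however, says discounts are additive and capped at 50%,
# which would make the expected price 50.0. No generated test encodes that,
# because the requirement lives outside the codebase the tool can see.
```

Coverage measures that lines ran, not that the right expectations were checked; when code and tests share an author (human or statistical), the measurement is self-referential.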

Incident response times increasing because on-call didn’t write the code and the “author” can’t explain it. This connects to what we’ve analyzed about the AI fluency gap in teams: the difference between those who integrate AI tools with judgment and those who use them as a substitute for understanding amplifies under pressure.

Invisible technical debt. AI-generated code tends to solve the immediate problem in ways that look correct but introduce unnecessary dependencies, abstractions that don’t align with the existing architecture, or duplicated patterns because the tool lacked context about the rest of the codebase. That debt doesn’t show up in any linter.

What Needs to Change

Code review adapted for AI-generation

Code review needs to shift from “Does this code look correct?” to “Can the author demonstrate they understand this code?”

Three concrete changes:

  • “Why” annotations: Every PR should include not just what the code does, but why that approach was chosen over alternatives. If the answer is “because that’s what the AI suggested,” that’s valuable information the reviewer needs to know.
  • Secondary review for high AI-generation PRs: When a PR has a high percentage of AI-generated code, it should go through a second reviewer focused specifically on comprehension, not just functionality.
  • Mandatory comprehension questions: The reviewer should ask at least one question about edge-case handling. Not as a formality — as a real filter. If the author can’t answer without consulting the AI tool, that signals an understanding gap that needs resolving before merge.
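The second change can be automated as a routing rule. There is no standard way to measure AI-generated share today, so the sketch below assumes your PR tooling records a self-reported or tool-measured fraction; the field names and thresholds are illustrative:

```python
# Sketch of a review-routing rule. The ai_fraction field is an assumption:
# adapt it to whatever signal your tooling can actually provide.
from dataclasses import dataclass

@dataclass
class PullRequest:
    author: str
    lines_changed: int
    ai_fraction: float  # 0.0-1.0, hypothetical metadata

def required_reviews(pr: PullRequest,
                     ai_threshold: float = 0.5,
                     size_threshold: int = 200) -> list[str]:
    """Return the review stages a PR must pass before merge."""
    stages = ["functional review"]
    # A high AI share or a large diff triggers a second, comprehension-focused
    # pass: the reviewer asks at least one edge-case question the author must
    # answer without consulting the AI tool.
    if pr.ai_fraction >= ai_threshold or pr.lines_changed > size_threshold:
        stages.append("comprehension review")
    return stages

print(required_reviews(PullRequest("dev1", 350, 0.8)))
# ['functional review', 'comprehension review']
```

The exact thresholds matter less than making the second stage explicit: a rule that exists in tooling gets applied; one that exists in a wiki gets skipped under deadline pressure.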

Quality gates for the AI-generation era

Traditional gates are still necessary, but no longer sufficient: you need gates that validate what conventional tests don’t cover.

Traditional gate → AI-adapted gate:

  • Unit tests (does it pass?) → Behavioral testing (does it work in real business scenarios?)
  • Code coverage (how much was tested?) → Mutation testing (do tests catch real bugs?)
  • Linting (does it follow style?) → Architectural consistency (does it fit the system design?)
  • SAST (known vulnerabilities?) → Dependency analysis (does it introduce unnecessary patterns or abstractions?)

Mutation testing in particular: if you alter a line of code and all tests still pass, those tests aren’t validating anything. With AI-generated code, where tests and code come from the same source, this validation is critical.
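The mechanics fit in a few lines. Tools like mutmut or Cosmic Ray automate the mutations; the hand-written mutant below just shows the principle (all functions are hypothetical):

```python
# Minimal illustration of the mutation-testing idea.
def is_adult(age: int) -> bool:
    return age >= 18

def is_adult_mutant(age: int) -> bool:
    return age > 18          # mutation: >= became >

def weak_test(fn) -> bool:
    # A typical generated test: it only checks values far from the boundary.
    return fn(30) is True and fn(5) is False

# The weak test passes for both the original and the mutant: it "covers"
# the line but validates nothing about the boundary.
assert weak_test(is_adult) and weak_test(is_adult_mutant)

def strong_test(fn) -> bool:
    # A test that kills the mutant by probing the boundary value itself.
    return fn(18) is True

assert strong_test(is_adult) and not strong_test(is_adult_mutant)
```

A test suite's mutation score (the fraction of mutants it kills) is a far harder signal to fake than line coverage.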

Explicit per-module ownership model

Every module, service, or component needs a human owner who can answer three questions:

  1. What does this code do and why does it do it this way?
  2. What happens if it fails?
  3. How would you change it if requirements shift?

If the owner can’t answer without consulting the AI tool, they’re not really the owner — they’re a middleman. And middlemen can’t debug at 3 AM when the service goes down.

This doesn’t mean teams can’t use AI to generate code. It means someone has to move from “I accepted the suggestion” to “I understand this code enough to maintain it, fix it, and evolve it.”
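One existing mechanism for making that ownership explicit is GitHub’s CODEOWNERS file, which routes review requests by path. The paths and handles below are placeholders:

```
# .github/CODEOWNERS
# Each path maps to a human who can answer the three ownership questions
# for that module. Paths and handles here are examples only.
/services/billing/   @maria-backend
/services/auth/      @carlos-security
/web/checkout/       @ana-frontend
```

The file doesn’t guarantee comprehension by itself, but it removes the ambiguity about who is supposed to have it.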

The Cost of Not Adapting Your Process

The classic figure from the IBM Systems Sciences Institute still applies: fixing a defect in production costs ~30x more than fixing it during development. When your AI-generated code carries 1.7x more defects and your review process isn’t calibrated to catch them, you’re multiplying risk by cost.
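The multiplication is worth making explicit. The 1.7x and ~30x multipliers come from the figures cited above; every other number below is an assumption chosen for illustration:

```python
# Illustrative arithmetic only: 1.7x and ~30x come from the sources cited
# in the article; baseline defect counts and escape rates are assumptions.
dev_fix_cost = 100                      # assumed cost units: fix in development
prod_fix_cost = 30 * dev_fix_cost       # IBM's ~30x production multiplier

human_defects = 10                      # assumed baseline per 100 PRs
ai_defects = human_defects * 1.7        # CodeRabbit's defect ratio

# A review process calibrated for human code catches most defects early;
# one not calibrated for AI code lets more escape to production.
human_escape, ai_escape = 0.10, 0.20    # assumed escape rates

def total_cost(defects: float, escape: float) -> float:
    return defects * ((1 - escape) * dev_fix_cost + escape * prod_fix_cost)

print(total_cost(human_defects, human_escape))  # 3900.0
print(total_cost(ai_defects, ai_escape))        # 11560.0: roughly 3x, not 1.7x
```

Under these assumptions the cost gap is nearly 3x, not 1.7x, because the defect ratio and the escape rate compound rather than add.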

But the bigger cost isn’t technical. It’s operational:

  • Degraded incident response: When on-call has to understand code that nobody on the team actually wrote, MTTR goes up. Extra minutes in production mean customer impact, SLA breaches, and revenue loss.
  • Compliance risk: When a SOC 2 or ISO 27001 auditor asks, “Who reviewed this code and how was its security validated?”, and your answer depends on a review process that wasn’t designed to verify comprehension, you have a compliance gap.
  • Silent attrition: The best developers know when a quality process has stopped working. If your senior team is getting frustrated with PRs being approved without real understanding, you lose them.

How to Know If Your Organization Already Has This Gap

If more than 30% of your new code comes from AI tools and your review process hasn’t changed in the past two years, you have a governance gap. It’s not hypothetical — it’s arithmetic.

The signals: PRs approved in under 15 minutes for changes exceeding 200 lines. Test coverage going up but production bugs not going down. Incident resolution times increasing quarter over quarter. Diffuse ownership where nobody can explain entire modules without reaching for the AI.
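The first of those signals is straightforward to check against your own review history. The record shape below (approval minutes, lines changed) is hypothetical; adapt it to whatever your Git hosting exports:

```python
# Sketch of a rubber-stamp detector over review metadata. The dict keys
# are illustrative, not a real API.
def rubber_stamp_signals(prs: list[dict],
                         max_minutes: int = 15,
                         min_lines: int = 200) -> list[dict]:
    """Flag large PRs that were approved suspiciously fast."""
    return [pr for pr in prs
            if pr["approval_minutes"] < max_minutes
            and pr["lines_changed"] > min_lines]

history = [
    {"id": 101, "approval_minutes": 8,  "lines_changed": 450},
    {"id": 102, "approval_minutes": 90, "lines_changed": 300},
    {"id": 103, "approval_minutes": 5,  "lines_changed": 40},
]
flagged = rubber_stamp_signals(history)
print([pr["id"] for pr in flagged])  # [101]
```

A flagged PR isn’t proof of a problem, but a rising share of them over time is the kind of trend worth putting on a dashboard.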

At IQ Source we audit development processes — not just the code, but how it gets reviewed, tested, and deployed. If your team is generating code with AI but evaluating it with 2020 processes, the conversation you need isn’t about which AI model to use. It’s about whether your quality infrastructure still works.

Let’s talk about your development process →

