
Your AI Bottleneck Isn't the Model. It's the Context.

Aakash Gupta's rule after 1,500 hours in Claude Code: CLAUDE.md should be almost empty. Enterprise AI pilots need the same context discipline.


Ricardo Argüello


CEO & Founder

Business Strategy · 9 min read

Aakash Gupta spent 1,500 hours building inside Claude Code before he posted something that runs against the instinct of almost every team currently experimenting with enterprise AI:

“The reason most teams get mediocre outputs from Claude Code has nothing to do with the model. It has everything to do with how much room they left it to think. CLAUDE.md files should be almost empty.”

Instinct says the opposite. Give the model more information and it answers better. Give it the full documentation, the full handbook, the full runbooks and policies, and it now has everything it needs. A bigger context window equals a smarter model.

None of that is true. And most of the enterprise AI pilots stalling right now are stalling for exactly this reason, not because the model is wrong.

The math of lossy compression

A 1 million token context window sounds infinite. Anthropic shipped Claude Sonnet with 1 million token support, and Opus 4.6 runs in that extended window too. Roughly seven novels of text, all available in a single call.

Sounds like the problem is solved. It isn’t.

A real company’s documents fill that window quickly. Run the exercise on your own stack. This quarter’s PRDs. The onboarding handbook. The operations runbooks. The compliance policies. The last six months of sales notes. Open support tickets. The architecture documented in Confluence. Active vendor contracts. Seven novels evaporate without effort.
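You can sketch the exercise in a few lines. The page counts below are illustrative guesses, not measurements, and the ~4 characters per token is a rough heuristic for English text, but even under those loose assumptions a mid-sized document stack overflows the window:

```python
# Back-of-envelope: how fast a real document stack fills a 1M-token
# context window. Page counts are hypothetical; ~4 chars/token is a
# common rough heuristic for English prose.

DOC_STACK_PAGES = {
    "quarterly PRDs": 120,
    "onboarding handbook": 80,
    "operations runbooks": 200,
    "compliance policies": 150,
    "six months of sales notes": 300,
    "open support tickets": 250,
    "Confluence architecture docs": 180,
    "active vendor contracts": 90,
}

CHARS_PER_PAGE = 3000       # dense single-spaced text
CHARS_PER_TOKEN = 4         # rough average for English
WINDOW = 1_000_000          # the 1M-token window

total_tokens = sum(
    pages * CHARS_PER_PAGE // CHARS_PER_TOKEN
    for pages in DOC_STACK_PAGES.values()
)
print(f"~{total_tokens:,} tokens, {100 * total_tokens / WINDOW:.0f}% of the window")
```

Under these made-up but plausible numbers, the stack alone already exceeds the window before a single question has been asked.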

And here’s the part almost nobody audits: when the context window fills past a certain point, the model starts performing worse. Not because “it won’t fit.” Because attention inside the window isn’t distributed evenly.

The paper most CTOs haven’t read

In 2023, Liu et al. published “Lost in the Middle: How Language Models Use Long Contexts” (Stanford, UC Berkeley, Samaya AI, with Percy Liang among the authors). The paper measured something concrete: how does an LLM’s ability to answer questions change depending on where the relevant information sits inside the context?

The finding is uncomfortable. Models are significantly better at using information placed at the beginning or the end of the context than information buried in the middle. The accuracy curve is U-shaped. Stuff 500 pages of documentation into the prompt and critical information in the middle gets functionally lost, even though it technically lives inside the window.

The literature has only grown since. The phenomenon has picked up several names — “lost in the middle,” “context rot,” “long-context attention decay” — but they all describe the same thing. Fitting is not reasoning.

And this is the design error that explains why your AI pilot hit a wall. The team loaded the whole repo into the prompt, loaded the whole wiki, loaded the whole knowledge base, and assumed that with enough context the model would find the answer. But the context they loaded is exactly what degraded the model.

Bigger windows aren’t saving you

The industry has spent two years solving the wrong problem. Every new model release brags about the context window as the headline metric: 128K, 200K, 1M. And every internal product team reads those announcements as if they were the fix for their stalled pilots. “Now it’ll work. Now it all fits.”

It won’t. “Fit” and “reasoning” are two different things the industry keeps collapsing into one.

This is the same pattern I wrote about in the enterprise AI activation gap. Sixty percent of organizations have access to AI tools, but only thirty-four percent actually transform anything with them. The difference wasn’t the model. It was everything around the model.

Same read here. Your problem probably isn’t Claude, GPT or Gemini. Your problem is what you’re feeding them, and in what order.

Progressive disclosure: the pattern was already invented

The fix isn’t new. Engineers who have been building systems for thirty years already solved this several times, in different domains.

Databases don’t load entire tables into memory every time you run a query. They use indexes, resolve paths, and load only the rows you need. If every SELECT read the whole disk, Oracle would have died in the nineties.

Filesystems don’t expose the content of the whole disk when you open a folder. They use hierarchies. You walk to the directory you care about and load that listing only.

Modern web applications don’t ship all of the HTML, JavaScript and CSS for the entire site when you hit the landing page. They lazy load. Routes load when you visit them.

The pattern has had a name in interaction design since the eighties: progressive disclosure. Show the minimum the decision-maker needs, and let them go deeper on demand.

Applied to AI it’s direct. A short root file that always loads, with pointers to where each thing lives. Nested files per folder acting as topic indexes. The model reads the index, walks to the exact file it needs for the query, and loads only that. No tokens wasted on irrelevant documentation. No explore agents scanning the whole repo blind. The model’s reasoning room stays intact.
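The walk itself is almost trivial to express. A minimal sketch, assuming a hypothetical convention of one small `index.md` per folder (the file names and layout here are illustrative, not a standard):

```python
# Progressive disclosure for AI context: always load the short root
# index, then pull only the per-topic index files the query needs,
# instead of concatenating the entire document tree.

from pathlib import Path


def load_context(root: Path, query_topics: list[str]) -> str:
    """Return the root index plus only the requested topic indexes."""
    parts = [(root / "index.md").read_text()]        # tiny, always loaded
    for topic in query_topics:
        topic_index = root / topic / "index.md"      # per-folder index
        if topic_index.exists():
            parts.append(topic_index.read_text())    # loaded on demand
    return "\n\n".join(parts)
```

The design choice that matters is the invariant, not the code: the always-loaded root stays small enough to fit in anyone's head, and everything else is reachable in one hop but loaded only when a query asks for it.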

Aakash tested exactly this: lean CLAUDE.md files plus nested index files per folder. I’ve watched the same thing work at clients where the pain was “the model is hallucinating last quarter’s numbers” and the actual cause was that the team had been pasting six months of reports into the context and hoping the model would figure out which one was the latest.

From PM trick to enterprise Team OS

And this is where the topic stops being a prompt tweak and turns into organizational architecture.

At the level of a solo PM working with Claude Code, “progressive disclosure” looks like a ten-line CLAUDE.md and five index files. It’s personal. It fits in one person’s head.

At the level of a fifty-person company where half the org is touching AI tools, the same principle looks completely different. It looks like a governed retrieval layer. Who owns each index. Which documents are authoritative vs. archival. Who approves changes to the root index. How it’s versioned. What confidence level each source carries. It’s information infrastructure. It isn’t a prompt engineering config.
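What a governed entry in that retrieval layer records could look something like this. The field names and thresholds are hypothetical, not a standard schema; the point is that each indexed source carries an owner, an authoritative-vs-archival status, a version, and a confidence level, and that a simple policy decides what ever reaches the model:

```python
# Hypothetical sketch of a governed retrieval-layer entry. Schema and
# the 0.5 confidence threshold are illustrative assumptions.

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    AUTHORITATIVE = "authoritative"   # current, owned, safe to load
    ARCHIVAL = "archival"             # historical; never load by default


@dataclass
class IndexedSource:
    path: str          # where the document lives
    owner: str         # team accountable for keeping it current
    status: Status
    version: str       # bumped on every approved change to the index
    confidence: float  # 0.0-1.0, weight retrieval should give this source


def loadable(source: IndexedSource) -> bool:
    """Only authoritative, reasonably trusted sources reach the model."""
    return source.status is Status.AUTHORITATIVE and source.confidence >= 0.5
```

Nothing in the sketch is clever. That is the point: the hard part is organizational (deciding owners, statuses, and approval paths), not technical.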

Aakash calls it Team OS. I think the name is right. It captures that this isn’t a tool or a workflow. It’s an operating system for how the organization delivers context to its AI models.

Without that Team OS, every team improvises. One analyst pastes whole reports into the chat because no index exists. One engineer dumps the entire repo into the agent because there’s no convention. One ops director keeps a personal doc folder because the wiki is out of date. All of it degrades the model at the exact moment you’re using it for decisions that matter.

The discipline connects to everything else

This is the same argument I’ve been making for months from different angles.

When I wrote about governing AI agents with Kubernetes primitives, the point was that agents need RBAC, admission controllers and audit logs before they touch payroll or finance. That’s the access control angle.

When I wrote about hallucinations in finance AI and the LLM Sandwich architecture, the point was that models need verification layers wrapped around them, before and after. That’s the output validation angle.

This post is the third angle of the same argument. Input discipline. Agents need structured context before they process anything. And context gets designed with the same principles we’ve been applying to databases, filesystems and web applications for decades.

All three are the same thesis written three times. The difference between companies that get real value from AI and companies stuck in perpetual pilots isn’t the model they picked. It’s the infrastructure discipline they built around it, before, during and after every inference.

Gartner already put a number on the cost

Gartner forecasts that more than 40% of agentic AI projects will be cancelled before the end of 2027. The official reasons: “insufficient risk controls, runaway costs, unclear business value.” Anushree Verma, the lead analyst, called most of them “hype-driven experiments, frequently misapplied.”

Here’s what that means in practice. These projects were built with no context architecture, no governance, no structured retrieval, no Team OS. Someone dumped documentation into the prompt, attached an expensive model, and expected enterprise-grade results. Of course they got cancelled.

The uncomfortable part is that the difference between being in the 40% cancelled and the 60% that survives rarely comes down to a model decision. It comes down to an infrastructure discipline decision, made six months before the point where the pilot hit the wall.

What IQ Source does about it

Context engineering isn’t prompt optimization. It won’t be solved by a prompt engineer iterating more versions of the same prompt. It’s an architecture discipline that gets audited, designed, and governed.

At IQ Source we do that audit at the level that matters. The actual prompts your team is sending today (not the ones they’re supposed to send according to a playbook nobody reads). The actual retrieval calls being made. The actual documentation dumps someone copy-pasted into the chat because they didn’t know another way. We start from the current state, not from an idealized diagram.

Then we redesign the context layer around progressive disclosure. Governed root index. Nested files by domain. Clear conventions on what gets loaded when, which sources are authoritative, how updates happen. It isn’t magic. It’s the same kind of discipline we’ve been applying to databases and filesystems for decades. Only now we’re applying it to the systems making business decisions inside your company.

If you have an AI pilot that has stalled even though the model is already strong and your prompt engineer has iterated ten times, the problem probably isn’t either of them. Bring me that pilot. In one working session we’ll map the context architecture actually reaching the model, identify where fidelity is leaking, and hand you a redesign plan before you burn another quarter on prompt tuning. Let’s talk.

