Your AI Bottleneck Isn't the Model. It's the Context.
Ricardo Argüello — April 11, 2026
CEO & Founder
General summary
The reason your enterprise AI pilot is stalling probably isn't the model or the prompt engineer. It's the context architecture. Aakash Gupta spent 1,500 hours inside Claude Code before publishing a counterintuitive rule: the teams getting mediocre output are the ones that left the model no thinking room. The fix is old, borrowed from databases, filesystems and the web.
- A 1 million token context window sounds infinite, until a real company's PRDs, runbooks and compliance policies fill it — and when it fills, fidelity drops
- Liu et al. 2023 (Stanford/UC Berkeley) showed that LLMs' reasoning degrades when relevant information is buried in the middle of long contexts — the 'lost in the middle' effect
- Progressive disclosure is the pattern databases, filesystems and the web solved decades ago: a lean root index plus nested files loaded on demand
- Aakash calls it Team OS — not a prompt trick, but a governed retrieval layer at the organizational level
- Gartner forecasts that more than 40% of agentic AI projects will be cancelled by the end of 2027 for insufficient risk controls among other reasons; missing context discipline underlies several of them
Picture hiring an expert auditor, handing them forty unsorted filing cabinets, and asking for a specific receipt. It doesn't matter how experienced they are. Without an index or a structure, they'll guess or give up. That's how most companies treat their AI models. They dump whole repositories into the prompt and expect perfect reasoning. The fix isn't a smarter auditor. It's organizing the filing cabinets first.
AI-generated summary
Aakash Gupta spent 1,500 hours building inside Claude Code before he posted something that runs against the instinct of almost every team currently experimenting with enterprise AI:
“The reason most teams get mediocre outputs from Claude Code has nothing to do with the model. It has everything to do with how much room they left it to think. CLAUDE.md files should be almost empty.”
Instinct says the opposite. Give the model more information and it answers better. Give it the full documentation, the full handbook, the full runbooks and policies, and it now has everything it needs. A bigger context window equals a smarter model.
None of that is true. And most of the enterprise AI pilots stalling right now are stalling for exactly this reason, not because the team picked the wrong model.
The math of lossy compression
A 1 million token context window sounds infinite. Anthropic shipped Claude Sonnet with 1 million token support, and Opus 4.6 runs in that extended window too. Roughly seven novels of text, all available in a single call.
Sounds like the problem is solved. It isn’t.
A real company’s documents fill that window quickly. Run the exercise on your own stack. This quarter’s PRDs. The onboarding handbook. The operations runbooks. The compliance policies. The last six months of sales notes. Open support tickets. The architecture documented in Confluence. Active vendor contracts. Seven novels evaporate without effort.
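The "seven novels" figure is back-of-envelope math. The conversion rates below are rough assumptions (a commonly quoted average of about 0.75 English words per token, and about 100,000 words per novel), not exact measurements:

```python
TOKENS = 1_000_000          # the extended context window
WORDS_PER_TOKEN = 0.75      # rough English average; an assumption
WORDS_PER_NOVEL = 100_000   # typical novel length; an assumption

novels = TOKENS * WORDS_PER_TOKEN / WORDS_PER_NOVEL
print(f"{novels:.1f} novels")  # → 7.5 novels
```

Swap in your own stack's document sizes and the window stops looking infinite very quickly.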
And here’s the part almost nobody audits: when the context window fills past a certain point, the model starts performing worse. Not because “it won’t fit.” Because attention inside the window isn’t distributed evenly.
The paper most CTOs haven’t read
In 2023, Liu et al. published “Lost in the Middle: How Language Models Use Long Contexts” (Stanford, UC Berkeley, Samaya AI, with Percy Liang among the authors). The paper measured something concrete: how does an LLM’s ability to answer questions change depending on where the relevant information sits inside the context?
The finding is uncomfortable. Models are significantly better at using information placed at the beginning or the end of the context than information buried in the middle. The accuracy curve is U-shaped. Stuff 500 pages of documentation into the prompt and critical information in the middle gets functionally lost, even though it technically lives inside the window.
The literature has only grown since. The phenomenon has picked up several names — “lost in the middle,” “context rot,” “long-context attention decay” — but they all describe the same thing. Fitting is not reasoning.
And this is the design error that explains why your AI pilot hit a wall. The team loaded the whole repo into the prompt, loaded the whole wiki, loaded the whole knowledge base, and assumed that with enough context the model would find the answer. But the context they loaded is exactly what degraded the model.
Bigger windows aren’t saving you
The industry has spent two years solving the wrong problem. Every new model release brags about the context window as the headline metric: 128K, 200K, 1M. And every internal product team reads those announcements as if they were the fix for their stalled pilots. “Now it’ll work. Now it all fits.”
It won’t. “Fit” and “reasoning” are two different things the industry keeps collapsing into one.
This is the same pattern I wrote about in the enterprise AI activation gap. Sixty percent of organizations have access to AI tools, but only thirty-four percent actually transform anything with them. The difference wasn’t the model. It was everything around the model.
Same read here. Your problem probably isn’t Claude, GPT or Gemini. Your problem is what you’re feeding them, and in what order.
Progressive disclosure: the pattern was already invented
The fix isn’t new. Engineers who have been building systems for thirty years already solved this several times, in different domains.
Databases don’t load entire tables into memory every time you run a query. They use indexes, resolve paths, and load only the rows you need. If every SELECT read the whole disk, Oracle would have died in the nineties.
Filesystems don’t expose the content of the whole disk when you open a folder. They use hierarchies. You walk to the directory you care about and load that listing only.
Modern web applications don’t ship all of the HTML, JavaScript and CSS for the entire site when you hit the landing page. They lazy load. Routes load when you visit them.
The pattern has had a name in interaction design since the eighties: progressive disclosure. Show the minimum the decision-maker needs, and let them go deeper on demand.
Applied to AI, the pattern is direct. A short root file that always loads, with pointers to where each thing lives. Nested files per folder acting as topic indexes. The model reads the index, walks to the exact file it needs for the query, and loads only that. No tokens wasted on irrelevant documentation. No explore agents scanning the whole repo blind. The model's reasoning room stays intact.
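A minimal sketch of that loop in Python, assuming a hypothetical INDEX.md whose lines map topic names to relative paths (the file layout, index format and function names are invented for illustration, not Claude Code's actual mechanism):

```python
from pathlib import Path

# Hypothetical layout, mirroring the pattern described above:
#   docs/INDEX.md          <- lean root index: "topic: relative/path" per line
#   docs/billing/INDEX.md  <- nested index for one domain

def load_root_index(root: Path) -> dict[str, Path]:
    """Parse the lean root index into {topic: file path}."""
    index = {}
    for line in (root / "INDEX.md").read_text().splitlines():
        if ":" in line:
            topic, rel = line.split(":", 1)
            index[topic.strip()] = root / rel.strip()
    return index

def context_for(query_topic: str, root: Path) -> str:
    """Load only the file the index points at, never the whole tree."""
    index = load_root_index(root)
    target = index.get(query_topic)
    if target is None:
        return ""  # topic not indexed: better an empty context than noise
    return target.read_text()
```

The point is the access pattern, not the parser: the harness reads one small index, resolves a path, and pulls a single file, the same walk a filesystem lookup does.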
Aakash tested it literally with lean CLAUDE.md files and nested index files by folder. I’ve watched the same thing work at clients where the pain was “the model is hallucinating last quarter’s numbers” and the actual cause was that the team had been pasting six months of reports into the context and hoping the model would know which one was the latest.
From PM trick to enterprise Team OS
And this is where the topic stops being a prompt tweak and turns into organizational architecture.
At the level of a solo PM working with Claude Code, “progressive disclosure” looks like a ten-line CLAUDE.md and five index files. It’s personal. It fits in one person’s head.
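As a concrete sketch, that root file might look something like this (the project, topics and paths are invented for illustration):

```markdown
# CLAUDE.md: root index (keep this lean)
Project: payments platform. Load only the indexed doc a task needs.

- Architecture decisions -> docs/architecture/INDEX.md
- API conventions        -> docs/api/INDEX.md
- Runbooks / on-call     -> ops/runbooks/INDEX.md
- Compliance policies    -> compliance/INDEX.md
- This quarter's PRDs    -> product/prds/INDEX.md
```

Every line is a pointer, not content; the nested INDEX.md files repeat the same trick one level down.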
At the level of a fifty-person company where half the org is touching AI tools, the same principle looks completely different. It looks like a governed retrieval layer. Who owns each index. Which documents are authoritative vs. archival. Who approves changes to the root index. How it’s versioned. What confidence level each source carries. It’s information infrastructure. It isn’t a prompt engineering config.
Aakash calls it Team OS. I think the name is right. It captures that this isn’t a tool or a workflow. It’s an operating system for how the organization delivers context to its AI models.
Without that Team OS, every team improvises. One analyst pastes whole reports into the chat because no index exists. One engineer dumps the entire repo into the agent because there’s no convention. One ops director keeps a personal doc folder because the wiki is out of date. All of it degrades the model at the exact moment you’re using it for decisions that matter.
The discipline connects to everything else
This is the same argument I’ve been making for months from different angles.
When I wrote about governing AI agents with Kubernetes primitives, the point was that agents need RBAC, admission controllers and audit logs before they touch payroll or finance. That’s the access control angle.
When I wrote about hallucinations in finance AI and the LLM Sandwich architecture, the point was that models need verification layers wrapped around them, before and after. That’s the output validation angle.
This post is the third angle of the same argument. Input discipline. Agents need structured context before they process anything. And context gets designed with the same principles we’ve been applying to databases, filesystems and web applications for decades.
All three are the same thesis written three times. The difference between companies that get real value from AI and companies stuck in perpetual pilots isn’t the model they picked. It’s the infrastructure discipline they built around it, before, during and after every inference.
Gartner already put a number on the cost
Gartner forecasts that more than 40% of agentic AI projects will be cancelled before the end of 2027. The official reasons: “insufficient risk controls, runaway costs, unclear business value.” Anushree Verma, the lead analyst, called most of them “hype-driven experiments, frequently misapplied.”
Here’s what that means in practice. These projects were built with no context architecture, no governance, no structured retrieval, no Team OS. Someone dumped documentation into the prompt, attached an expensive model, and expected enterprise-grade results. Of course they got cancelled.
The uncomfortable part is that the difference between being in the 40% cancelled and the 60% that survives rarely comes down to a model decision. It comes down to an infrastructure discipline decision, made six months before the point where the pilot hit the wall.
What IQ Source does about it
Context engineering isn’t prompt optimization. It won’t be solved by a prompt engineer iterating more versions of the same prompt. It’s an architecture discipline that gets audited, designed, and governed.
At IQ Source we do that audit at the level that matters. The actual prompts your team is sending today (not the ones they’re supposed to send according to a playbook nobody reads). The actual retrieval calls being made. The actual documentation dumps someone copy-pasted into the chat because they didn’t know another way. We start from the current state, not from an idealized diagram.
Then we redesign the context layer around progressive disclosure. Governed root index. Nested files by domain. Clear conventions on what gets loaded when, which sources are authoritative, how updates happen. It isn’t magic. It’s the same kind of discipline we’ve been applying to databases and filesystems for decades. Only now we’re applying it to the systems making business decisions inside your company.
If you have an AI pilot that has stalled even though the model is already strong and your prompt engineer has iterated ten times, the problem probably isn’t either of them. Bring me that pilot. In one working session we’ll map the context architecture actually reaching the model, identify where fidelity is leaking, and hand you a redesign plan before you burn another quarter on prompt tuning. Let’s talk.
Frequently Asked Questions
What is context engineering, and why is it the bottleneck?
Context engineering is the discipline of designing what information an AI model receives, in what order, and with what structure. It's the bottleneck because most enterprise pilots dump entire repositories and unordered documents into the prompt, saturating the context window and degrading the model's reasoning capacity even when the text technically fits.
Doesn't a 1 million token context window solve this?
Claude Sonnet and Opus 4.6 support a 1 million token window, letting more text fit, but fitting isn't reasoning. The Lost in the Middle paper showed LLMs lose accuracy when key information is buried mid-context. A bigger window doesn't help if the information the model needs is lost in noise the model can't attend to well.
What is progressive disclosure in an AI context?
Progressive disclosure is a pattern where the AI model loads a lean root file with pointers to where each document lives, then navigates to load only the specific content needed for a query. It's the same principle databases solved with indexes and filesystems solved with hierarchies: you load what resolves the question, not everything.
How should a CTO audit their context architecture?
A CTO should review the actual prompts teams are sending, the real retrieval calls being made, and the documents being copy-pasted into context. The audit identifies where windows get saturated, which information is redundant, and where structure is missing. Then the layer gets redesigned around a governed root index and on-demand loading.
Related Articles
Taste Debt: The Real Cost of Removing Yourself From AI
Peter Steinberger named the real failure mode of agentic workflows: pulling yourself out too early. The bill that shows up later I call taste debt.
Google's $180B: The Enterprise Signal Nobody Reads
Google went from $30B to $180B in AI CapEx. No keynote needed. For enterprise buyers evaluating AI vendors, that number tells more than any product demo.