
The harness is the moat: the model is now commodity

Cursor, Devin, and Replit run the same three frontier models. Swap the model and the products keep working. Swap the harness and they break.

Ricardo Argüello


CEO & Founder

AI & Automation 8 min read

There is a new job opening at Box that no recruiter in Latin America knows how to explain yet: AI Business Automation Engineer. The literal description, in Aaron Levie’s words: “sit shoulder-to-shoulder with the people doing the work, deeply understand their workflows, and then design, prototype, and deliver the technical solutions that transform how those teams operate.” One week, embedding an agent in the Finance close. The next, redesigning Legal intake as an agentic system. The next, standing up a data flow so GTM can automate customer intelligence. Levie closes by saying most companies will end up with several flavors of this role. It is the role that builds and maintains a new piece that no lab can ship to you packaged.

That piece has a name. Aakash Gupta named it the same weekend: the harness. It is the operating layer wrapped around the AI model (context engineering, skills, evals, orchestration, operating discipline), and it is where the 2026 competitive advantage that cannot be copied now lives. IQ Source’s thesis here is direct: if your company is going to win something with AI in the next twelve months, you are not going to win it by picking the right model. You are going to win it by building, and maintaining every day, the harness around the model. Box is hiring full-time to build it internally. IQ Source has been operating it as an outside service for B2B companies in Latin America for months.

The model is a commodity. The harness is the company.

Cursor, Devin, Replit, and Windsurf all run on the same three frontier models (Claude, GPT-5, Gemini). Swap the model and the products keep working, with almost no loss in quality. Swap the harness and they break. That is where the moat lives in 2026. Nothing about the harness gets cheaper when models get cheaper. Nothing about it improves when models improve. All of it has to be earned. The advantage of companies that have been building it for months is measured in real iteration against real customers, not in subscriptions to a new model.

I have been in computing for 36 years, since 1990, when I was 15, on a Commodore 64. This pattern repeats every time a new platform arrives. The relational database commoditized in the nineties and the moat moved to the ORM. The server commoditized with the cloud and the moat moved to deployment and observability. What is different with AI is the velocity. The layer being commoditized this time is general reasoning, and the layer right above it (the operating layer, the one fitted to your specific company) is where the value that cannot be copied is now landing.

The quantitative proof was published by Mnimiy in an analysis that crossed 2.5 million views over the weekend. Forrest Chang built a 4-rule CLAUDE.md from the complaint thread Andrej Karpathy posted in January; the repo hit 120,000 GitHub stars. Mnimiy ran it across 30 codebases for 6 weeks: Claude’s mistake rate fell from 41% to 11% with the four rules, and to 3% after eight more were added (token budgets, tests that verify intent, checkpoints, fail-loud reporting). Same Claude. A text file under two hundred lines was the entire delta.

The background math is published. RAND reports more than 80% of AI projects fail, twice the rate of non-AI IT. MIT NANDA measured that only 5% of enterprise GenAI pilots extract major value. S&P Global tracked 42% of companies abandoning most AI initiatives. The difference between the 5% extracting value and the 95% not extracting it is not the model. It is what is wrapped around the model. Steve Nouri put it as a test you can run on any pilot you have alive today: turn it off. If nobody notices this week, it was not transformation. It was theater.

What the harness is, concretely

The harness is not abstract. It has five specific pieces, each with owners, maintenance, and cost.

Context engineering. What tokens the model sees on every call, what loads up front, what gets fetched on demand. We covered the concrete practice in an earlier post: redesigning the context for a support agent dropped active tokens from ~45K to ~12K and lifted task accuracy from ~72% to ~91%, with no model change.
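
As a rough illustration of "what loads up front, what gets fetched on demand," here is a minimal sketch. Everything in it is an assumption made for the example: the `ContextItem` type, the `build_context` function, the item names, and the word-count stand-in for a real tokenizer. It is not how any particular product does it.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    always_load: bool = False  # True: sent on every call; False: fetched on demand


def estimate_tokens(text: str) -> int:
    # Crude proxy (assumption): one token per whitespace-separated word.
    return len(text.split())


def build_context(items, requested, budget):
    """Pick the items to send without exceeding the token budget.

    Up-front items are considered first, then on-demand items whose
    names appear in `requested`. Items that would overflow are skipped.
    """
    ordered = [i for i in items if i.always_load] + [
        i for i in items if not i.always_load and i.name in requested
    ]
    chosen, used = [], 0
    for item in ordered:
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            chosen.append(item.name)
            used += cost
    return chosen, used


items = [
    ContextItem("system_rules", "answer briefly and cite the ticket id", always_load=True),
    ContextItem("refund_policy", "refunds are allowed within thirty days of purchase"),
    ContextItem("full_changelog", "word " * 500),
]

chosen, used = build_context(items, requested={"refund_policy"}, budget=50)
print(chosen, used)
```

The design choice worth noticing is that the budget is enforced in code, not left to whoever writes the prompt that day.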

Skills. Reusable files the agent executes when a specific trigger fires. A skill that takes a call transcript and extracts open items. A skill that prepares a proposal from five comparable projects. The unit of reuse: every new client does not start from zero.
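
One way to picture a skill firing on a trigger is a small registry, sketched below. The `skill` decorator, the `dispatch` function, the trigger name, and the convention that open items are lines starting with "ACTION:" are all hypothetical, chosen only to keep the example self-contained. A real skill would more likely hand the transcript to a model.

```python
SKILLS = {}  # trigger name -> function that handles it


def skill(trigger):
    """Register a function as the skill for a given trigger."""
    def register(fn):
        SKILLS[trigger] = fn
        return fn
    return register


@skill("call_transcript")
def extract_open_items(transcript: str) -> list[str]:
    # Assumption for the example: open items are lines prefixed with "ACTION:".
    return [
        line.split(":", 1)[1].strip()
        for line in transcript.splitlines()
        if line.strip().upper().startswith("ACTION:")
    ]


def dispatch(trigger: str, payload: str):
    """Run the skill registered for this trigger, or fail loudly."""
    if trigger not in SKILLS:
        raise KeyError(f"no skill registered for trigger {trigger!r}")
    return SKILLS[trigger](payload)


transcript = """Ana: we still need the signed NDA.
ACTION: send NDA to legal
Luis: pricing is fine.
action: confirm kickoff date"""

open_items = dispatch("call_transcript", transcript)
print(open_items)
```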

Evals. How you measure that the agent actually got it right. If your quality metric is “the output looks reasonable,” you do not have evals; you have opinions. Evals are case sets with expected answers, run after every change.
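
A case set with expected answers can be very small and still count as an eval. The sketch below is a minimal version under stated assumptions: the invoice cases, the `run_evals` function, and the `keyword_agent` stand-in (used here in place of a real model call) are all invented for illustration.

```python
from typing import Callable

# Hypothetical case set: each case pairs an input with the expected answer.
EVAL_CASES = [
    {"input": "Invoice 1042 is overdue by 12 days", "expected": "overdue"},
    {"input": "Invoice 1043 was paid on time", "expected": "paid"},
    {"input": "Invoice 1044 is in dispute", "expected": "disputed"},
]


def run_evals(agent: Callable[[str], str], cases: list) -> dict:
    """Run every case through the agent and report exact-match results."""
    failures = [
        case for case in cases
        if agent(case["input"]).strip().lower() != case["expected"]
    ]
    passed = len(cases) - len(failures)
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases),
        "failures": failures,
    }


def keyword_agent(text: str) -> str:
    # Stand-in for a real model call (assumption): naive keyword matching.
    lowered = text.lower()
    if "overdue" in lowered:
        return "overdue"
    if "paid" in lowered:
        return "paid"
    return "unknown"


report = run_evals(keyword_agent, EVAL_CASES)
print(f"{report['passed']}/{report['total']} passed")
for case in report["failures"]:
    print("FAILED:", case["input"])
```

Run after every prompt, skill, or context change, a report like this turns "it looks reasonable" into a number that can regress visibly.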

Orchestration. When to spawn a subagent with a clean window, when to parallelize, when to escalate to a human, when to accept a partial answer. The logic that composes a response when the problem does not fit in a single call.
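
The routing decisions above can be written down as an explicit function instead of living in someone's intuition. The sketch below is only that: the `Task` fields, the `Route` names, the ordering of the checks, and the half-window threshold are assumptions for the example, and the "accept a partial answer" case is left out to keep it short.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SINGLE_CALL = "single_call"
    SUBAGENT = "subagent_clean_window"
    PARALLEL = "parallel_subagents"
    HUMAN = "escalate_to_human"


@dataclass
class Task:
    estimated_tokens: int   # rough size of the context the task needs
    independent_parts: int  # sub-problems that do not depend on each other
    high_stakes: bool       # e.g. touches money or legal commitments


def route(task: Task, window_tokens: int = 100_000) -> Route:
    """Decide how to run a task. Thresholds are illustrative assumptions."""
    if task.high_stakes:
        return Route.HUMAN
    if task.independent_parts > 1:
        return Route.PARALLEL
    if task.estimated_tokens > window_tokens // 2:
        # Too large to share a window with other work: use a clean one.
        return Route.SUBAGENT
    return Route.SINGLE_CALL


print(route(Task(estimated_tokens=2_000, independent_parts=1, high_stakes=False)))
print(route(Task(estimated_tokens=80_000, independent_parts=1, high_stakes=False)))
print(route(Task(estimated_tokens=5_000, independent_parts=4, high_stakes=False)))
print(route(Task(estimated_tokens=5_000, independent_parts=4, high_stakes=True)))
```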

Operating discipline. The rule whose absence kills most Team OS implementations: the feature does not ship until the harness repo is updated. Without this rule, the other four layers gather dust within six weeks.
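
A rule like this holds up better when a machine enforces it. Below is a minimal sketch of a merge gate, assuming a single repository where feature code lives under `src/` and the harness under `harness/`. Both directory names and the `can_ship` function are hypothetical. In practice the list of changed files would come from the version control system.

```python
from pathlib import PurePosixPath

# Hypothetical layout: feature code under src/, harness files under harness/.
FEATURE_DIR = "src"
HARNESS_DIR = "harness"


def _under(path: str, directory: str) -> bool:
    """True if the path's top-level directory is `directory`."""
    return PurePosixPath(path).parts[:1] == (directory,)


def can_ship(changed_files: list) -> bool:
    """Block a change that touches feature code without touching the harness."""
    touches_feature = any(_under(f, FEATURE_DIR) for f in changed_files)
    touches_harness = any(_under(f, HARNESS_DIR) for f in changed_files)
    return touches_harness or not touches_feature


print(can_ship(["src/billing/close.py"]))                                   # blocked
print(can_ship(["src/billing/close.py", "harness/skills/close_books.md"]))  # allowed
print(can_ship(["README.md"]))                                              # allowed
```

The check is deliberately blunt: it cannot tell whether the harness update is any good, only that one exists. Judging quality stays with the person who owns the harness.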

All five are fitted to your company. Box’s financial close does not look like your B2B software company’s close. That is exactly why the harness is where advantage accumulates: every company ends up with its own version, and the version is not portable.

What we already run at IQ Source

The IQ Source brain is our team’s harness, tuned every day for the last month with two active clients running in parallel. Layer 1: call transcripts get ingested, open items get extracted and propagated to projects. Layer 2: anyone on the team asks the brain in natural language (“what did we quote in February to a company this size?”, “what was decided on Monday’s call with prospect X?”). Layer 3: the operating rule. A proposal does not get quoted if the prospect’s information is not where the brain finds it; a handoff between two team members only counts when what got transferred is in writing.

When a client hires us for AI Maestro, we do not install anything yet. Over two months of consulting, education, and training, we map how the company actually operates (not what the manual says), train the team in AI fluency, and identify which of the five harness components is the real opportunity. The stage closes with three deliverables (a Process Reality Map, an AI Opportunity Score, and a prioritized recommendation) and a Go/No-Go gate. If it advances, a second stage designs and implements the recommended solution (an agent when it fits, automation or integration when not) and operates it until the client’s team takes over. If it does not advance, the deliverables are theirs with zero downstream commitment. It is the outside version of the internal role Levie is hiring for: the outsider with judgment who diagnoses before touching anything.

When the client also needs us to keep building product while this gets installed, that is where Tech Partner comes in. The harness operates inside the development process: a PR does not close unless the technical decision that motivated it is written down in the repo; a release carries an attached log the next team can read without reconstructing the why. It is the workflow moat I covered in April, operational.

The three tests for your harness

The first is the existence test. What is the harness your team uses today, and where does it live? If the answer is “in someone’s head” or “in Slack,” the harness does not exist as a system. It exists as individuals. And individuals leave.

The second is the ownership test. Who, by name, keeps the harness alive? Not “everyone.” Not “the AI team.” A specific person, with allocated time, with authority to send a feature back to the backlog if the corresponding documentation did not get updated. Without that role, the other four layers atrophy.

The third is the agent test. When an AI agent inside your company tries to answer the next customer question, what it finds inside the harness defines the answer. Not the model. Not the version. What your team wrote down this week in the place the agent goes looking.

If all three answers are solid, you do not need IQ Source for this. If two or three land in silence, write to us and we will look at it together. An initial conversation serves one purpose: to diagnose which of the five components is the real bottleneck.

The team that earned the harness before yours pays one year in 2027. The team still believing the problem is picking the right model pays three, and finds out late which one was the moat.

