The harness is the moat: the model is now commodity
Ricardo Argüello — May 11, 2026
CEO & Founder
General summary
Box posted a job opening over the weekend that no recruiter in Latin America knows how to explain yet: AI Business Automation Engineer. It is an internal Forward Deployed Engineer who sits shoulder-to-shoulder with Finance, Legal, People, GTM, and Customer Success. Aaron Levie calls this the role that builds and maintains the harness around the AI model. Aakash Gupta named the concept the same weekend. Mnimiy measured it with data: 12 rules in a CLAUDE.md drop Claude's mistake rate from 41% to 3% across 30 codebases without changing the model. At IQ Source we run the harness internally every day and have been operating it for B2B software companies in Latin America for months.
- Aaron Levie at Box announced on Sunday that Box is hiring AI Business Automation Engineers — an internal Forward Deployed Engineer role embedded with Finance, Legal, People, GTM, and Customer Success
- Aakash Gupta gave the concept a concrete name: the harness. The dominant AI products of 2026 all run on the same three frontier models; what differentiates them is the harness around the model, not the model
- Mnimiy's analysis crossed 2.5 million views: Forrest Chang's 4-rule CLAUDE.md cut Claude's mistake rate from 41% to 11%; adding 8 more rules dropped it to 3% across 30 codebases over 6 weeks — no model change
- The backdrop is measured: RAND reports that more than 80% of AI projects fail; MIT NANDA finds that only 5% of enterprise GenAI pilots extract major value; S&P Global tracks 42% of companies abandoning most AI initiatives
- At IQ Source we run the harness internally every day (the brain), install it for client teams as AI Maestro, and operate it inside the development flow as Tech Partner
Picture two Formula 1 garages receiving the same V6 turbo engine in the same crate on the same day. One garage has mechanics who have worked together for fifteen years, calibrated tools, checklists written in blood, and a pit chief who calls the tire change the moment it starts raining. The other garage has three interns with PDF manuals. Same engine. Lap times that diverge by minutes. The engine is the AI model. Everything else is the harness.
There is a new job opening at Box that no recruiter in Latin America knows how to explain yet: AI Business Automation Engineer. The literal description, in Aaron Levie’s words: “sit shoulder-to-shoulder with the people doing the work, deeply understand their workflows, and then design, prototype, and deliver the technical solutions that transform how those teams operate.” One week embedding an agent in Finance close. The next, redesigning Legal intake as an agentic system. The next, standing up a data flow so GTM can automate customer intelligence. Levie closes saying most companies will end up with several flavors of this role. It is the role that builds and maintains a new piece that no lab can ship to you packaged.
That piece has a name. Aakash Gupta named it the same weekend: the harness. It is the operating layer wrapped around the AI model (context engineering, skills, evals, orchestration, operating discipline), and it is where the 2026 competitive advantage that cannot be copied now lives. IQ Source’s thesis here is direct: if your company is going to win something with AI in the next twelve months, you are not going to win it by picking the right model. You are going to win it by building, and maintaining every day, the harness around the model. Box is hiring full-time to build it internally. IQ Source has been operating it as an outside service for B2B companies in Latin America for months.
The model is a commodity. The harness is the company.
Cursor, Devin, Replit, and Windsurf all run on the same three frontier models (Claude, GPT-5, Gemini). Swap the model and the products keep working, with almost no loss in quality. Swap the harness and they break. That is where the moat lives in 2026. Nothing about the harness gets cheaper when models get cheaper. Nothing about it improves when models improve. All of it has to be earned. The advantage of companies that have been building it for months is measured in real iteration against real customers, not in subscriptions to a new model.
I have been in computing for 36 years, since 1990, when I was 15, on a Commodore 64. This pattern repeats every time a new platform arrives. The relational database commoditized in the nineties and the moat moved to the ORM. The server commoditized with the cloud and the moat moved to deployment and observability. What is different with AI is the velocity. The layer being commoditized this time is general reasoning, and the layer right above it (the operating layer, the one fitted to your specific company) is where the value that cannot be copied is now landing.
The quantitative proof was published by Mnimiy in an analysis that crossed 2.5 million views over the weekend. Forrest Chang built a 4-rule CLAUDE.md from the complaint thread Andrej Karpathy posted in January; the repo hit 120,000 GitHub stars. Mnimiy ran it across 30 codebases for 6 weeks: Claude’s mistake rate fell from 41% to 11% with the four rules, and to 3% after eight more were added (token budgets, tests that verify intent, checkpoints, fail-loud reporting). Same Claude. A text file under two hundred lines was the entire delta.
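The arithmetic behind those percentages is worth spelling out, because "41% to 3%" can be read either as a share of mistakes eliminated or as a ratio between rates. A minimal sketch in Python, using only the three rates quoted above (the function names are illustrative, not from Mnimiy's analysis):

```python
def relative_reduction(before: float, after: float) -> float:
    """Share of the original mistakes that disappeared."""
    return (before - after) / before


def improvement_factor(before: float, after: float) -> float:
    """How many times lower the new mistake rate is."""
    return before / after


BASELINE = 0.41  # mistake rate with no CLAUDE.md rules, as quoted above

for label, rate in [("4 rules", 0.11), ("12 rules", 0.03)]:
    print(
        f"{label}: {relative_reduction(BASELINE, rate):.0%} fewer mistakes, "
        f"{improvement_factor(BASELINE, rate):.1f}x lower rate"
    )
# prints:
# 4 rules: 73% fewer mistakes, 3.7x lower rate
# 12 rules: 93% fewer mistakes, 13.7x lower rate
```

Read this way, the first four rules account for most of the gain, and the eight additional rules take the rate from roughly 3.7x lower to roughly 13.7x lower.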
The background math is published. RAND reports that more than 80% of AI projects fail, twice the failure rate of non-AI IT projects. MIT NANDA measured that only 5% of enterprise GenAI pilots extract major value. S&P Global tracked 42% of companies abandoning most AI initiatives. The difference between the 5% extracting value and the 95% not extracting it is not the model. It is what is wrapped around the model. Steve Nouri put it as a test you can run on any pilot you have alive today: turn it off. If nobody notices this week, it was not transformation. It was theater.
What the harness is, concretely
The harness is not abstract. It has five specific pieces, each with owners, maintenance, and cost.
Context engineering. What tokens the model sees on every call, what loads up front, what gets fetched on demand. We covered the concrete practice in an earlier post: redesigning the context for a support agent dropped active tokens from ~45K to ~12K and lifted task accuracy from ~72% to ~91%, with no model change.
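As an illustration of "what loads up front, what gets fetched on demand," here is a minimal sketch of a context builder with a token budget. Everything in it is an assumption made for the example: the `ContextItem` type, the word-count token estimate (a real harness would use the model's tokenizer), and the keyword-overlap relevance score.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    name: str
    text: str
    always_load: bool = False  # True: included up front on every call


def estimate_tokens(text: str) -> int:
    # Crude proxy for illustration: one token per whitespace-separated word.
    return len(text.split())


def build_context(items, query_terms, budget):
    """Return (names of selected items, tokens used) within the budget."""
    selected, used = [], 0

    # 1. Up-front items are mandatory; overflowing the budget is an error.
    for item in items:
        if item.always_load:
            cost = estimate_tokens(item.text)
            if used + cost > budget:
                raise ValueError(f"up-front context exceeds budget at {item.name}")
            selected.append(item.name)
            used += cost

    # 2. On-demand items are ranked by keyword overlap with the query.
    def relevance(item):
        return len(set(item.text.lower().split()) & query_terms)

    on_demand = sorted(
        (i for i in items if not i.always_load), key=relevance, reverse=True
    )
    for item in on_demand:
        cost = estimate_tokens(item.text)
        if relevance(item) > 0 and used + cost <= budget:
            selected.append(item.name)
            used += cost
    return selected, used


items = [
    ContextItem("system_rules", "Answer support tickets politely and cite the relevant policy", True),
    ContextItem("refund_policy", "Refund requests are accepted within 30 days of purchase"),
    ContextItem("shipping_policy", "Shipping takes five business days"),
    ContextItem("release_notes", "Version 2 adds dark mode"),
]

print(build_context(items, {"refund", "days"}, budget=20))
# prints: (['system_rules', 'refund_policy'], 18)
```

The point of the sketch is the shape of the decision, not the scoring: irrelevant material never reaches the model, and relevant material competes for a fixed budget.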
Skills. Reusable files the agent executes when a specific trigger fires. A skill that takes a call transcript and extracts open items. A skill that prepares a proposal from five comparable projects. Skills are the unit of reuse: no new client starts from zero.
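A skill, reduced to its skeleton, is a trigger plus a procedure. The sketch below is hypothetical: the `Skill` type, the event dictionary, and the convention that open items appear on lines starting with "Action:" are assumptions for the example, not how any particular agent framework or the IQ Source brain defines skills.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    name: str
    trigger: Callable[[dict], bool]   # does this event fire the skill?
    run: Callable[[dict], object]     # what the skill does when fired


def extract_open_items(event: dict) -> list:
    # Assumed convention: open items are lines that start with "Action:".
    return [
        line.split(":", 1)[1].strip()
        for line in event["text"].splitlines()
        if line.strip().lower().startswith("action:")
    ]


SKILLS = [
    Skill(
        name="extract_open_items",
        trigger=lambda e: e.get("kind") == "call_transcript",
        run=extract_open_items,
    ),
]


def dispatch(event: dict, skills=SKILLS):
    """Run the first skill whose trigger matches; return (name, result) or None."""
    for skill in skills:
        if skill.trigger(event):
            return skill.name, skill.run(event)
    return None


transcript = {
    "kind": "call_transcript",
    "text": "Ana: hello\nAction: send revised quote by Friday\nLuis: ok\nAction: share API docs",
}
print(dispatch(transcript))
# prints: ('extract_open_items', ['send revised quote by Friday', 'share API docs'])
```

Adding a skill for the next client means appending one entry to the registry rather than rewriting the agent.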
Evals. How you measure that the agent actually got it right. If your quality metric is “the output looks reasonable,” you do not have evals — you have opinions. Evals are case sets with expected answers, run after every change.
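"Case sets with expected answers, run after every change" can be sketched in a few lines. The agent here is a stub function and the cases are invented for the example; exact-match scoring is also an assumption, since real evals often need fuzzier graders.

```python
from typing import Callable, NamedTuple


class EvalCase(NamedTuple):
    prompt: str
    expected: str


def run_evals(agent: Callable[[str], str], cases: list):
    """Return (pass rate, list of failures) for an agent over a case set."""
    if not cases:
        raise ValueError("an empty case set measures nothing")
    failures = []
    for case in cases:
        actual = agent(case.prompt)
        # Exact match after normalizing case and surrounding whitespace.
        if actual.strip().lower() != case.expected.strip().lower():
            failures.append((case.prompt, case.expected, actual))
    pass_rate = (len(cases) - len(failures)) / len(cases)
    return pass_rate, failures


# Hypothetical case set and stub agent, for illustration only.
CASES = [
    EvalCase("refund window?", "30 days"),
    EvalCase("shipping time?", "5 business days"),
    EvalCase("support hours?", "9 to 18"),
    EvalCase("currency?", "USD"),
]


def stub_agent(prompt: str) -> str:
    canned = {
        "refund window?": "30 days",
        "shipping time?": "5 business days",
        "support hours?": "24/7",  # wrong on purpose
        "currency?": "usd",
    }
    return canned.get(prompt, "")


rate, failed = run_evals(stub_agent, CASES)
print(rate, failed)
# prints: 0.75 [('support hours?', '9 to 18', '24/7')]
```

Run after every change to prompts, context, or skills, the pass rate becomes a number that can regress visibly instead of an opinion.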
Orchestration. When to spawn a subagent with a clean window, when to parallelize, when to escalate to a human, when to accept a partial answer. The logic that composes a response when the problem does not fit in a single call.
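Those decisions can be written down as an explicit routing function rather than left implicit in prompts. The sketch below is one possible shape, not a prescribed policy: the `TaskState` fields, the order of the checks, and the 0.8 and 0.5 confidence thresholds are all arbitrary assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer in the current call"
    SUBAGENT = "spawn a subagent with a clean window"
    PARALLELIZE = "fan out independent subtasks"
    ESCALATE = "hand off to a human"
    ACCEPT_PARTIAL = "return a partial answer, flagged as partial"


@dataclass
class TaskState:
    tokens_needed: int         # estimated tokens the task requires
    window_free: int           # tokens still free in the current window
    independent_subtasks: int  # subtasks with no dependency on each other
    confidence: float          # agent's self-reported confidence, 0.0 to 1.0
    high_stakes: bool          # e.g. touches money or contracts
    retries_left: int


def route(state: TaskState) -> Action:
    # Thresholds below are illustrative; each company tunes its own.
    if state.high_stakes and state.confidence < 0.8:
        return Action.ESCALATE
    if state.independent_subtasks > 1:
        return Action.PARALLELIZE
    if state.tokens_needed > state.window_free:
        return Action.SUBAGENT
    if state.confidence < 0.5 and state.retries_left == 0:
        return Action.ACCEPT_PARTIAL
    return Action.ANSWER


print(route(TaskState(50_000, 12_000, 1, 0.9, False, 2)).value)
# prints: spawn a subagent with a clean window
```

Keeping the policy in one function means a change in escalation rules is a reviewed diff, which is what lets evals catch its side effects.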
Operating discipline. The rule whose absence kills most Team OS implementations: the feature does not ship until the harness repo is updated. Without this rule, the other four layers fill with dust in six weeks.
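The rule is easiest to keep when a machine enforces it. A minimal sketch of such a gate follows, assuming a single repository where harness files live under a `harness/` directory and product code under `src/` or `app/`; those directory names are hypothetical, and the list of changed files would come from whatever the CI system provides.

```python
from pathlib import PurePosixPath

HARNESS_DIR = "harness"        # hypothetical location of rules, skills, evals
FEATURE_DIRS = ("src", "app")  # hypothetical locations of product code


def top_level(path: str) -> str:
    parts = PurePosixPath(path).parts
    return parts[0] if parts else ""


def ship_allowed(changed_files: list) -> bool:
    """A change that touches feature code must also touch the harness."""
    touches_feature = any(top_level(p) in FEATURE_DIRS for p in changed_files)
    touches_harness = any(top_level(p) == HARNESS_DIR for p in changed_files)
    return touches_harness or not touches_feature


print(ship_allowed(["src/billing/close.py"]))
# prints: False
print(ship_allowed(["src/billing/close.py", "harness/skills/finance_close.md"]))
# prints: True
print(ship_allowed(["README.md"]))
# prints: True
```

A check like this is deliberately blunt: it cannot judge whether the harness update is good, only that someone had to make one before shipping.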
All five are fitted to your company. Box’s financial close does not look like your B2B software company’s close. That is exactly why the harness is where advantage accumulates: every company ends up with its own version, and the version is not portable.
What we already run at IQ Source
The IQ Source brain is our team’s harness, tuned every day for the last month with two active clients running in parallel. Layer 1: call transcripts get ingested, open items get extracted and propagated to projects. Layer 2: anyone on the team asks the brain in natural language (“what did we quote in February to a company this size?”, “what was decided on Monday’s call with prospect X?”). Layer 3: the operating rule. A proposal does not get quoted if the prospect’s information is not where the brain finds it; a handoff between two team members only counts when what got transferred is in writing.
When a client hires us for AI Maestro, we do not install anything yet. Over two months of consulting, education, and training, we map how the company actually operates (not what the manual says), train the team in AI fluency, and identify which of the five harness components is the real opportunity. The stage closes with three deliverables (a Process Reality Map, an AI Opportunity Score, and a prioritized recommendation) and a Go/No-Go gate. If it advances, a second stage designs and implements the recommended solution (an agent when it fits, automation or integration when not) and operates it until the client’s team takes over. If it does not advance, the deliverables are theirs with zero downstream commitment. It is the outside version of the internal role Levie is hiring for: the outsider with judgment who diagnoses before touching anything.
When the client also needs us to keep building product while this gets installed, that is where Tech Partner comes in. The harness operates inside the development process: a PR does not close unless the technical decision that motivated it is written down in the repo; a release carries an attached log the next team can read without reconstructing the why. It is the workflow moat I covered in April, operational.
The three tests for your harness
The first is the existence test. What is the harness your team uses today, and where does it live? If the answer is “in someone’s head” or “in Slack,” the harness does not exist as a system. It exists as individuals. And individuals leave.
The second is the ownership test. Who, by name, keeps the harness alive? Not “everyone.” Not “the AI team.” A specific person, with allocated time, with authority to send a feature back to the backlog if the corresponding documentation did not get updated. Without that role, the other four layers atrophy.
The third is the agent test. When an AI agent inside your company tries to answer the next customer question, what it finds inside the harness defines the answer. Not the model. Not the version. What your team wrote down this week in the place the agent goes looking.
If all three answers are solid, you do not need IQ Source for this. If two or three land in silence, write to us and we will look at it together. An initial conversation serves one purpose: to diagnose which of the five components is the real bottleneck.
The team that earned the harness before yours pays one year in 2027. The team still believing the problem is picking the right model pays three, and finds out late which one was the moat.
Frequently Asked Questions
The harness is the operating layer wrapped around the language model: context engineering, skills, evals, orchestration, and operating discipline. In 2026 the frontier models (Claude, GPT-5, Gemini) are interchangeable; the harness is built inside each company against its real workflows, is not for sale, and is where competitive advantage now lives.
An AI Business Automation Engineer is an internal Forward Deployed Engineer who sits shoulder-to-shoulder with Finance, Legal, People, GTM, or Customer Success, identifies processes ripe for redesign with AI, prototypes the technical solution, and ships it into production. Aaron Levie announced the role at Box on May 10, 2026 as the pattern that will dominate the next wave of IT hiring at mid-to-large companies.
On May 9, 2026, Mnimiy published an analysis with 2.5 million views: Forrest Chang's 4-rule CLAUDE.md, written from Andrej Karpathy's January thread, cut Claude's mistake rate from 41% to 11%. Adding 8 more rules brought it down to 3% across 30 codebases over 6 weeks. Same model, different harness, and a mistake rate roughly 14 times lower (41% / 3% ≈ 13.7).
AI Maestro is the service where IQ Source installs the harness layers for the client's team: it governs context engineering, writes skills, measures evals, runs orchestration, and maintains discipline while the client's team is busy putting out product fires. Tech Partner is the variant that also builds product inside the same flow.