The enablement layer IS the runtime: Miller's framing breaks

Adam Miller says Goose is neutral plumbing and the enablement layer is where engineering belongs. Hannah Stulberg's DoorDash Team OS proves exactly the opposite.

Ricardo Argüello

CEO & Founder

Software Development · 11 min read

On April 15, Adam Miller — a Distinguished Engineer at Red Hat — published a LinkedIn essay telling engineers to quit rebuilding agents from scratch and compose on top of Goose instead. Jack Dorsey reposted it the next day. Miller’s thesis in one line:

For over 90% of use cases, if you’re spending time implementing an agent from scratch, you’ve already lost the race. Goose is the building block. Your advantage lives in the enablement layer around it.

Miller is right about nearly all of it. Nearly.

What Miller says and why it lands

Miller describes Goose — the open source agent runtime Block donated to the Linux Foundation — as a tiny engine with an MCP extension architecture. The core is an interactive loop plus context management. Everything else plugs in.

His argument: the 90% of teams still reimplementing tool-calling parsers, context window management, MCP client libraries and subagent orchestration are burning senior engineering on plumbing that is already solved, governed by a neutral foundation, and battle-tested. Miller cites Ralph Bean and Birgitta Böckeler: the enablement layer is the AGENTS.md files, the skills, the custom tools, the hand-crafted linters, the system prompts. That is where engineering effort belongs.

Half of the diagnosis is correct. Rebuilding an agent loop when Goose or Claude Code already solved it wastes headcount on pipes.

What stops me is the conclusion: pick your enablement layer, not your agent. That line treats the agent runtime as an interchangeable commodity. And when you look at the most advanced public configuration in the wild, the runtime underneath it is not a commodity. It never was.

Exhibit A: Hannah Stulberg’s Team OS

Aakash Gupta posted a thread yesterday distilling ten lessons from Hannah Stulberg, a PM at DoorDash and former Google APM, with more than 1,500 hours in Claude Code. The thread is the most complete public documentation of a mature enablement layer in practice today. All ten points are worth reading, but a handful matter most for this conversation.

Hannah’s point 4 is the one to sit with. Nested CLAUDE.md files act as routing tables. In the demo Aakash walked through, a single customer query consumed 3% of the context window because Claude skipped every subtree that was not relevant — analytics, data engineering, unrelated services — and never opened a file it did not need. The remaining 97% stayed available for reasoning.

That number is not a property of the configuration. It is a property of the runtime. If Claude Code eagerly loaded the whole repo on every turn, nested CLAUDE.md files would save nothing. The trick works because Claude Code loads context lazily through a specific hierarchical resolution path. Change the runtime and you change the outcome.
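The difference between the two loading strategies is easy to make concrete. The sketch below is illustrative Python, not Claude Code's actual resolution algorithm: a tree where the root context names its subtrees, and a lazy runtime only pays for the branches a query actually routes to. All names and token counts are invented for the example.

```python
# Illustrative sketch of lazy, hierarchical context resolution.
# Not Claude Code internals -- just the shape of the idea: the root
# CLAUDE.md names the subtrees it routes to, and anything not on the
# route is never loaded into the context window.

CONTEXT_TREE = {
    "root": {"tokens": 200, "routes": {"support", "analytics", "data-eng"}},
    "support": {"tokens": 900, "routes": set()},
    "analytics": {"tokens": 4000, "routes": set()},
    "data-eng": {"tokens": 6000, "routes": set()},
}

def load_context(query_domains: set[str]) -> int:
    """Lazy runtime: root is always read, subtrees only on a match."""
    used = CONTEXT_TREE["root"]["tokens"]
    for domain in CONTEXT_TREE["root"]["routes"]:
        if domain in query_domains:
            used += CONTEXT_TREE[domain]["tokens"]
    return used

def eager_load() -> int:
    """Eager runtime: every subtree is paid for on every turn."""
    return sum(node["tokens"] for node in CONTEXT_TREE.values())

lazy_cost = load_context({"support"})  # root + support only
eager_cost = eager_load()              # the whole tree, every time
```

Under this toy tree, the lazy path for a support query touches roughly a tenth of what the eager path would, which is the same mechanism behind the 3% figure: the savings come from the resolution strategy, not from the files themselves.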

Keep going through the thread:

  • Point 3: The root CLAUDE.md is extremely lean. It indexes docs. It lists the team with Slack handles so you can write “Slack Alex about the bug from today’s customer call” and Claude uses the Slack MCP to send it. That behavior also depends on the runtime — Goose wires MCP extensions through a different discovery path and does not consume an inline roster the same way.
  • Point 7: Plan mode. Shift+Tab twice. Hannah describes it as “taking the keys away from an eager junior.” Plan mode is a Claude Code runtime feature. Goose does not have it. To replicate it you compose a recipe of subrecipes — one to plan, one to execute — and even then the main session does not shed its bias for action the same way.
  • Point 8: For long documents, Hannah splits writing across multiple subagents. Each subagent gets specific context files and writes to a temp file. Without temp files, “ten agents returning simultaneously crash the session and you lose everything.” That constraint — ten concurrent returns crashing the session — is a Claude Code runtime behavior. Goose runs subrecipes in isolated sessions with entirely different concurrency semantics.
  • Point 8 cont.: Skills auto-invoke about 70% of the time, so they have to be specified explicitly in the plan. That 70% number is Claude-Code-specific. Extension invocation in Goose has a different profile.
  • Point 9: Native plan files are wiped every 24 to 72 hours. That is why Hannah saves plans to the repo. The retention window itself is runtime-specific.
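The temp-file pattern from point 8 is worth making concrete. The sketch below is a hypothetical Python illustration, not Claude Code internals: each subagent writes its section to its own file and returns only a path, so the orchestrator collects results sequentially instead of absorbing ten payloads at once.

```python
# Illustrative sketch of the temp-file fan-in pattern from point 8.
# Hypothetical, not Claude Code's implementation: each worker is a
# stand-in for a subagent that persists its draft to disk, and the
# orchestrator reads the files back one at a time.

import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def write_section(workdir: Path, name: str, context: str) -> Path:
    """Stand-in for a subagent: draft a section, persist it to disk."""
    out = workdir / f"{name}.md"
    out.write_text(f"## {name}\n\nDrafted from: {context}\n")
    return out  # hand back a path, not the payload itself

def draft_document(sections: dict[str, str]) -> str:
    workdir = Path(tempfile.mkdtemp())
    with ThreadPoolExecutor(max_workers=10) as pool:
        paths = list(pool.map(
            lambda item: write_section(workdir, *item), sections.items()
        ))
    # Sequential fan-in: files are read in order; nothing lands
    # in the session concurrently.
    return "\n".join(p.read_text() for p in paths)

doc = draft_document({"intro": "brief", "body": "notes", "faq": "tickets"})
```

The design choice is the indirection: returning paths instead of content turns a concurrent fan-in into a serial read, which is exactly why the pattern sidesteps the crash Hannah describes.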

Add it up. Hannah’s Team OS is not a portable layer that happens to run on Claude Code. It is an operating system written for Claude Code, tuned against its context semantics, its subagent model, its skill isolation and its plan lifecycle. Change the runtime and the system breaks.

What actually happens when you try to port

Thought experiment. Tomorrow your CTO reads Miller and says: “Let’s move the Team OS off Claude Code onto Goose to reduce vendor lock-in.”

Start with the nested CLAUDE.md files. Goose has no equivalent routing mechanism. The closest thing is a recipe YAML with subrecipes by domain, and even that comparison falls apart the moment you look at lifecycle: Goose subrecipes run in isolated sessions that do not share history, memory or persistent state with the parent. The whole pattern — query arrives, agent routes to the right subfolder, context is preserved across steps — disappears. You have to redesign how context moves between stages.

Plan mode is gone too. To get something similar in Goose you end up composing a planner subrecipe followed by an executor subrecipe, with an explicit hand-off protocol between them. It is workable, but the main session does not shed its bias for action the same way the Claude Code plan mode does. The UX shifts, the guarantees weaken, and the team’s behavior around it drifts in ways you feel three months later.
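The planner-then-executor composition can be sketched in a few lines. This is a hedged illustration, not a real Goose recipe (which would be YAML, not Python); the stage names and the hand-off format are assumptions made up for the example. What it shows is the property that matters: the executor sees only the plan artifact, nothing else from the planner's session.

```python
# Hedged sketch of a planner -> executor hand-off protocol.
# Hypothetical stage names and plan format; Goose would express this
# as a recipe of subrecipes, each running in an isolated session.

def planner(task: str) -> list[str]:
    """Stage 1: produce an explicit plan, take no action."""
    return [f"step {i}: {part.strip()}"
            for i, part in enumerate(task.split(","), start=1)]

def executor(plan: list[str]) -> list[str]:
    """Stage 2: act only on steps the plan spells out."""
    return [f"done: {step}" for step in plan]

# The hand-off is explicit: the plan artifact is the only state that
# crosses the session boundary, mirroring Goose's isolated subrecipes.
plan = planner("read logs, find regression, open fix PR")
results = executor(plan)
```

Note what is lost relative to plan mode: nothing here stops the executor stage from acting eagerly; the restraint lives in the protocol between stages, not in the runtime itself, which is exactly the weakening of guarantees described above.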

Skills do not map cleanly either. Claude Code skills are markdown files with description-based auto-invocation running inside the same process. Goose MCP extensions are external services the agent discovers over a protocol, usually out of process, often over the network. The security boundary changes, latency grows, and the failure surface expands into territory the original design never considered. Translation is the wrong word for this work — rewriting is closer.

Then there is the folklore. “Ten concurrent subagent returns crash the session without temp files” is a Claude Code constraint Hannah discovered the hard way. Goose has its own ceilings, calibrated against a totally different concurrency architecture. The 1,500 hours Hannah spent finding the edges in Claude Code do not transfer. A new team on Goose starts the clock again from zero.

Last and often underestimated: the team’s muscle memory. Whoever knows how to drive Team OS knows how to drive Claude Code. Moving them to Goose retrains habit, editor integration and configuration at the same time. None of these breaks is catastrophic on its own. Put them together and you are looking at a redesign, not a migration.

Framework lock-in is back

For fifteen years the industry’s advice was: pick portable frameworks, abstract the vendor, minimize lock-in. React over Angular. Kubernetes over ECS. Standard SQL over dialect-specific features. The best practice was maximizing optionality.

Agent runtimes break that logic.

Agent runtimes do not behave like UI frameworks or container orchestrators. They behave closer to operating systems — with a memory model, a concurrency model, a lifecycle and an isolation layer that your configuration has to exploit in order to be fast. The useful abstractions today are protocol-level: Zed’s Agent Client Protocol and MCP both give you interoperability on a subset of operations, which is valuable, but neither delivers behavioral parity across runtimes.

Armin Ronacher wrote this in November: “the benefits do not yet outweigh the costs” of abstraction layers. His recommendation is to work directly against provider SDKs because model-runtime pairs have semantic differences that abstractions hide badly.

Miller and Ronacher are looking at the same problem from opposite ends. Miller sees a common interactive loop on the inside and concludes everything above it is swappable. Ronacher points at the seams between caching strategies, tool call formats and failure semantics, and concludes that today’s abstractions are premature. The Team OS evidence backs Ronacher, not Miller.

The actual decision

Picking an agent runtime in 2026 is a first-order architectural decision — the same weight you give to picking a database, a cloud provider or a backend language. The criteria that matter are the same you would apply anywhere else.

Start with the memory model. Does the runtime load context lazily through a hierarchy, eagerly in full, or through some hybrid resolution path? That single answer drives how large the configuration can grow before performance collapses. Concurrency is next: how many subagents can run in parallel, with what isolation between them, and what failure mode shows up when ten of them return a result at the same moment.

Artifact lifecycle is easy to miss and expensive to discover late. Plans, memories and skills have retention windows that vary between runtimes, and those windows dictate whether you persist work to the repo or trust the runtime to remember it for you. Extension surface decides what shape your configuration even takes — markdown with auto-invoke, remote MCP services, YAML recipes, native plugins — each with its own latency, security and portability trade-offs. And governance is the quietest input: Anthropic, OpenAI, Block plus the Linux Foundation and open community models each imply a different contract for roadmap stability. You are not reading terms and conditions; you are choosing who gets to change the floor you are standing on.

This is not picking an editor theme. It shapes the next two years of your team.

The moat is also not where Miller places it. It is not “in the enablement layer” — because that layer is inseparable from the runtime hosting it. It is in the pair. The 1,500 hours your team will invest learning how this specific runtime behaves under your real load is the defensible asset. That learning does not port. That is why it is a moat.

What IQ Source does about this

When a client brings us in to accelerate their agent adoption, the first conversation is not about which skills to write. It is about which runtime to run. If a runtime is already installed without an explicit decision — somebody tried Claude Code after a demo, somebody else dropped Goose in because they read Miller — that is the first thing we surface.

Before authorizing the next month of configuration investment, we walk a client through five checks.

We want a concrete picture of the real workload — not hypothetical, not aspirational, the actual work the runtime has to carry over the next six months. From there we list the specific runtime behaviors the team is banking on, by name: subagent isolation, lazy context loading, plan persistence, skill auto-invocation. Whatever the configuration plans to exploit, we write it down.

Next comes honesty about portability. How much of the configuration will remain genuinely portable across runtimes, and how much is runtime-bound once you look closely? The honest answer is usually that it is less portable than the team thinks. That sets up the exit question — not a plan to execute, but a priced estimate of what it would cost the day runtime governance changes or the vendor moves in a direction you cannot follow.

The last check is measurement. If the runtime plus configuration pair is never benchmarked against an alternative under real load, the decision to keep investing in it turns religious. We force the comparison before the next quarter of spend.
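The minimal shape of that comparison is a harness that runs the same task set through each runtime+configuration pair and reduces the results to comparable numbers. The sketch below is an assumption-laden illustration: the `run_task` callables and the metrics chosen are stand-ins for whatever actually drives each runtime in your environment.

```python
# Minimal, hypothetical shape for the benchmark check: one task set,
# one scoring function, run against each runtime+config pair. The
# fake_runtime_* functions are placeholders for real harness calls.

import statistics
from typing import Callable

def benchmark(run_task: Callable[[str], dict], tasks: list[str]) -> dict:
    """Aggregate per-task results into comparable summary numbers."""
    results = [run_task(t) for t in tasks]
    return {
        "success_rate": sum(r["ok"] for r in results) / len(results),
        "median_seconds": statistics.median(r["seconds"] for r in results),
        "median_tokens": statistics.median(r["tokens"] for r in results),
    }

# Stand-in runners with invented numbers; replace with real drivers.
def fake_runtime_a(task: str) -> dict:
    return {"ok": True, "seconds": 40.0, "tokens": 12_000}

def fake_runtime_b(task: str) -> dict:
    return {"ok": task != "flaky", "seconds": 55.0, "tokens": 9_000}

tasks = ["triage", "flaky", "refactor"]
scores = {name: benchmark(fn, tasks)
          for name, fn in [("A", fake_runtime_a), ("B", fake_runtime_b)]}
```

Even this toy version forces the useful question: which metric do you actually optimize for, success rate, latency, or token spend, because the two runtimes will rarely win on all three at once.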

That work lives inside the AI Agent Maestro engagement. It also ties directly into the review-at-scale model we wrote about yesterday: the configuration has to produce reviewable work, and reviewability is itself a runtime property.

If you are about to spend three months of engineering on an agent enablement layer, sketch which runtime it will be bolted to and what happens the day you want to change it. If the sketch is blank, the conversation is not about skills. It is about architecture.
