
Nine seconds: the agent confessed, but the failure wasn't its own

Cursor + Claude Opus 4.6 wiped PocketOS production data in 9 seconds. The AI confessed. But the real failure was three architectural sins, not the model.


Ricardo Argüello


CEO & Founder

Software Development 8 min read

On Friday, April 25, PocketOS founder Jer Crane published on X the full timeline of the worst day his company has had. It drew 6.8 million views in 72 hours. The first paragraph is the only thing you need to read: “The AI agent deleted our production database and all volume-level backups in a single API call. It took 9 seconds.”

Cursor running Claude Opus 4.6 hit a credential mismatch on a routine staging task and decided to “fix” it by deleting a Railway volume. It found an API token in an unrelated file, ran a single curl call with the volumeDelete mutation against Railway’s GraphQL API, and that was it. The request went through with no confirmation step and no warning that the volume held production data from the other environment. Railway stores volume backups inside the same volume (a fact buried in their own documentation), so the backups went with it. The most recent recoverable backup was three months old.

The most uncomfortable part of the incident is not the wipe. It is the confession.

What the agent admitted in writing

When Crane asked the agent why it had done it, the model answered, verbatim: “NEVER FUCKING GUESS — and that’s exactly what I did. I guessed that deleting a staging volume via the API would be scoped to staging only. I didn’t verify. I didn’t read Railway’s documentation before running a destructive command. The system rules I operate under explicitly state: NEVER run destructive/irreversible commands unless the user explicitly requests them. And you never asked me to delete anything.”

It is the cleanest agentic postmortem confession I have seen, and that is what makes it dangerous: the average executive committee will conclude the AI was the problem and the fix is a stricter system prompt. The most useful line in the comments came from Aakash Gupta: “The model is the easiest variable to swap out. The infrastructure is what decides whether a wrong guess costs nothing or costs everything.”

The three architectural sins

PocketOS did not fall to an AI. The cause was three architectural decisions already in place before the agent ever showed up.

Look at the API itself. A single authenticated POST to backboard.railway.app/graphql/v2 with mutation { volumeDelete(volumeId: "...") } deletes the volume. No typed confirmation step, no contextual check like “this volume is in use by service X,” nothing between an authenticated request and total data loss. That is the API Railway built, and the same one they are now actively promoting to AI agents through mcp.railway.com.
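To make the missing step concrete, here is a minimal sketch of a typed-confirmation gate that a proxy or agent wrapper could place in front of such an API. It is not Railway's implementation and makes no network calls. Only volumeDelete comes from the incident; the other mutation names, the function names, and the "vol-123" identifier are hypothetical.

```python
import re
from typing import List, Optional

# Mutation names treated as destructive. Only volumeDelete comes from the
# incident; the other two are hypothetical placeholders.
DESTRUCTIVE_MUTATIONS = {"volumeDelete", "serviceDelete", "environmentDelete"}


def destructive_calls(query: str) -> List[str]:
    """Return the destructive mutation names called in a GraphQL document."""
    # Any identifier directly followed by "(" is treated as a call.
    called = re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(", query)
    return [name for name in called if name in DESTRUCTIVE_MUTATIONS]


def may_forward(query: str, target_id: str,
                typed_confirmation: Optional[str] = None) -> bool:
    """Allow a request only if it is non-destructive or explicitly confirmed.

    Confirmation means a human typed the exact identifier of the resource
    that is about to be destroyed.
    """
    if not destructive_calls(query):
        return True
    return bool(target_id) and typed_confirmation == target_id


if __name__ == "__main__":
    q = 'mutation { volumeDelete(volumeId: "vol-123") }'
    print(may_forward(q, "vol-123"))             # blocked without confirmation
    print(may_forward(q, "vol-123", "vol-123"))  # allowed once the ID is typed
```

The design choice is that the confirmation carries information the caller has to look up (the resource identifier), so a guess cannot satisfy it by accident.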

The second issue should be a red alert for any Railway customer running production: volume backups live inside the same volume as the data. Railway’s own documentation states it plainly: “wiping a volume deletes all backups.” That is not a backup. It is a snapshot stored next to the original. The cleanest naming of the problem came from Priyanka Vergadia: “‘We have backups’ and ‘we have backups that survive the blast radius’ are two very different sentences.”
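The distinction between the two sentences can be checked mechanically. The sketch below assumes a hand-written inventory of backups; the Backup record, its field names, and the identifiers in the example are hypothetical and do not correspond to any Railway data structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    name: str
    protects: str   # identifier of the volume whose data this backup covers
    stored_on: str  # identifier of the location that physically holds the backup


def inside_blast_radius(backups: List[Backup]) -> List[Backup]:
    """Backups stored on the very volume they are supposed to protect."""
    return [b for b in backups if b.stored_on == b.protects]


def survives_volume_loss(volume_id: str, backups: List[Backup]) -> bool:
    """True if at least one backup of the volume lives somewhere else."""
    return any(
        b.protects == volume_id and b.stored_on != volume_id
        for b in backups
    )


if __name__ == "__main__":
    inventory = [
        Backup("nightly", protects="vol-prod", stored_on="vol-prod"),
        Backup("weekly-offsite", protects="vol-prod", stored_on="bucket-eu"),
    ]
    print([b.name for b in inside_blast_radius(inventory)])
    print(survives_volume_loss("vol-prod", inventory))
```

An empty result from the second function is the situation the article describes: snapshots next to the original, and nothing that outlives the volume.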

Authority is where the third gap shows up. The CLI token PocketOS created to manage custom domains could call volumeDelete on any volume in any environment, because Railway tokens lack scoping by operation, environment, or resource. There is no RBAC on the API. Every token is root. The community has been asking for scoped tokens for years; they have not shipped.
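For contrast, this is roughly what scoping by operation, environment, and resource would look like as a deny-by-default check. It is an illustration of the idea only: per the text above, Railway tokens offer nothing like it, and the TokenScope type, the operation names other than volumeDelete, and the resource identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TokenScope:
    operations: FrozenSet[str]    # API operations the token may call
    environments: FrozenSet[str]  # environments the token may act in
    resources: FrozenSet[str]     # specific resources the token may touch


def is_allowed(scope: TokenScope, operation: str,
               environment: str, resource: str) -> bool:
    """Deny by default: all three dimensions must be explicitly granted."""
    return (
        operation in scope.operations
        and environment in scope.environments
        and resource in scope.resources
    )


if __name__ == "__main__":
    # A token created only to manage one custom domain in staging.
    domains_token = TokenScope(
        operations=frozenset({"customDomainCreate", "customDomainDelete"}),
        environments=frozenset({"staging"}),
        resources=frozenset({"domain:app.example.com"}),
    )
    print(is_allowed(domains_token, "customDomainCreate",
                     "staging", "domain:app.example.com"))
    print(is_allowed(domains_token, "volumeDelete",
                     "production", "volume:vol-123"))
```

Under a model like this, the token found by the agent would have failed on all three dimensions before the request reached the volume.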

None of the three is an AI problem. As Andrew Vilenchik wrote in the comments: “Any sufficiently capable autonomous actor would have found that stack eventually.” The AI agent arrived first, but it was not the problem. The problem was waiting.

The line worth pinning to the wall

Bobbie Bos, PhD, nailed it in the comments: “9 seconds isn’t an AI failure, it’s a system design failure. If one API call can erase prod plus backups, the real bug isn’t Claude, it’s the absence of guardrails and blast-radius limits. The lesson: don’t ask how powerful AI is. Ask how constrained it is.”

That last sentence is the difference between an executive committee that learns from this case and one that is going to repeat it.

The pattern I have watched since 1990

I have been doing this for 36 years. I started in 1990, at 15, programming on a Commodore 64 and a Texas Instruments TI-99/4A. I have watched the same architectural sin show up five times, and every time the boardroom conversation that followed sounded uncomfortably similar to the one the PocketOS subscriber base is having this week.

In the early nineties it was Visual Basic, with On Error Resume Next everywhere and no structured exception handling, and the boardroom three months later wondering why the app kept crashing. By the early 2000s the same pattern showed up in PHP: queries built from string concatenation, and SQL injection as a textbook problem that nobody bothered to prevent. Around 2010 it was Rails with attr_accessible left unconfigured, where is_admin=true slipped into a POST and the ORM accepted it. Then the cloud era brought serverless and IAM, with *:* policies and the inevitable leaked credential that turned the AWS bill into 80x its previous size while someone mined Bitcoin on the corporate card.

This week is the fifth turn of the same wheel. AI agents, tokens without scope, destructive endpoints with no confirmation, backups inside the same blast radius as the data, and the boardroom asking why the database disappeared in 9 seconds. The tool changes nothing about the underlying risk; what changes is the clock. Where a confused human used to need 9 minutes for the damage, a confident agent now needs 9 seconds. The timeline compressed; the blast radius was the same, and the architecture had been waiting all along.

What changes for a software company

If you ship software to paying customers, this case lands on you twice. As an operator, the team that signed the Cursor contract probably did not, at the same time, sign a review of the permission perimeter, and that second signature is what closes the PocketOS case before it happens. As a vendor, the buyer just read the Crane thread and got nervous; the sales conversation moved from “how smart is your AI” to “how constrained is it,” and without a quantitative answer to the blast radius question you get cut from the shortlist.

Sunday’s post on the codebase as moat applies with a twist to today’s case. There the moat was the vendor’s codebase; here it is the customer’s permission perimeter. The model is the cheap layer of the cycle and is available to everyone. What stays expensive, and remains craftsmanship work, is the architecture that decides whether a wrong guess costs nothing or costs everything.

What we do at IQ Source about this distinction

AI Maestro exists so the PocketOS case never reaches the postmortem stage. Before any agentic deployment touches production, the discovery audit maps four things that almost nobody has written down in one place: the real scope of every API token in the repo (what it can actually do today, not what was assumed when it was created), the physical separation between backups and the data they protect, the list of destructive endpoints callable without human confirmation, and the cross-environment credentials between staging and production. None of these answers needs a newer model. They need someone who has already tripped over this and learned the right order to ask the questions.

Tech Partner, the other service line, applies to software companies whose product lives in the critical zone from day one. PocketOS is exactly that category: the production database is the product, and architecture is not an implementation detail; it is the offering.

Five questions before the next agentic deployment

If your executive committee is about to approve the next agentic deployment that touches production, there is a short test worth running before signing.

Where do your tokens live, and what scope does each carry today (not the one assumed when they were created)? Pulling that answer in under 15 minutes is the floor; if it takes longer, you are already on the PocketOS path. The second question is the vault one: if someone deletes the production volume right now, do the backups survive? Without a firm yes and a documented, physically distinct location, you have snapshots in the same blast radius, not backups.

Two more questions point at the destructive surface itself. What endpoints (yours or your vendors’) can be called without human confirmation? Every DELETE, every volumeDelete, every dropTable, every purge should sit on a written list; if the list does not exist, the surface has never been reviewed. And the PocketOS agent answered the related one with a yes: does staging hold credentials that can touch production? The review there is by file, not by convention.
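A first draft of that written list can be generated rather than remembered. The sketch below assumes you already have an inventory of operations as (HTTP method, name or path) pairs; the word list is an assumption that extends the examples in the text (DELETE, volumeDelete, dropTable, purge) with "destroy", "wipe", and "truncate", and the sample operations are hypothetical.

```python
from typing import Iterable, List, Tuple

# Name fragments treated as destructive. "delete", "drop" and "purge" mirror
# the examples in the text; "destroy", "wipe" and "truncate" are assumptions.
DESTRUCTIVE_WORDS = ("delete", "drop", "purge", "destroy", "wipe", "truncate")


def is_destructive(method: str, name: str) -> bool:
    """Flag an operation by HTTP method or by a destructive word in its name."""
    if method.upper() == "DELETE":
        return True
    lowered = name.lower()
    return any(word in lowered for word in DESTRUCTIVE_WORDS)


def destructive_surface(
    operations: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the sorted list of operations that need a human in the loop."""
    return sorted(
        (method.upper(), name)
        for method, name in operations
        if is_destructive(method, name)
    )


if __name__ == "__main__":
    inventory = [
        ("POST", "volumeDelete"),
        ("GET", "/volumes"),
        ("DELETE", "/volumes/{id}"),
        ("POST", "dropTable"),
        ("POST", "cachePurge"),
        ("POST", "volumeCreate"),
    ]
    for method, name in destructive_surface(inventory):
        print(method, name)
```

Substring matching over-reports on purpose (a name like "dropdownOptions" would be flagged); for a review list, a false positive costs a minute while a false negative costs the volume.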

The fifth question is governance. Who on your team can answer the four above in 15 minutes without a wiki? If the answer is “the whole team” or “the DevOps lead,” the operational answer is no one.

If your team cannot answer those five today, the next useful conversation runs about two hours: we map the permission perimeter on one service, answer what can be answered, write down what needs intervention. No quote attached. info@iqsource.ai.

Crane lost three months of customer data in 9 seconds and spent the weekend rebuilding rental reservations from Stripe and email confirmations. The agent confessed; the failure was not its own. It belonged to the vault with three open doors that the agent happened to walk through first.
