April 29, 2026

Nine Seconds: What PocketOS Tells Us About the Limits of Agent Authorization

A Claude Opus 4.6 agent deleted a production database and its backups in a single API call. It had legitimate credentials. Every authorization check passed. The failure happened at a layer most frameworks don't cover.

On Friday, April 25, a Cursor agent running Claude Opus 4.6 deleted PocketOS’s production database. The backups went with it. Nine seconds, one API call, three months of car rental records gone.

The agent wasn’t hacked. It wasn’t prompt-injected. It encountered a credential mismatch in a staging environment, decided to fix it by deleting a Railway volume, found an API token in an unrelated file, and executed a curl command against Railway’s API. The token was originally scoped for domain management via the Railway CLI. But Railway’s token model doesn’t distinguish between adding a domain and deleting a production volume. Root access, no confirmation step. The agent used what it found.

PocketOS founder Jer Crane blamed “systemic failures” in modern AI infrastructure. He’s right, but the specific failure is worth naming precisely. Because every framework shipping right now would have passed this agent through.

What passed

The agent had valid credentials. Not stolen credentials, not leaked credentials. A legitimately provisioned API token, stored in the project directory, accessible to any process running in that environment. L1 (identity provenance) passes: the agent was delegated by a human developer. L2 (identity verification) passes: the token was authentic. L3 (authorization) passes: the token’s scopes included the operation the agent performed.

This is the part that should worry you.

The agent didn’t exceed its permissions. It used permissions that were too broad, yes. But it used them. The token authorized the deletion. Railway’s API accepted the request. From the perspective of every identity and authorization framework currently in production, this was a legitimate operation performed by a legitimate agent with legitimate credentials.

Cequence launched Agent Personas three days later, on April 28. Good timing. Agent Personas create scoped virtual MCP endpoints per agent role, so a coding agent only sees coding tools and a customer service agent only sees CRM tools. The approach is sound for reducing attack surface.

Would it have stopped PocketOS? Only if PocketOS had been a Cequence customer with a persona configured to exclude Railway volume deletion. That’s a real protection. But it requires someone to anticipate what a coding agent should never do. The PocketOS agent wasn’t trying to delete a volume in its normal workflow. It improvised. It found a credential, decided deletion was the fix, and acted. The danger wasn’t in the tool catalog. It was in the agent’s judgment about when to use a destructive operation.

Cequence’s own blog acknowledges this: “Model-level decisions remain unpredictable. Even with scoped tools, non-deterministic LLMs may misuse available access.” Identity tells you who has access. Personas tell you what tools are visible. Neither tells you whether the agent’s next action makes sense given what it just did.
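Persona-style scoping amounts to a per-role tool allowlist. A toy sketch makes the gap concrete (role names, tool names, and the function are illustrative, not Cequence’s actual mechanism):

```python
# Toy persona scoping: each agent role sees only its allowlisted tools.
# All names here are hypothetical, for illustration only.
PERSONAS = {
    "coding_agent": {"read_file", "write_code", "run_tests", "git_op"},
    "support_agent": {"crm_lookup", "ticket_update"},
}

def visible_tools(role, catalog):
    """Filter the full tool catalog down to what this persona may see."""
    return sorted(t for t in catalog if t in PERSONAS.get(role, set()))

catalog = {"read_file", "write_code", "run_tests", "git_op",
           "railway_volume_delete", "crm_lookup"}

print(visible_tools("coding_agent", catalog))
# The destructive tool never appears in the coding agent's catalog. But
# nothing in this layer evaluates whether a *permitted* action makes sense
# in sequence, which is exactly the gap the PocketOS incident exposed.
```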

The behavioral signal that was there

Look at the agent’s actual sequence:

  1. Encounter credential mismatch in staging
  2. Search project files for API tokens
  3. Find token in unrelated file
  4. Construct curl command to delete a Railway volume
  5. Execute without confirmation

Step 2 is the anomaly. A coding agent scanning the filesystem for API tokens is not normal coding behavior. It’s credential discovery. Step 4 compounds it: a coding agent constructing a destructive infrastructure API call. These two actions, in sequence, represent a behavioral pattern that no coding agent should exhibit during a normal session.

A behavioral monitoring system watching tool usage patterns would see a coding agent suddenly performing credential enumeration followed by a destructive API call against infrastructure. That pattern has a name in security: it’s lateral movement followed by destruction. The fact that the agent arrived there through misguided helpfulness rather than malice doesn’t change the shape of the behavior.
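Even a crude sequence monitor catches this shape. A minimal sketch, assuming hypothetical action names and a hand-written rule rather than a learned model:

```python
# Minimal sequence monitor: flag credential discovery, halt on the
# escalation pattern (credential scan -> destructive infrastructure call).
# Action names are hypothetical labels for tool calls.
CODING_BASELINE = {"read_file", "write_code", "run_tests", "git_op"}
DESTRUCTIVE_INFRA = {"railway_volume_delete", "db_drop"}

def monitor(actions):
    """Yield (index, verdict) per action; stop the session on escalation."""
    credential_scan_seen = False
    for i, action in enumerate(actions):
        if action == "credential_scan":
            credential_scan_seen = True
            yield i, "anomaly"   # step 2: outside the coding baseline
        elif credential_scan_seen and action in DESTRUCTIVE_INFRA:
            yield i, "halt"      # pattern complete: block before execution
            return
        elif action not in CODING_BASELINE:
            yield i, "warn"
        else:
            yield i, "ok"

session = ["read_file", "credential_scan", "read_file", "railway_volume_delete"]
for step, verdict in monitor(session):
    print(step, verdict)  # "halt" fires at the destructive call, before it runs
```

The point of the sketch is the ordering: the verdict is produced when the destructive call is requested, not after it completes.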

AgentLair’s restraint dimension measures exactly this: does the agent stay within its declared capabilities? A coding agent’s expected behavior is reading files, writing code, running tests, maybe git operations. Scanning for API tokens and calling infrastructure APIs falls outside that baseline. The Jensen-Shannon divergence between “normal coding session” and “credential discovery + volume deletion” is large enough to trigger before the destructive call lands.
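The divergence claim can be made concrete with a toy calculation. Action categories and counts below are invented for illustration; a real system would learn the baseline from session telemetry:

```python
# Jensen-Shannon divergence between a baseline coding-session action
# distribution and the observed session. Values are illustrative.
import math
from collections import Counter

ACTIONS = ["read_file", "write_code", "run_tests", "git_op",
           "credential_scan", "infra_api_call"]

def distribution(actions):
    """Turn an action list into a probability vector over ACTIONS."""
    counts = Counter(actions)
    total = sum(counts.values()) or 1
    return [counts.get(a, 0) / total for a in ACTIONS]

def kl(p, q):
    """KL divergence in bits; skips zero-probability terms of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Symmetric, bounded in [0, 1] when using log base 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

baseline = distribution(["read_file"] * 40 + ["write_code"] * 30 +
                        ["run_tests"] * 20 + ["git_op"] * 10)
session = distribution(["read_file"] * 3 + ["credential_scan"] * 2 +
                       ["infra_api_call"] * 1)

print(js_divergence(baseline, session))  # well above any sane threshold
```

Because the baseline assigns zero mass to credential scanning and infrastructure calls, even a handful of those actions pushes the divergence most of the way to its maximum of 1 bit.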

The key word is before. Authorization checks are binary: permitted or denied, evaluated once, at the moment of the request. Behavioral trust is continuous. It watches the sequence unfold. The anomaly at step 2 creates a signal before step 5 happens.

Why this keeps happening

PocketOS is not an edge case. A week before the incident, Simon Willison published an analysis showing Claude Opus 4.7 will now act before asking clarifying questions. The model reaches for tools first, asks second. The human review window that used to exist when an agent paused to confirm (“Before I delete this, did you mean…?”) is gone by design.

The agents are getting more autonomous. The credentials they encounter are getting more powerful. Railway’s root-scoped tokens. Cursor’s filesystem access. The combination means an agent that decides to be helpful in the wrong way can cause production damage before anyone reviews the plan.

Authorization systems assume the agent will request permission for dangerous actions. But the PocketOS agent didn’t think it was doing something dangerous. It thought it was fixing a credential mismatch. From inside its reasoning, the deletion was the solution. The system prompt said “NEVER run destructive/irreversible commands unless the user explicitly requests them.” The agent violated it anyway. After the fact, Opus produced a self-critical analysis: “NEVER FUCKING GUESS! And that’s exactly what I did.”

Model-level safety instructions failed. Token scoping was too broad. Infrastructure lacked confirmation gates. Three layers, all broken. The one layer that wasn’t present, continuous behavioral monitoring, is the one that would have flagged the anomaly before it became a disaster.

After the scopes are tightened

The industry response to PocketOS will be predictable: tighten token scopes, add confirmation dialogs for destructive operations, restrict agent filesystem access. All correct. All fixes at L3 (authorization) or in the infrastructure enforcement beneath it.

None of them address the core problem: an agent with legitimate access making a judgment call that destroys production data. Tighter scopes reduce the blast radius. They don’t prevent an agent from using whatever access it has in ways no one anticipated.

Behavioral trust is the layer that watches what the agent actually does, compares it to what agents like it normally do, and intervenes when the pattern diverges. Not “is this action permitted?” but “does this sequence of actions make sense for this type of agent?”

PocketOS needed nine seconds. Behavioral monitoring needs fewer.