Three different things are called “L4” in agent security right now. That should concern you.
AgentNexus uses “L4” to mean an entity-verified certificate — a static credential issued at install time. Armalo AI uses “L4” to mean financial staking — USDC escrowed on Base as collateral. And at RSAC 2026, where five identity frameworks launched, the Cloud Security Alliance’s Agentic Trust Framework used “L4” as its highest tier: fully autonomous operation, earned through demonstrated behavior.
Same label. Three completely different things. This isn’t a naming disagreement. It’s a signal that the industry has converged on the need for a layered agent trust model but hasn’t agreed on what the layers actually are.
This essay proposes a clear definition. Five layers, each answering a distinct question, each building on the one below it. The goal is precision: if you’re evaluating agent trust infrastructure — building it, buying it, or writing policy for it — you should be able to point at a layer and know exactly what it does and doesn’t cover.
The Five Layers
| Layer | Name | Question It Answers | Examples |
|---|---|---|---|
| L1 | Identity Provenance | “Who delegated authority to this agent?” | World ID AgentKit, Okta Human Principal, ERC-8004 |
| L2 | Identity Verification | “Is this the agent it claims to be?” | Visa TAP, DID verification, ACME Device Attestation |
| L3 | Authorization | “What is this agent permitted to do?” | Mastercard Verifiable Intent, AP2 Mandates, OAuth scopes, MCP-I delegation chains |
| L4 | Structural Enforcement | “Can this agent physically perform action X?” | NVIDIA OpenShell, Microsoft AGT policy engine, Keycard, ZeroID, Cloudflare Enterprise MCP |
| L5 | Behavioral Trust | “Should I trust this agent based on its cross-organizational track record?” | Structurally absent at scale |
The ordering is intentional. Each layer depends on the ones below it. An agent must have provenance before verification makes sense, verification before authorization is meaningful, authorization before structural enforcement has teeth, and structural enforcement before behavioral trust becomes calculable.
But — and this is the critical point — none of L1 through L4 can answer the L5 question.
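To make the ordering concrete, here is a minimal sketch in Python. Every field name is hypothetical; the point is only that each layer's check presupposes the one below it, and that nothing in the chain computes L5.

```python
# Minimal sketch of the layer ordering. All field names are hypothetical
# illustrations, not any vendor's API.

def evaluate_agent(agent: dict) -> str:
    """Walk the layers bottom-up; each check presupposes the one below."""
    if not agent.get("delegating_human"):      # L1: provenance
        return "reject: no human stands behind this agent"
    if not agent.get("key_proof_valid"):       # L2: verification
        return "reject: cannot prove control of its signing key"
    if not agent.get("granted_scopes"):        # L3: authorization
        return "reject: no scopes granted"
    if not agent.get("policy_check_passed"):   # L4: structural enforcement
        return "reject: blocked by runtime policy"
    # L5: behavioral trust. Nothing computed above answers it.
    return "admitted: identity verified, permissions granted, conduct unknown"

print(evaluate_agent({
    "delegating_human": "anon-zk-id-42",
    "key_proof_valid": True,
    "granted_scopes": ["database:read"],
    "policy_check_passed": True,
}))
```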
What Each Layer Does (and Where It Stops)
L1: Identity Provenance tells you a human stands behind the agent. World ID’s AgentKit links an agent wallet to an anonymous human identifier via zero-knowledge proof. Okta’s Human Principal concept ties agent actions to a human account. ERC-8004 binds on-chain agent identity to a registrant.
What L1 does not tell you: whether the agent honored the human’s intentions. World ID’s ZK unlinkability prevents cross-application behavioral aggregation by design — it proves registration, not runtime conduct. A human-backed agent that has been compromised, misconfigured, or deliberately misused passes L1 indefinitely.
L2: Identity Verification tells you the agent possesses a valid signing key. Visa’s Trusted Agent Protocol verifies cryptographic possession at the moment of a payment request. DID resolution confirms an agent controls the private key associated with its decentralized identifier.
What L2 does not tell you: what the agent did with that key. A stolen key still passes verification. An agent impersonating a legitimate service with cloned credentials passes L2.
L3: Authorization tells you the agent has been granted scopes. OAuth tokens, AP2 Mandates, MCP-I delegation credentials — all define the boundary of what an agent is permitted to do. Mastercard’s Verifiable Intent specification even includes an explicit agent_attestation extension point, designed for behavioral data that L3 itself cannot produce.
What L3 does not tell you: whether the agent stayed within those scopes in practice. Salt Security’s 1H 2026 survey data indicates agent permissions expand approximately 3x per month without corresponding review. An agent with database:read authorization can execute anything from a single-row lookup to a full table dump — both pass L3.
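A toy illustration of that gap, with a hypothetical scope check: a single-row lookup and a full table dump carry the same scope, so L3 cannot tell them apart.

```python
# Toy L3 scope check (hypothetical scope names). Both requests carry
# database:read, so authorization alone cannot separate them.

GRANTED_SCOPES = {"database:read"}

def l3_authorize(request: dict) -> bool:
    return request["scope"] in GRANTED_SCOPES

lookup = {"scope": "database:read", "sql": "SELECT * FROM users WHERE id = 42"}
dump   = {"scope": "database:read", "sql": "SELECT * FROM users"}

assert l3_authorize(lookup) and l3_authorize(dump)  # both pass L3
```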
L4: Structural Enforcement prevents the agent from physically performing prohibited actions. NVIDIA OpenShell enforces filesystem, network, process, and inference policies via declarative YAML. Microsoft’s Agent Governance Toolkit computes behavioral trust scores and gates agent actions in real time. These are sophisticated, necessary systems.
What L4 does not tell you: whether the agent’s behavior is normal. ISACA’s 2026 analysis states this precisely: “There is no clear distinction between normal and malicious behavior at the agentic control layer. If it sends data externally, the connection is authorized. If it executes commands, it does so within its granted permissions.” An agent performing data exfiltration via an authorized channel is indistinguishable from a compliant agent within any structural enforcement model.
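The same blindness, sketched with a hypothetical allowlist policy (not any vendor's actual schema): a routine sync and a bulk exfiltration target the same authorized host, so both pass.

```python
# Toy structural policy (hypothetical format, not any vendor's schema).
# L4 asks "is this destination allowed?", never "is this behavior normal?".

ALLOWED_HOSTS = {"api.partner.example", "telemetry.internal.example"}

def l4_enforce(connection: dict) -> bool:
    return connection["host"] in ALLOWED_HOSTS

routine_sync = {"host": "api.partner.example", "bytes_out": 4_096}
exfiltration = {"host": "api.partner.example", "bytes_out": 40_000_000_000}

# Structurally identical: both target an authorized host, so both pass.
assert l4_enforce(routine_sync) and l4_enforce(exfiltration)
```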
L5: Behavioral Trust is the only layer that operates at runtime, across organizations, over time. It answers the question the other four cannot: given everything this agent has done across every organization it has interacted with, should I extend trust?
The Cold-Start Problem
Here’s the test that separates L5 from everything below it.
An agent with two years of perfect behavior across 500 deployments enters your organization for the first time. A brand-new attacker agent, created five minutes ago, also requests access. What does each layer tell you?
- L1: Both have human delegation (or neither does).
- L2: Both possess valid signing keys.
- L3: Both request the same OAuth scopes.
- L4: Both pass structural policy checks.
Every layer below L5 produces the same signal for both agents. The trusted veteran and the fresh attacker are indistinguishable. Only cross-organizational behavioral data — the accumulated record of what each agent actually did across every prior interaction — can separate them.
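The test is easy to express in code. In this sketch (all fields hypothetical), the only attribute that differs between the two agents is cross-organizational history, and that attribute is exactly the one L1 through L4 never read:

```python
# Cold-start sketch: from inside a single organization, the L1-L4
# signals for a two-year veteran and a five-minute-old attacker are
# byte-for-byte identical. Only history that travels could split them.

def l1_to_l4_signals(agent: dict) -> tuple:
    return (
        agent["has_human_delegation"],     # L1
        agent["valid_signing_key"],        # L2
        tuple(agent["requested_scopes"]),  # L3
        agent["passes_structural_policy"], # L4
    )

veteran  = {"has_human_delegation": True, "valid_signing_key": True,
            "requested_scopes": ["calendar:write"],
            "passes_structural_policy": True,
            "cross_org_history_days": 730}  # invisible to L1-L4
attacker = {"has_human_delegation": True, "valid_signing_key": True,
            "requested_scopes": ["calendar:write"],
            "passes_structural_policy": True,
            "cross_org_history_days": 0}    # invisible to L1-L4

assert l1_to_l4_signals(veteran) == l1_to_l4_signals(attacker)
```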
This is not an edge case. It is the default state for every external agent entering every organization. Microsoft’s Agent Governance Toolkit computes behavioral trust scores — but only within a single deployment. An agent with a 950/1000 score inside Organization A enters Organization B’s AGT deployment with score 0. The behavioral history does not travel.
Why Everyone Stops at L3
The industry has built impressive L1-L3 infrastructure in 2026. Platform providers are converging on agents-as-principals: Microsoft Entra models Agent ID as a first-class identity, Okta places AI Agents in Universal Directory, Google provides Agent Identity for Vertex AI. The IETF is drafting AI agent authentication standards. MCP-I defines three conformance levels for identity and delegation credentials.
L4 is getting crowded too. NVIDIA OpenShell, Microsoft AGT, Keycard, ZeroID, Cloudflare Enterprise MCP — structural enforcement is a solved-enough problem that open-source implementations ship quarterly.
L5 remains structurally empty. Here’s why.
L1 through L4 are single-organization problems. You verify your own agents’ identities. You grant your own scopes. You enforce your own policies. The data you need lives within your boundary.
L5 is a cross-organizational problem. To compute behavioral trust for an agent you’ve never seen, you need data from organizations you don’t control. You need to know how that agent behaved at 500 other companies — without those companies revealing their proprietary operational data to you or to each other.
This is fundamentally harder. It requires:
- A neutral aggregation layer that no single cloud provider, card network, or platform vendor can credibly operate. (Microsoft cannot build the cross-org trust graph without antitrust scrutiny. Google and Amazon’s customers will not feed behavioral data to a competitor’s subsidiary.)
- Privacy-preserving computation — the ability to aggregate behavioral signals from multiple organizations without any organization seeing another’s raw data. The architecture must be contribute-everything-reveal-nothing; see the sketch after this list.
- Compounding data that cannot be purchased. Financial staking (Armalo AI’s approach) can be gamed with capital. An attacker with sufficient USDC maintains high escrow while executing sophisticated attacks within tolerance. Behavioral trust must compound over time from observed actions — it cannot be shortcut.
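To make the second requirement concrete, here is a minimal secure-aggregation sketch. It is illustrative arithmetic, not production cryptography: each pair of organizations shares a random mask that cancels in the sum, so an aggregator learns the total signal for an agent without learning any single organization's contribution.

```python
# Minimal secure-aggregation sketch (illustrative, not production crypto).
# Each pair of organizations agrees on a random mask; one adds it, the
# other subtracts it. Masks cancel in the sum, so the aggregator learns
# the aggregate behavioral signal but no organization's raw value.
import itertools
import random

def masked_contributions(raw: dict, modulus: int = 2**61 - 1) -> dict:
    masked = dict(raw)
    for a, b in itertools.combinations(sorted(raw), 2):
        mask = random.randrange(1, modulus)
        masked[a] = (masked[a] + mask) % modulus
        masked[b] = (masked[b] - mask) % modulus
    return masked

modulus = 2**61 - 1
raw = {"org_a": 12, "org_b": 7, "org_c": 31}  # e.g. anomaly counts for one agent
masked = masked_contributions(raw, modulus)

# The aggregator sees only masked values...
assert masked != raw
# ...yet the aggregate is exact: 12 + 7 + 31 = 50.
assert sum(masked.values()) % modulus == sum(raw.values()) % modulus
```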
Three independent assessments quantify the gap:
- Salt Security (1H 2026): 48.9% of organizations are blind to machine-to-machine traffic. 48.3% cannot distinguish AI agents from bots. Only 23.5% find existing security tools effective for agentic workloads.
- CSA/Strata (April 2026): 70% of enterprises run agents outside IAM governance. Only 18% are confident their IAM handles agent identities.
- RSAC 2026 post-conference assessment: Five identity frameworks shipped. Every one verified who the agent was. None tracked what the agent did.
The infrastructure to answer “who is this agent?” is being built rapidly. The infrastructure to answer “should I trust this agent?” based on cross-organizational behavioral evidence does not exist at production scale.
The Evidence: Why L5 Is Urgent
Four data points from the first four months of 2026:
1. MCP Marketplace Poisoning. Researchers successfully planted malicious entries in 9 of 11 tested MCP marketplaces. Every poisoned entry passed all declarative review criteria: clean READMEs, proper versioning, normal metadata. This mirrors the ClawHavoc campaign, where 800+ malicious skills (~20% of ClawHub’s registry) contained the AMOS infostealer payload. All passed marketplace review. The compromises were only detectable through behavioral observation — what the skills actually did at runtime, not what their manifests declared.
2. Mythos-Class Capability Democratization. Vidoc Security Lab reproduced Anthropic’s Mythos vulnerability-discovery results using only public APIs and open-source tooling. Cost: under $30 per scan. Models: GPT-5.4 and Claude Opus 4.6 via public API. The Glasswing consortium’s 52-organization deployment assumed Mythos-class capabilities were access-controlled. This assumption is now empirically false. Behavioral monitoring for autonomous vulnerability-discovery agents is no longer a niche governance concern — it is a mass-market security requirement.
3. MCPwn (CVE-2026-33032). The first named MCP exploit campaign. CVSS 9.8. 2,600 exposed instances actively exploited. If AWS (CVE-2026-5058, CVSS 9.0) and Azure (CVE-2026-32211, CVSS 9.1) cannot implement MCP securely on their own official servers, the assumption that reputable vendors equal secure implementations is empirically false. The behavioral detection layer — “is this MCP server acting normally?” — does not exist in the current ecosystem.
4. AISI Confirmation. The UK AI Safety Institute’s April 2026 evaluation of Mythos-class agents explicitly named behavioral monitoring and endpoint detection as the missing layer. Their stated future work: “ranges simulating hardened and defended environments, including monitors with active monitoring, endpoint detection and real-time incident response.” Government-level confirmation that L5 is the absent primitive.
What It Takes to Build L5
L5 is not a feature you add to L4. It is a different kind of infrastructure entirely.
It requires cross-organizational data aggregation — behavioral signals contributed by many organizations, aggregated by a neutral party, revealed to none. Zero-knowledge proofs make this architecturally possible: “this agent’s 90-day behavioral history satisfies your trust policy” without revealing which organizations contributed data or what the specific patterns were.
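A sketch of that interface, with the proof itself stubbed out (this mocks the data flow, not the cryptography): the relying organization receives a claim and an attestation, never the underlying history or its sources.

```python
# Interface sketch only. The proof here is a placeholder, not real
# zero-knowledge cryptography. The point is the data flow: the verifier
# receives a yes/no answer about the 90-day history, never the history
# itself or the identities of the contributing organizations.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TrustProof:
    claim: str          # the predicate being attested
    attestation: bytes  # stand-in for a ZK proof over committed history

def prove_history_satisfies(history: list, threshold: float) -> Optional[TrustProof]:
    """Run by the prover, which holds the raw 90-day behavioral record."""
    score = sum(history) / len(history)
    if score < threshold:
        return None
    return TrustProof(claim=f"90-day score >= {threshold}", attestation=b"<zk-proof>")

def verifier_decision(proof: Optional[TrustProof]) -> bool:
    """Run by the relying organization. Sees the claim, never the data."""
    return proof is not None  # a real verifier would check the proof here

private_history = [9, 8, 10, 9]  # never leaves the prover
print(verifier_decision(prove_history_satisfies(private_history, threshold=8.5)))
```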
It requires compounding behavioral profiles — not point-in-time checks, but continuous computation. The TOCTOU gap (Time of Check to Time of Use) is the attack surface: trust verified at check time says nothing about behavior at time of use. Only continuous behavioral observation closes that gap.
It requires anti-gaming mechanisms. Real agents have natural behavioral variance. An agent that scores perfectly across all dimensions is more suspicious, not less. Cold-start agents should be treated with Bayesian skepticism — a score with 10 observations means something fundamentally different from a score with 10,000.
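One way to implement that skepticism, sketched with a Beta posterior under a uniform prior: score by the lower credible bound rather than the point estimate, so thin evidence is penalized automatically.

```python
# Sketch of Bayesian cold-start skepticism: treat "successes out of
# observations" as a Beta(1,1) posterior and trust the lower credible
# bound, not the raw rate. 9/10 and 9000/10000 share a point estimate
# but have very different lower bounds.

def conservative_trust(successes: int, observations: int, z: float = 1.645) -> float:
    """Lower bound of a ~95% one-sided interval on the Beta posterior,
    via a normal approximation to keep the sketch dependency-free."""
    a, b = 1 + successes, 1 + (observations - successes)
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return max(0.0, mean - z * var ** 0.5)

print(conservative_trust(9, 10))        # ~0.66: high rate, thin evidence
print(conservative_trust(9000, 10000))  # ~0.895: same rate, deep evidence
```

Both agents report the same 90% success rate, but the ten-observation newcomer earns a much lower conservative score until its record compounds.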
And it requires neutrality. The cross-organizational behavioral trust graph cannot be held by a cloud provider, a card network, or a platform vendor. It must be infrastructure that competitors will feed data into — which means it must be operated by an entity with no competing interest in the markets it serves.
The trust data market already exists. Verisk ($2.8B revenue), Dun & Bradstreet ($2.3B), FICO ($1.6B) — these companies aggregate behavioral data from multiple organizations to produce trust scores that individual organizations cannot compute alone. They are cross-organizational behavioral trust infrastructure for human economic actors.
The agentic economy needs the same infrastructure for non-human actors. The data type is new. The market structure is proven.
Using This Model
The five-layer model is meant to be practical. If you’re evaluating agent trust infrastructure, use it as a checklist:
- What layer does this product operate at? Most products that claim “agent trust” or “agent security” operate at L2-L4. That’s not a criticism — L2-L4 are necessary. But know which question the product actually answers.
- Does this product’s trust signal survive organizational boundaries? If the trust data is computed within a single deployment and doesn’t travel with the agent, it’s L4 at best. L5 requires cross-org data by definition.
- What happens at cold start? If a new external agent is indistinguishable from an attacker in your system, you have an L5 gap. The cold-start problem is the definitive test.
- Can the trust signal be purchased? If staking more capital or acquiring better credentials produces a higher trust signal without behavioral evidence, the system is gameable. L5 trust must compound from observed behavior.
The industry is building L1-L4 at impressive speed. L5 remains open. The agent that enters your system tomorrow will have verified identity, granted permissions, and structural containment. The question you still won’t be able to answer: should you trust it?
That’s the layer that matters.
AgentLair is building cross-organizational behavioral trust infrastructure for the agentic economy. Our whitepaper describes the architecture in detail: the five-layer model, the behavioral trust engine, and how ZK-native governance prevents the trust graph from becoming surveillance infrastructure. agentlair.dev