
Your Agent's Real Attack Surface Isn't Its Prompt

8 min read · Atypical Tech

Updated March 12, 2026


Every conversation about AI agent security starts with the prompt. How to protect it. How to validate it. How to prevent injection into it.

This focus is understandable. The prompt is visible, controllable, and familiar.

It is also the wrong optimization target.

The most dangerous thing about your AI agent is not what it is told. It is what it can reach.


Two kinds of context

When we talk about "context" in LLM agents, we almost always mean one thing: the token window. The prompt, the conversation history, the retrieved documents, the system instructions. This is active context — the information the model reasons over during inference.

Active context is important. It shapes what the agent thinks about. But there is a second layer of context that determines what the agent can do, and almost nobody is managing it deliberately.

Latent context is the agent's runtime environment: the file system it can read and write, the network endpoints it can reach, the tools and APIs available to it, the credentials mounted in its session, the storage it can persist to, and the policies (or lack thereof) governing all of the above.

Active context drives reasoning. Latent context gates action. Only one of them determines blast radius.

Performance and safety depend on both layers being intentionally scoped — but right now, the industry treats latent context as an afterthought. We've watched teams spend months hardening their prompts while their agent's runtime environment sits wide open, like installing a vault door on a house with no walls.


Why this distinction matters now

Four of the most significant recent AI security incidents share a common pattern: attackers manipulate active context to exploit latent context.

ContextCrush poisons MCP server documentation to inject instructions into an AI coding assistant's prompt. The prompt injection is the entry point — active context. But the damage is determined entirely by what the agent can reach: environment variables containing API keys, file system access to .env files, network egress to exfiltrate credentials. All latent context.

Shadow Agents — the dominant risk theme at RSAC 2026 — are AI agents operating outside security visibility with broad, unscoped access to enterprise systems. The threat is not what these agents are prompted to do. It is the lateral movement their environments permit. Latent context, ungoverned.

A shadow agent isn't dangerous because it's autonomous. It's dangerous because nobody scoped what it can touch.

CVE-2026-2256 enables arbitrary OS command execution through an AI agent framework's shell tool. The vulnerability exists because the agent's environment grants shell access with elevated privileges. The active context (the prompt) is just the trigger. The blast radius is defined by what the shell can reach — latent context.

EchoLeak hijacks Microsoft 365 Copilot through a crafted email, redirecting the agent to silently exfiltrate files and chat logs. The attack enters through active context — a prompt injection in an email. But silent, multi-step exfiltration is only possible because the agent's latent context includes file access, chat history access, and network egress. Remove any one of those, and the attack chain breaks.

The pattern is consistent.

Harden the prompt all you want — if the environment is over-permissioned, a compromised prompt becomes a skeleton key.


The status quo is indefensible

Most AI agents today run in environments that were configured once and never scoped again. A coding assistant gets full file system access, unrestricted network egress, every CLI tool on the machine, and credentials that never expire. A business automation agent gets the same API permissions whether it is summarizing a document or processing a financial transaction.

We've seen this pattern before. It is the equivalent of giving every employee in a company the same badge, the same network access, and the same keys to every room — regardless of their role or the task they are performing. We stopped doing this for humans decades ago.

We have not started doing it for agents.

We learned least-privilege for humans in the 1990s. Agents are still running with the keys to the kingdom in 2026.

The cost is not hypothetical. Over-provisioned latent context means:

  • Larger blast radius. A compromised agent can reach everything its environment permits, not just what its prompt intended.
  • Higher cost. Agents compensate for missing capabilities by burning tokens on elaborate prompts and workarounds — paying in active context for what should be provided by latent context.
  • Lower reliability. Agents with access to everything are harder to test, harder to audit, and harder to reproduce. Non-determinism in the environment creates non-determinism in the outcome.
  • Invisible risk. Nobody inventories latent context. There is no standard way to ask "what can this agent actually reach?" The answer changes based on which machine it runs on, which credentials are mounted, and which tools happen to be installed.

That last one is the quiet killer. You can't govern what you can't see. And right now, almost nobody can see their agents' latent context.


Toward workload-scoped environments

The fix is not more prompt engineering. It is environment engineering.

You cannot prompt your way to a secure runtime.

Each agent workload should receive a composable, declarative, least-privilege runtime context — scoped to exactly what the task requires and nothing more. The components of this composition are:

Data scope. Which directories, files, or data sources does this workload need? Mount only those. Separate read access from write access. A code review agent should read the repository but not write to it. A report generator should write to an output directory but not read credentials.
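In code, data scoping reduces to resolving every path the agent requests against the workload's mounted root and refusing anything that escapes it. A minimal Python sketch — the function name and the root path are illustrative, not from any particular framework:

```python
from pathlib import Path

def resolve_in_scope(requested: str, allowed_root: str) -> Path:
    """Resolve a requested path and refuse anything outside the mounted scope."""
    root = Path(allowed_root).resolve()
    target = (root / requested).resolve()
    # Path.is_relative_to (Python 3.9+) rejects traversal like "../../etc/passwd".
    if not target.is_relative_to(root):
        raise PermissionError(f"{requested!r} escapes the workload's data scope")
    return target
```

The same check applies twice, once against the read mounts and once against the write mounts, so a code review agent can resolve paths for reading that it could never resolve for writing.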

Tool access. Which CLI tools, APIs, and plugins does this workload require? Provide an explicit allowlist. Pin versions. A security scanner needs nmap and nuclei — it does not need git push or kubectl. A content writer needs a text editor and a spell checker — it does not need a shell.
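The allowlist can be enforced at the single point where the agent invokes a tool. A minimal sketch, assuming the workload's allowlist is declared up front (the tool names mirror the security-scanner example above and are illustrative):

```python
import shlex
import subprocess

# Hypothetical allowlist for a security-scanner workload; declared per task.
ALLOWED_TOOLS = {"nmap", "nuclei"}

def run_tool(command: str) -> subprocess.CompletedProcess:
    """Run a CLI command only if its executable is on the workload's allowlist."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not in this workload's scope: {command!r}")
    # No shell=True: the agent gets exactly the listed binaries, not a shell.
    return subprocess.run(argv, capture_output=True, text=True, timeout=60)
```

Note what the wrapper does not provide: `shell=True`. Denying the shell itself, not just individual commands, is what keeps a prompt injection from composing tools the allowlist never anticipated.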

Network boundaries. Which endpoints does this workload need to reach? Apply egress controls. A research agent needs HTTPS access to specific domains. A code generation agent arguably needs no network access at all. Default deny, allowlist by task.
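Default-deny egress can be sketched as a guard the agent's HTTP layer runs before every outbound request. The hostnames here are placeholders for a per-task allowlist:

```python
from urllib.parse import urlparse

# Hypothetical per-task egress allowlist; everything else is denied.
EGRESS_ALLOWLIST = {"api.github.com", "docs.python.org"}

def check_egress(url: str) -> str:
    """Permit only HTTPS requests to explicitly allowlisted hosts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname not in EGRESS_ALLOWLIST:
        raise PermissionError(f"egress denied, not in workload allowlist: {url!r}")
    return parsed.hostname
```

An application-level guard like this is a sketch of the policy, not its enforcement boundary; in production the same rule belongs in the network layer (egress proxy or network policy), where a compromised agent process cannot bypass it.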

Credentials. Which secrets does this workload need, and for how long? Inject task-scoped, time-bounded credentials. Rotate them when the workload ends. An agent processing a support ticket needs access to the ticketing system — not to the production database, the cloud console, and the CI/CD pipeline.

Runtime limits. How long should this workload run? How much compute should it consume? Set timeouts and resource boundaries. Tear down the environment when the task completes. Ephemeral by default.
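Taken together, the five components above amount to a declarative spec per workload. The sketch below shows what such a spec might look like; every field name is an assumption for illustration, not an existing framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkloadContext:
    """Declarative, least-privilege runtime context for one agent workload (illustrative)."""
    read_paths: tuple[str, ...] = ()     # data scope: read-only mounts
    write_paths: tuple[str, ...] = ()    # data scope: writable mounts
    tools: tuple[str, ...] = ()          # explicit tool allowlist, pinned versions
    egress_hosts: tuple[str, ...] = ()   # network: default deny, allowlist by task
    credential_ttl_seconds: int = 900    # task-scoped, time-bounded secrets
    timeout_seconds: int = 600           # runtime limit; tear down when done

# Example: a code review workload reads the repo, writes nothing,
# reaches no network, and holds a 15-minute credential.
code_review = WorkloadContext(
    read_paths=("/workspace/repo",),
    tools=("rg", "git"),
)
```

The defaults are the point: a field left unset grants nothing, so the empty spec is the safest one, and every capability an agent holds appears explicitly in a reviewable, diffable object.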

This is not a new architecture. It is container orchestration principles applied to agent runtimes. Kubernetes already provides the primitives: pod security contexts, network policies, volume mounts with subpath restrictions, projected secret volumes, resource limits, and job TTLs. The infrastructure exists. The mental model for applying it to agents does not — yet.
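As a sketch of how those primitives compose, here is what a read-only, ephemeral, resource-bounded agent workload might look like as a Kubernetes Job. Names, image, and limits are placeholders, not a recommended configuration:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: agent-code-review          # placeholder workload name
spec:
  ttlSecondsAfterFinished: 300     # ephemeral: tear down after the task
  activeDeadlineSeconds: 600       # runtime limit
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: agent
          image: agent-runtime:1.2.3   # placeholder image; pin versions
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
          resources:
            limits:
              cpu: "1"
              memory: 1Gi
          volumeMounts:
            - name: repo
              mountPath: /workspace/repo
              readOnly: true           # data scope: read, not write
      volumes:
        - name: repo
          persistentVolumeClaim:
            claimName: repo-pvc        # placeholder claim
```

Add a default-deny NetworkPolicy and a projected secret volume with a short TTL, and every component of the composition above maps onto a primitive Kubernetes already ships.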

The primitives are all there. What's missing is the will to use them.


The right frame for agent security

The industry's current approach to agent security is prompt-centric: validate inputs, detect injections, filter outputs. These controls matter, but on their own they defend only the entry point, not the blast radius.

Latent context reframes the problem. Instead of asking "how do we prevent the agent from receiving a bad prompt?", ask: "if the agent receives a bad prompt, what is the worst it can do?"

This is the same conceptual shift that moved network security from perimeter defense to zero trust. We stopped trying to prevent every intrusion at the boundary and started assuming breach, then limiting blast radius through segmentation, least privilege, and continuous verification.

Agent security needs the same shift. Stop assuming you can prevent every prompt injection. Start assuming one will succeed, then scope the environment so that a compromised prompt cannot cause catastrophic harm.

Assume the prompt will be compromised. Then make it not matter.

The blast radius of a compromised agent is determined not by what it is told, but by what it can reach.

Start managing what it can reach.


If this reframes how you think about agent security, we should talk. We build the Safe Autonomy framework and the ROBOT Framework for organizations deploying AI agents in production. Latent context scoping is a core component of our Security Architecture Reviews. If you are deploying autonomous agents and want to understand your real attack surface, reach out.
