Supply Chain Attacks Just Went Autonomous: The SANDWORM_MODE Wake-Up Call

Nineteen npm packages. That is all it took. On February 22, 2026, Socket's threat research team disclosed SANDWORM_MODE — a supply chain worm that deploys rogue Model Context Protocol servers into the configurations of four major AI coding tools: Claude Code, Cursor, VS Code Continue, and Windsurf. The fake servers register tools with innocuous names like index_project and lint_check. When the agent invokes those tools, embedded prompt injections trigger exfiltration of SSH keys, AWS credentials, and API tokens from nine LLM providers.
This is not a theoretical demonstration. It is not a proof-of-concept published at a security conference. It is a live, in-the-wild supply chain attack — the first documented autonomous worm specifically targeting AI developer toolchains. And it works because the trust model that governs how agents discover, register, and invoke tools has no verification layer.
We covered the identity governance angle of SANDWORM_MODE in Identity Is the Missing Layer for AI Agents. This post goes deeper. The identity gap is real, but it is not the only structural failure that makes this attack possible. The full kill chain exploits gaps in package provenance, tool registry integrity, network segmentation, and runtime isolation — each of which requires a different class of defense.
The supply chain did not fail because someone forgot to check a credential. It failed because the entire tool trust model assumes that anything installed is authorized to run.
Anatomy of the kill chain
Understanding why SANDWORM_MODE is structurally different from traditional supply chain attacks requires walking through each stage of the attack chain. Traditional supply chain attacks — event-stream, ua-parser-js, colors/faker — poison a package, inject malicious code, and execute it when the package is loaded. SANDWORM_MODE does something different. It poisons the tool layer that sits between the developer and the AI agent, converting the agent itself into the exfiltration mechanism.
Stage 1: Supply chain entry via npm
The attack begins with 19 malicious npm packages published to the public registry. The packages are named to appear useful — developer utilities, formatting helpers, project scaffolding tools. Nothing about the package names, descriptions, or READMEs signals malicious intent. This is standard supply chain tradecraft: blend in, look legitimate, wait for installs.
What distinguishes SANDWORM_MODE is what happens after installation. Traditional malicious packages execute their payload directly — a postinstall script that runs a reverse shell, a data exfiltration routine embedded in the module code. SANDWORM_MODE's packages do something subtler. They modify the local configuration files of AI coding tools.
Stage 2: MCP server injection
Each of the four targeted tools — Claude Code, Cursor, VS Code Continue, and Windsurf — uses configuration files to define which MCP servers the tool connects to. MCP (Model Context Protocol) is the emerging standard for how AI agents discover and invoke external tools. An MCP server exposes a set of callable tools, and the AI agent consumes those tools as part of its available capability set.
SANDWORM_MODE's packages write new MCP server entries into these configuration files. The rogue servers are configured to look like legitimate development tools. The tool names — index_project, lint_check — are deliberately chosen to blend into a developer's existing toolset. There is no user prompt, no confirmation dialog, no notification that the configuration has changed.
This is the critical insight: the tool registration mechanism has no integrity verification. If a process can write to the configuration file, it can register an MCP server. If a server is registered, the agent will discover it and make its tools available. There is no signature check, no allowlist, no provenance verification between "this server appeared in the config" and "this server's tools are now callable by the agent."
Stage 3: Prompt injection via tool responses
When the developer's AI agent invokes one of the rogue tools — which it may do automatically as part of a coding workflow, or when the developer explicitly asks the agent to index a project or run a lint check — the rogue MCP server returns a response containing embedded prompt injections.
These are not simple "ignore previous instructions" payloads. They are carefully crafted instructions that direct the agent to locate and exfiltrate specific credential files: ~/.ssh/id_rsa, ~/.aws/credentials, API keys stored in environment variables and .env files. The injections target credentials for nine LLM providers — a focused, high-value target set.
Stage 4: Credential exfiltration via the agent
The agent executes the injected instructions. It reads the credential files, encodes their contents, and transmits them — using whatever network access the agent already has. The agent is not compromised in the traditional sense. It is following instructions. The instructions just happen to come from an attacker, delivered through a tool the agent was configured to trust.
The agent is not the vulnerability. It is the weapon. The vulnerability is the trust chain that delivered the weapon to the agent without verification.
Stage 5: No detection surface
The exfiltration uses the agent's existing network access and credentials. From the perspective of network monitoring, endpoint detection, and log analysis, the traffic looks like normal agent activity — an agent reading local files and making outbound API calls. There is no malware binary to detect. No suspicious process to flag. No anomalous network destination, because the exfiltration endpoint can be disguised as a legitimate API.
This is what makes supply chain attacks on AI toolchains categorically different from traditional supply chain attacks. The attack surface is not the code that runs on the machine. It is the instructions that run through the agent.
Why AI coding tools are uniquely vulnerable
SANDWORM_MODE did not happen in a vacuum. It exploits three structural properties of modern AI coding tools that, combined, create an attack surface that did not exist two years ago.
The MCP trust model assumes benevolence
The Model Context Protocol was designed to solve a real problem: giving AI agents access to external tools and data sources through a standardized interface. MCP servers expose capabilities. Agents discover and consume them. The protocol handles tool discovery, invocation, and response formatting.
What MCP does not handle is trust. The specification provides no mechanism for verifying the identity or provenance of an MCP server. There is no signing of tool manifests. There is no allowlisting of approved servers at the protocol level. There is no differentiation between a server registered by the user and a server registered by a malicious npm package that wrote to the same configuration file.
This is not an oversight in the implementation. It is a gap in the trust model. MCP assumes that the entity registering a server has the authority to do so. In an environment where any installed package can modify the configuration, that assumption is false.
Tool permissions inherit agent permissions
When an AI coding tool invokes an MCP tool, the tool executes with the same permissions as the agent. If the agent can read ~/.ssh/id_rsa, so can every tool it invokes. If the agent can make outbound HTTPS requests, so can every tool response that instructs it to.
This is the identity inheritance pattern applied to the tool layer. There is no permission boundary between the agent and the tools it calls. The tool does not authenticate separately. It does not request specific permissions. It inherits the full capability set of the invoking agent.
In enterprise IAM, this would be equivalent to granting every SaaS application the same permissions as the employee who installed it. No security team would accept that for human-facing software. But it is the default behavior of every MCP-connected AI coding tool today.
Developer environments are high-value, low-hardening targets
Developer workstations are where SSH keys live. Where AWS credentials are cached. Where .env files contain production API tokens. Where GitHub tokens enable push access to repositories. Developer machines are, by design, repositories of high-value credentials — and they are typically the least hardened endpoints in an organization.
This is not a new observation. The 2021 Codecov breach demonstrated that developer CI/CD environments contain credentials worth exfiltrating. SANDWORM_MODE extends the same pattern to the developer's local machine, using the AI coding tool as the exfiltration channel.
The combination is structural: an agent with broad file system access, running on a machine full of high-value credentials, invoking tools from an unverified registry with no permission boundaries. SANDWORM_MODE is not a clever exploit. It is an obvious one — obvious, at least, once you map the trust model.
This is not just an npm problem
It would be comforting to frame SANDWORM_MODE as an npm supply chain issue — one more reason to audit your dependencies, pin your versions, and use lockfiles. Those practices are necessary. They are also insufficient.
The attack vector is not the npm registry. The attack vector is any extensibility mechanism that can modify an agent's tool configuration without verification. Today it is npm packages writing to MCP config files. Tomorrow it could be:
- VS Code extensions that register MCP servers as part of their activation sequence
- Git hooks that inject tool configurations during
post-checkoutorpost-merge - Container images that ship with pre-configured MCP servers
- Shared dotfiles that propagate rogue server entries across a team
- IDE plugin marketplaces where a popular extension ships a malicious update
The pattern is general: any component that can write to the tool registry is a supply chain entry point. Securing the npm supply chain addresses one vector. Securing the tool registration mechanism addresses the class.
SANDWORM_MODE exploited npm. The vulnerability it exposed is in every extensibility layer that touches agent configuration.
Detection and incident response
If your organization uses Claude Code, Cursor, VS Code Continue, or Windsurf, here is how to assess exposure and respond.
Indicators of compromise
Configuration file modifications. Check MCP configuration files for unexpected server entries. The specific file locations vary by tool:
- Claude Code:
~/.claude/claude_desktop_config.jsonor project-level.mcp.json - Cursor:
.cursor/mcp.jsonin project directories - VS Code Continue:
~/.continue/config.json - Windsurf: tool-specific configuration in the IDE settings directory
Look for server entries you did not explicitly add. Pay particular attention to tools with generic names — index_project, lint_check, format_code, analyze_deps — that do not correspond to known, verified MCP servers.
Package audit. Review recently installed npm packages against the Socket advisory for the 19 known malicious package names. But do not stop at the known list. Any package installed in the relevant time window that is unfamiliar, poorly documented, or from an unknown author warrants investigation.
Credential exposure assessment. If rogue MCP servers were present in your configuration, assume credential compromise. Audit and rotate:
- SSH keys (
~/.ssh/) - AWS credentials (
~/.aws/credentials, environment variables) - API tokens for LLM providers (OpenAI, Anthropic, Google, Cohere, and others)
- GitHub/GitLab tokens
- Any credentials stored in
.envfiles or environment variables accessible to the agent
Behavioral signals
Beyond static indicators, watch for behavioral anomalies that suggest an agent has been operating under injected instructions:
- Unexpected file reads. Agent logs showing access to credential files (
~/.ssh/*,~/.aws/*,~/.env) during operations that should not require them. - Anomalous outbound traffic. Network connections to unfamiliar endpoints during agent tool invocations. This is subtle — the traffic may look like normal API calls.
- Tool invocation patterns. An agent invoking tools that the developer did not explicitly request, or invoking them with unusual frequency.
Remediation steps
- Remove rogue MCP server entries from all tool configuration files immediately.
- Uninstall the 19 identified malicious packages and any other suspicious recently-installed packages.
- Rotate all potentially compromised credentials. Do not wait for confirmed exfiltration — the detection surface for this attack is minimal.
- Audit git history of configuration files to identify when rogue entries were introduced and correlate with package installation timelines.
- Report to Socket if you identify additional malicious packages or variants not covered in the original advisory.
Architectural defenses
Detection and response matter, but the structural lesson of SANDWORM_MODE is that the current architecture is indefensible at the trust model level. The defenses that would have prevented this attack — or contained its blast radius — are architectural, not procedural.
Tool provenance verification
The most direct fix for SANDWORM_MODE's attack chain is verifying the provenance of every MCP server before the agent can discover its tools. This means:
- Signed tool manifests. MCP servers should publish signed manifests that attest to their identity, their publisher, and the tools they expose. Agents should verify these signatures before registering the server.
- Allowlisting at the agent level. The agent's configuration should maintain an explicit allowlist of approved MCP servers, verified by hash or signature. Any server not on the list is invisible to the agent.
- Configuration file integrity monitoring. Treat MCP configuration files as security-critical assets. Monitor them for unauthorized modifications the same way you monitor
/etc/sudoersor SSH authorized_keys.
This is the tool-level application of the same principle we described in The Interface Security Imperative: every tool an agent can call is an attack surface. The registry that determines which tools are callable is the meta-attack surface.
Runtime isolation and sandboxing
Even with verified tools, the permission inheritance model means a compromised tool has the agent's full capability set. Runtime isolation breaks this assumption.
WebAssembly sandboxing is one emerging pattern. IronClaw, a Wasm-based MCP runtime, executes each MCP server in an isolated WebAssembly sandbox with explicit capability grants. A tool running in a Wasm sandbox cannot read ~/.ssh/id_rsa unless the sandbox configuration explicitly grants file system access to that path. The default is deny-all — the inverse of the current default, which is inherit-all.
Container isolation is another. OpenAI's approach to code execution in ChatGPT runs generated code in gVisor-backed containers with no network access by default. The same pattern applies to MCP servers: run each server in a container with a minimal capability set, explicit network policies, and no access to the host file system.
Process-level sandboxing using operating system primitives — seccomp, AppArmor, macOS Sandbox profiles — provides a lighter-weight option. Restrict the agent process itself to a minimal set of system calls and file paths, so that even if a tool instructs the agent to read credential files, the operating system denies the read.
The common principle is defense in depth: do not rely solely on the tool being trustworthy. Architect the runtime so that an untrustworthy tool cannot cause damage.
Network segmentation
SANDWORM_MODE's exfiltration stage requires outbound network access from the agent process. In most developer environments, the agent has unrestricted outbound access — it needs to call LLM APIs, fetch documentation, access package registries.
Network segmentation reduces the blast radius:
- Egress filtering. Restrict the agent's outbound network access to a known set of approved endpoints: the LLM provider's API, the organization's internal services, and nothing else. An exfiltration attempt to an attacker-controlled endpoint hits a firewall rule.
- DNS-level controls. Use DNS-based filtering to block resolution of unknown or newly-registered domains. Many exfiltration channels rely on attacker-controlled domains that are days old.
- Separate network contexts. Run AI coding tools in a network namespace or VPN segment that has different egress rules than the developer's primary workstation. The agent can reach approved APIs but cannot reach arbitrary internet endpoints.
Human-in-the-loop for sensitive operations
The most effective — and most disruptive — architectural defense is requiring human approval for operations that cross trust boundaries. If the agent must ask the developer before reading ~/.ssh/id_rsa, the prompt injection fails at the approval gate.
The trade-off is real. Human-in-the-loop for every file read eliminates the productivity gains that AI coding tools provide. The practical compromise is risk-tiered approval:
- No approval needed for operations within the project directory
- Notification for operations outside the project directory but within non-sensitive paths
- Explicit approval required for access to credential files, environment variables, and sensitive directories
This maps to the Boundaries component of the ROBOT framework: define which operations the agent can perform autonomously, which require notification, and which require explicit human authorization.
ROBOT framework mapping
Each stage of the SANDWORM_MODE kill chain maps to a specific ROBOT framework component that, if enforced, would either prevent the attack or contain its blast radius.
| Kill Chain Stage | Attack Mechanism | ROBOT Component | Defensive Control |
|---|---|---|---|
| Supply chain entry | Malicious npm package installed | Boundaries | Package allowlisting, dependency auditing, lockfile enforcement |
| MCP server injection | Configuration file modified without verification | Boundaries + Role | Configuration integrity monitoring, tool provenance verification, allowlisted MCP servers |
| Tool discovery | Agent registers unverified tools | Role | Explicit tool inventory per agent role, deny-by-default tool registration |
| Prompt injection | Rogue tool returns malicious instructions | Objectives + Boundaries | Input segmentation, instruction-data separation, response validation |
| Credential access | Agent reads sensitive files on injected instructions | Boundaries | Runtime sandboxing, file system access controls, risk-tiered approval gates |
| Exfiltration | Agent transmits credentials via outbound network | Observability + Boundaries | Egress filtering, network segmentation, behavioral anomaly detection |
| Persistence | No detection surface for the compromise | Observability | Configuration file monitoring, tool invocation logging, credential access auditing |
The pattern is clear: every stage of this attack crosses a boundary that ROBOT's Boundaries component is designed to enforce. The attack succeeds because those boundaries do not exist in the default configuration of any of the four targeted tools.
SANDWORM_MODE is not a failure of any single control. It is the compounding result of absent controls at every trust boundary in the tool chain.
Case study: Architectural choices that reduce exposure
Not every agent architecture is equally exposed to this class of attack. The structural decisions that determine exposure are made at design time, not at detection time.
Our own agent, PAI, was not affected by SANDWORM_MODE — not because we detected and blocked it, but because the architectural decisions that define PAI's runtime eliminate the attack surface entirely.
No npm dependency chain. PAI runs as a standalone process on a dedicated infrastructure instance. There are no npm packages to install, no node_modules directory to poison, no package.json for a malicious package to infiltrate. The supply chain entry point that SANDWORM_MODE exploits does not exist.
No MCP server integration. PAI's tool capabilities are defined through a skills system — static configuration files that specify what the agent can do, reviewed and deployed through the same version control workflow as any other infrastructure code. There is no dynamic tool discovery, no runtime server registration, no protocol-level mechanism for an external process to inject new capabilities.
Dedicated infrastructure with network controls. PAI runs on an isolated instance with explicit egress rules. Outbound network access is limited to defined endpoints. An exfiltration attempt to an attacker-controlled domain would not resolve.
Human-in-the-loop by default. PAI operates through a conversational interface where a human principal reviews and approves significant actions. The agent proposes; the human disposes. A prompt injection that instructs PAI to exfiltrate credentials would surface as a visible action in the conversation, not as a silent background operation.
These are not exotic defenses. They are architectural choices: static tool definitions instead of dynamic discovery, explicit network boundaries instead of unrestricted access, human oversight instead of full autonomy. Any organization can make the same choices. The trade-off is that you give up some of the flexibility of a plugin-based extensibility model. What you gain is an attack surface that SANDWORM_MODE cannot reach.
This is not about any specific tool being "better" or "worse." It is about understanding the security implications of architectural decisions. A dynamic tool registry is a powerful feature. It is also an attack surface. Organizations should make that trade-off consciously, with appropriate controls in place.
The broader pattern: extensibility as attack surface
SANDWORM_MODE is the first. It will not be the last.
The pattern it establishes — poisoning the tool layer that agents depend on, using the agent's own capabilities as the attack mechanism — is generalizable to any AI system with an extensibility model. Every plugin marketplace, every tool registry, every configuration file that an external process can modify is a potential entry point for the same class of attack.
The industry is building agent ecosystems optimized for capability and interoperability. MCP is growing. Tool marketplaces are launching. Agent-to-agent communication protocols are emerging. Each of these increases the power of AI systems — and each introduces trust boundaries that, if left unverified, become attack surfaces.
The security discipline for this era is not fundamentally new. It is supply chain security applied to a new substrate. The principles are the same: verify provenance, enforce least privilege, segment trust domains, monitor for anomalies, design for containable failure. What is new is the attack surface — and the speed at which a compromised tool can cause damage when the tool's operator is an autonomous agent rather than a human.
Start here
If your organization uses AI coding tools in development workflows, these are the immediate actions:
-
Audit your MCP configurations now. Check every developer workstation for unexpected MCP server entries in Claude Code, Cursor, VS Code Continue, and Windsurf configuration files. Document what is there and verify each entry against known, approved servers.
-
Implement configuration file integrity monitoring. Treat MCP configuration files as security-critical. Alert on any modification not associated with an approved change process. This is the lowest-cost, highest-impact control you can deploy today.
-
Restrict tool registration to verified sources. If your AI coding tool supports allowlisting MCP servers, enable it. If it does not, advocate for the feature — and in the interim, monitor configuration files as a compensating control.
-
Segment developer environment network access. Apply egress filtering to the network paths used by AI coding tools. Allow approved API endpoints. Deny everything else. This does not prevent the attack, but it contains the exfiltration stage.
-
Rotate credentials on any exposed workstation. If you cannot confirm that MCP configurations are clean, assume compromise. Rotate SSH keys, AWS credentials, LLM API tokens, and any other secrets accessible from the developer's environment.
The Safe Autonomy Readiness Checklist covers these controls and 40 more across 8 governance dimensions — including supply chain integrity, tool provenance, runtime isolation, and network segmentation.
If your team is deploying AI coding tools and has not assessed the tool trust model, or if you want help designing agent architectures that are structurally resistant to supply chain attacks, we should talk.
References
- Socket Threat Research Team, SANDWORM_MODE: npm Worm Hijacks CI Workflows and Poisons AI Toolchains — Original disclosure of 19 malicious npm packages targeting AI coding tool MCP configurations. February 22, 2026.
- Atypical Tech, Identity Is the Missing Layer for AI Agents — Our analysis of SANDWORM_MODE's identity governance implications, the NIST NCCoE concept paper, and defensive patterns for agent identity. February 23, 2026.
- Atypical Tech, The Interface Security Imperative — Mapping of the six primary attack surfaces in agent-interface interactions, including tool manipulation and output exfiltration. February 9, 2026.
- Atypical Tech, Your Token Budget Is a Security Control — Why token budgets are blast radius containment, including analysis of MCP-based token theft. February 20, 2026.
- NIST NCCoE, Accelerating the Adoption of Software and AI Agent Identity and Authorization — Federal concept paper on agent identity governance. Public comments due April 2, 2026.
- IronClaw, WebAssembly MCP Runtime — Wasm-based sandboxing for MCP servers with explicit capability grants and deny-by-default isolation.
- OpenAI, Building a Secure Code Sandbox — gVisor-backed container isolation for code execution with no default network access.
- Palo Alto Networks Unit 42, New Prompt Injection Attack Vectors Through MCP Sampling — Research on malicious MCP servers stealing token quota and embedding prompt injections.
- Codecov, Security Update — The 2021 breach that demonstrated developer CI/CD environments as high-value credential repositories.
- OWASP, LLM01:2025 — Prompt Injection — The top vulnerability in the OWASP Top 10 for LLM Applications, directly relevant to SANDWORM_MODE's tool-mediated injection technique.
- OWASP, LLM10:2025 — Unbounded Consumption — Denial-of-wallet risks amplified when agents operate under injected instructions.
- Model Context Protocol, Specification — The MCP specification, which currently lacks built-in provenance verification for registered servers.
Related Posts
The AI Agent Supply Chain Is Already Compromised
820 malicious packages. 30,000 exposed instances. Fortune 500 breaches. The AI agent ecosystem has a supply chain problem that traditional AppSec isn't built to catch.
88% Already Hit. Permissions Are the Root Cause.
Nearly 9 in 10 organizations report AI agent security incidents. The root cause isn't prompt injection or model flaws — it's overly broad permissions.
Guardrails Failed. Now What?
Static AI guardrails are failing in production. Langflow was exploited within 20 hours. Cline was compromised through a GitHub issue title. Here's what actually works instead.