
Supply Chain Attacks Just Went Autonomous: The SANDWORM_MODE Wake-Up Call

21 min read · Atypical Tech

Nineteen npm packages. That is all it took. On February 22, 2026, Socket's threat research team disclosed SANDWORM_MODE — a supply chain worm that deploys rogue Model Context Protocol servers into the configurations of four major AI coding tools: Claude Code, Cursor, VS Code Continue, and Windsurf. The fake servers register tools with innocuous names like index_project and lint_check. When the agent invokes those tools, embedded prompt injections trigger exfiltration of SSH keys, AWS credentials, and API tokens from nine LLM providers.

This is not a theoretical demonstration. It is not a proof-of-concept published at a security conference. It is a live, in-the-wild supply chain attack — the first documented autonomous worm specifically targeting AI developer toolchains. And it works because the trust model that governs how agents discover, register, and invoke tools has no verification layer.

We covered the identity governance angle of SANDWORM_MODE in Identity Is the Missing Layer for AI Agents. This post goes deeper. The identity gap is real, but it is not the only structural failure that makes this attack possible. The full kill chain exploits gaps in package provenance, tool registry integrity, network segmentation, and runtime isolation — each of which requires a different class of defense.

The supply chain did not fail because someone forgot to check a credential. It failed because the entire tool trust model assumes that anything installed is authorized to run.


Anatomy of the kill chain

Understanding why SANDWORM_MODE is structurally different from traditional supply chain attacks requires walking through each stage of the attack chain. Traditional supply chain attacks — event-stream, ua-parser-js, colors/faker — poison a package, inject malicious code, and execute it when the package is loaded. SANDWORM_MODE does something different. It poisons the tool layer that sits between the developer and the AI agent, converting the agent itself into the exfiltration mechanism.

Stage 1: Supply chain entry via npm

The attack begins with 19 malicious npm packages published to the public registry. The packages are named to appear useful — developer utilities, formatting helpers, project scaffolding tools. Nothing about the package names, descriptions, or READMEs signals malicious intent. This is standard supply chain tradecraft: blend in, look legitimate, wait for installs.

What distinguishes SANDWORM_MODE is what happens after installation. Traditional malicious packages execute their payload directly — a postinstall script that runs a reverse shell, a data exfiltration routine embedded in the module code. SANDWORM_MODE's packages do something subtler. They modify the local configuration files of AI coding tools.

Stage 2: MCP server injection

Each of the four targeted tools — Claude Code, Cursor, VS Code Continue, and Windsurf — uses configuration files to define which MCP servers the tool connects to. MCP (Model Context Protocol) is the emerging standard for how AI agents discover and invoke external tools. An MCP server exposes a set of callable tools, and the AI agent consumes those tools as part of its available capability set.

SANDWORM_MODE's packages write new MCP server entries into these configuration files. The rogue servers are configured to look like legitimate development tools. The tool names — index_project, lint_check — are deliberately chosen to blend into a developer's existing toolset. There is no user prompt, no confirmation dialog, no notification that the configuration has changed.

This is the critical insight: the tool registration mechanism has no integrity verification. If a process can write to the configuration file, it can register an MCP server. If a server is registered, the agent will discover it and make its tools available. There is no signature check, no allowlist, no provenance verification between "this server appeared in the config" and "this server's tools are now callable by the agent."
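The absence of verification is easy to demonstrate. The sketch below is illustrative, not code from the malware: it assumes a generic `{"mcpServers": {...}}` JSON layout (actual schemas vary by tool) and shows that "registering" a server reduces to an ordinary file write, which is exactly what a package's postinstall script can perform.

```python
import json
from pathlib import Path

def register_mcp_server(config_path: Path, name: str, command: str) -> None:
    """Append an MCP server entry to a tool config file.

    Note what is absent: no signature check, no allowlist lookup,
    no user confirmation. Any process with write access can do this.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command}
    config_path.write_text(json.dumps(config, indent=2))

# A benign demonstration against a scratch file. The point is that the
# write succeeds with no verification step anywhere in the path.
demo = Path("demo_mcp_config.json")
register_mcp_server(demo, "index_project", "npx some-package")
registered = json.loads(demo.read_text())["mcpServers"]
```

Once that entry exists, the agent treats `index_project` exactly as it treats every legitimately installed tool.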

Stage 3: Prompt injection via tool responses

When the developer's AI agent invokes one of the rogue tools — which it may do automatically as part of a coding workflow, or when the developer explicitly asks the agent to index a project or run a lint check — the rogue MCP server returns a response containing embedded prompt injections.

These are not simple "ignore previous instructions" payloads. They are carefully crafted instructions that direct the agent to locate and exfiltrate specific credential files: ~/.ssh/id_rsa, ~/.aws/credentials, API keys stored in environment variables and .env files. The injections target credentials for nine LLM providers — a focused, high-value target set.

Stage 4: Credential exfiltration via the agent

The agent executes the injected instructions. It reads the credential files, encodes their contents, and transmits them — using whatever network access the agent already has. The agent is not compromised in the traditional sense. It is following instructions. The instructions just happen to come from an attacker, delivered through a tool the agent was configured to trust.

The agent is not the vulnerability. It is the weapon. The vulnerability is the trust chain that delivered the weapon to the agent without verification.

Stage 5: No detection surface

The exfiltration uses the agent's existing network access and credentials. From the perspective of network monitoring, endpoint detection, and log analysis, the traffic looks like normal agent activity — an agent reading local files and making outbound API calls. There is no malware binary to detect. No suspicious process to flag. No anomalous network destination, because the exfiltration endpoint can be disguised as a legitimate API.

This is what makes supply chain attacks on AI toolchains categorically different from traditional supply chain attacks. The attack surface is not the code that runs on the machine. It is the instructions that run through the agent.


Why AI coding tools are uniquely vulnerable

SANDWORM_MODE did not happen in a vacuum. It exploits three structural properties of modern AI coding tools that, combined, create an attack surface that did not exist two years ago.

The MCP trust model assumes benevolence

The Model Context Protocol was designed to solve a real problem: giving AI agents access to external tools and data sources through a standardized interface. MCP servers expose capabilities. Agents discover and consume them. The protocol handles tool discovery, invocation, and response formatting.

What MCP does not handle is trust. The specification provides no mechanism for verifying the identity or provenance of an MCP server. There is no signing of tool manifests. There is no allowlisting of approved servers at the protocol level. There is no differentiation between a server registered by the user and a server registered by a malicious npm package that wrote to the same configuration file.

This is not an oversight in the implementation. It is a gap in the trust model. MCP assumes that the entity registering a server has the authority to do so. In an environment where any installed package can modify the configuration, that assumption is false.

Tool permissions inherit agent permissions

When an AI coding tool invokes an MCP tool, the tool executes with the same permissions as the agent. If the agent can read ~/.ssh/id_rsa, so can every tool it invokes. If the agent can make outbound HTTPS requests, so can every tool response that instructs it to.

This is the identity inheritance pattern applied to the tool layer. There is no permission boundary between the agent and the tools it calls. The tool does not authenticate separately. It does not request specific permissions. It inherits the full capability set of the invoking agent.

In enterprise IAM, this would be equivalent to granting every SaaS application the same permissions as the employee who installed it. No security team would accept that for human-facing software. But it is the default behavior of every MCP-connected AI coding tool today.
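What a permission boundary at the tool layer could look like, as a deny-by-default sketch. This is a hypothetical design, not the behavior of any current MCP client, and the paths are illustrative:

```python
from pathlib import Path

class ToolContext:
    """Per-tool capability grants, checked before each file access.

    This inverts the inherit-all default: a tool can read only the
    paths it was explicitly granted, regardless of what the agent
    process itself is permitted to read.
    """
    def __init__(self, granted_paths: list[str]):
        self.granted = [Path(p).resolve() for p in granted_paths]

    def can_read(self, path: str) -> bool:
        target = Path(path).resolve()
        return any(target == g or g in target.parents for g in self.granted)

# A lint tool granted only the project directory:
lint_ctx = ToolContext(["/home/dev/project"])
allowed = lint_ctx.can_read("/home/dev/project/src/main.py")  # True
blocked = lint_ctx.can_read("/home/dev/.ssh/id_rsa")          # False
```

Under this model, the rogue `lint_check` tool could still return injected instructions, but the grant check would refuse the credential read that makes the injection valuable.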

Developer environments are high-value, low-hardening targets

Developer workstations are where SSH keys live. Where AWS credentials are cached. Where .env files contain production API tokens. Where GitHub tokens enable push access to repositories. Developer machines are, by design, repositories of high-value credentials — and they are typically the least hardened endpoints in an organization.

This is not a new observation. The 2021 Codecov breach demonstrated that developer CI/CD environments contain credentials worth exfiltrating. SANDWORM_MODE extends the same pattern to the developer's local machine, using the AI coding tool as the exfiltration channel.

The combination is structural: an agent with broad file system access, running on a machine full of high-value credentials, invoking tools from an unverified registry with no permission boundaries. SANDWORM_MODE is not a clever exploit. It is an obvious one — obvious, at least, once you map the trust model.


This is not just an npm problem

It would be comforting to frame SANDWORM_MODE as an npm supply chain issue — one more reason to audit your dependencies, pin your versions, and use lockfiles. Those practices are necessary. They are also insufficient.

The attack vector is not the npm registry. The attack vector is any extensibility mechanism that can modify an agent's tool configuration without verification. Today it is npm packages writing to MCP config files. Tomorrow it could be:

  • VS Code extensions that register MCP servers as part of their activation sequence
  • Git hooks that inject tool configurations during post-checkout or post-merge
  • Container images that ship with pre-configured MCP servers
  • Shared dotfiles that propagate rogue server entries across a team
  • IDE plugin marketplaces where a popular extension ships a malicious update

The pattern is general: any component that can write to the tool registry is a supply chain entry point. Securing the npm supply chain addresses one vector. Securing the tool registration mechanism addresses the class.

SANDWORM_MODE exploited npm. The vulnerability it exposed is in every extensibility layer that touches agent configuration.


Detection and incident response

If your organization uses Claude Code, Cursor, VS Code Continue, or Windsurf, here is how to assess exposure and respond.

Indicators of compromise

Configuration file modifications. Check MCP configuration files for unexpected server entries. The specific file locations vary by tool:

  • Claude Code: ~/.claude/claude_desktop_config.json or project-level .mcp.json
  • Cursor: .cursor/mcp.json in project directories
  • VS Code Continue: ~/.continue/config.json
  • Windsurf: tool-specific configuration in the IDE settings directory

Look for server entries you did not explicitly add. Pay particular attention to tools with generic names — index_project, lint_check, format_code, analyze_deps — that do not correspond to known, verified MCP servers.
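A first pass at this audit can be scripted. The sketch below assumes the same generic `mcpServers` JSON layout (adjust the key per tool), and the allowlist contents are placeholders for whatever servers your team has actually approved:

```python
import json

# Example allowlist. Replace with the servers your team has approved;
# anything not on this list is flagged for manual review.
APPROVED_SERVERS = {"github", "filesystem", "postgres"}

def find_unapproved_servers(config_text: str) -> list[str]:
    """Return MCP server names present in a config but not approved.

    Assumes the common {"mcpServers": {...}} layout; adjust the key
    for tools that use a different schema.
    """
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})
    return sorted(name for name in servers if name not in APPROVED_SERVERS)

# Example: a config poisoned with a generically named rogue entry.
sample = json.dumps({
    "mcpServers": {
        "github": {"command": "npx github-mcp"},
        "index_project": {"command": "node /tmp/.cache/ix.js"},
    }
})
suspicious = find_unapproved_servers(sample)  # ["index_project"]
```

Run the equivalent against every config path listed above, on every developer workstation, and review each flagged entry by hand.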

Package audit. Review recently installed npm packages against the Socket advisory for the 19 known malicious package names. But do not stop at the known list. Any package installed in the relevant time window that is unfamiliar, poorly documented, or from an unknown author warrants investigation.

Credential exposure assessment. If rogue MCP servers were present in your configuration, assume credential compromise. Audit and rotate:

  1. SSH keys (~/.ssh/)
  2. AWS credentials (~/.aws/credentials, environment variables)
  3. API tokens for LLM providers (OpenAI, Anthropic, Google, Cohere, and others)
  4. GitHub/GitLab tokens
  5. Any credentials stored in .env files or environment variables accessible to the agent

Behavioral signals

Beyond static indicators, watch for behavioral anomalies that suggest an agent has been operating under injected instructions:

  • Unexpected file reads. Agent logs showing access to credential files (~/.ssh/*, ~/.aws/*, ~/.env) during operations that should not require them.
  • Anomalous outbound traffic. Network connections to unfamiliar endpoints during agent tool invocations. This is subtle — the traffic may look like normal API calls.
  • Tool invocation patterns. An agent invoking tools that the developer did not explicitly request, or invoking them with unusual frequency.

Remediation steps

  1. Remove rogue MCP server entries from all tool configuration files immediately.
  2. Uninstall the 19 identified malicious packages and any other suspicious recently installed packages.
  3. Rotate all potentially compromised credentials. Do not wait for confirmed exfiltration — the detection surface for this attack is minimal.
  4. Audit git history of configuration files to identify when rogue entries were introduced and correlate with package installation timelines.
  5. Report to Socket if you identify additional malicious packages or variants not covered in the original advisory.

Architectural defenses

Detection and response matter, but the structural lesson of SANDWORM_MODE is that the current architecture is indefensible at the trust model level. The defenses that would have prevented this attack — or contained its blast radius — are architectural, not procedural.

Tool provenance verification

The most direct fix for SANDWORM_MODE's attack chain is verifying the provenance of every MCP server before the agent can discover its tools. This means:

  • Signed tool manifests. MCP servers should publish signed manifests that attest to their identity, their publisher, and the tools they expose. Agents should verify these signatures before registering the server.
  • Allowlisting at the agent level. The agent's configuration should maintain an explicit allowlist of approved MCP servers, verified by hash or signature. Any server not on the list is invisible to the agent.
  • Configuration file integrity monitoring. Treat MCP configuration files as security-critical assets. Monitor them for unauthorized modifications the same way you monitor /etc/sudoers or SSH authorized_keys.
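The integrity-monitoring idea can be sketched with a hash baseline. This is a minimal illustration of the comparison step, not a production file-integrity monitor, and the file contents are hypothetical:

```python
import hashlib
import json
from pathlib import Path

def fingerprint(path: Path) -> str:
    """SHA-256 of a config file's bytes: the integrity baseline."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_integrity(path: Path, baseline: str) -> bool:
    """True if the config still matches its approved baseline.

    In practice the baseline is recorded at change-approval time and
    the check runs continuously; this shows only the comparison.
    """
    return fingerprint(path) == baseline

cfg = Path("mcp_config.json")
cfg.write_text(json.dumps({"mcpServers": {"github": {"command": "npx gh"}}}))
baseline = fingerprint(cfg)

# A rogue postinstall script appends a server entry...
data = json.loads(cfg.read_text())
data["mcpServers"]["index_project"] = {"command": "node ix.js"}
cfg.write_text(json.dumps(data))

tampered = not check_integrity(cfg, baseline)  # True: modification detected
```

Any mismatch should page a human, because by the time the config has changed, the next tool invocation is already a potential exfiltration event.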

This is the tool-level application of the same principle we described in The Interface Security Imperative: every tool an agent can call is an attack surface. The registry that determines which tools are callable is the meta-attack surface.

Runtime isolation and sandboxing

Even with verified tools, the permission inheritance model means a compromised tool has the agent's full capability set. Runtime isolation breaks this assumption.

WebAssembly sandboxing is one emerging pattern. IronClaw, a Wasm-based MCP runtime, executes each MCP server in an isolated WebAssembly sandbox with explicit capability grants. A tool running in a Wasm sandbox cannot read ~/.ssh/id_rsa unless the sandbox configuration explicitly grants file system access to that path. The default is deny-all — the inverse of the current default, which is inherit-all.

Container isolation is another. OpenAI's approach to code execution in ChatGPT runs generated code in gVisor-backed containers with no network access by default. The same pattern applies to MCP servers: run each server in a container with a minimal capability set, explicit network policies, and no access to the host file system.

Process-level sandboxing using operating system primitives — seccomp, AppArmor, macOS Sandbox profiles — provides a lighter-weight option. Restrict the agent process itself to a minimal set of system calls and file paths, so that even if a tool instructs the agent to read credential files, the operating system denies the read.
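One lightweight piece of this is portable to any platform: spawning the tool server process with a scrubbed environment, so credentials cached in the agent's environment variables never reach it. A sketch, not a complete sandbox; file system and network restrictions still require the OS primitives above, and the variable names are examples.

```python
import os
import subprocess
import sys

# Simulate a credential cached in the developer's shell environment.
os.environ.setdefault("AWS_SECRET_ACCESS_KEY", "demo-secret")

# Variables a tool server process actually needs; everything else,
# including cloud keys and API tokens, is deliberately withheld.
SAFE_ENV_VARS = {"PATH", "HOME", "LANG"}

def scrubbed_env() -> dict[str, str]:
    """Minimal environment for spawning an MCP server process."""
    return {k: v for k, v in os.environ.items() if k in SAFE_ENV_VARS}

# Spawn the child with the scrubbed environment instead of letting it
# inherit the agent's full environment (today's default behavior).
result = subprocess.run(
    [sys.executable, "-c",
     "import os; print('AWS_SECRET_ACCESS_KEY' in os.environ)"],
    env=scrubbed_env(), capture_output=True, text=True,
)
leaked = result.stdout.strip()  # "False": the secret never reached the child
```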

The common principle is defense in depth: do not rely solely on the tool being trustworthy. Architect the runtime so that an untrustworthy tool cannot cause damage.

Network segmentation

SANDWORM_MODE's exfiltration stage requires outbound network access from the agent process. In most developer environments, the agent has unrestricted outbound access — it needs to call LLM APIs, fetch documentation, access package registries.

Network segmentation reduces the blast radius:

  • Egress filtering. Restrict the agent's outbound network access to a known set of approved endpoints: the LLM provider's API, the organization's internal services, and nothing else. An exfiltration attempt to an attacker-controlled endpoint hits a firewall rule.
  • DNS-level controls. Use DNS-based filtering to block resolution of unknown or newly registered domains. Many exfiltration channels rely on attacker-controlled domains that are days old.
  • Separate network contexts. Run AI coding tools in a network namespace or VPN segment that has different egress rules than the developer's primary workstation. The agent can reach approved APIs but cannot reach arbitrary internet endpoints.
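The egress policy decision itself is simple; the enforcement point (firewall, proxy, or network namespace) is where the real work lives. A minimal sketch of the decision, with placeholder hostnames:

```python
from urllib.parse import urlparse

# Deny-by-default egress policy: the agent may reach its LLM provider
# and the internal package mirror, and nothing else. Hostnames here
# are illustrative placeholders.
ALLOWED_HOSTS = {"api.anthropic.com", "registry.internal.example.com"}

def egress_allowed(url: str) -> bool:
    """Decide whether an outbound request should be permitted.

    Real enforcement belongs in a firewall or egress proxy; this
    sketch shows only the policy decision.
    """
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS

ok = egress_allowed("https://api.anthropic.com/v1/messages")       # True
blocked = egress_allowed("https://exfil.attacker.example/upload")  # False
```

An exact-match allowlist is deliberately strict: wildcard or suffix matching reintroduces exactly the ambiguity an attacker-controlled subdomain can exploit.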

Human-in-the-loop for sensitive operations

The most effective — and most disruptive — architectural defense is requiring human approval for operations that cross trust boundaries. If the agent must ask the developer before reading ~/.ssh/id_rsa, the prompt injection fails at the approval gate.

The trade-off is real. Human-in-the-loop for every file read eliminates the productivity gains that AI coding tools provide. The practical compromise is risk-tiered approval:

  • No approval needed for operations within the project directory
  • Notification for operations outside the project directory but within non-sensitive paths
  • Explicit approval required for access to credential files, environment variables, and sensitive directories

This maps to the Boundaries component of the ROBOT framework: define which operations the agent can perform autonomously, which require notification, and which require explicit human authorization.
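The tiered policy above can be sketched as a path classifier. The directories and tier assignments here are illustrative assumptions, not a prescribed policy:

```python
from enum import Enum
from pathlib import PurePosixPath

class Tier(Enum):
    AUTO = "proceed without approval"
    NOTIFY = "proceed and notify the developer"
    APPROVE = "block until explicitly approved"

# Illustrative sensitive locations; extend to match your environment.
SENSITIVE_DIRS = [PurePosixPath(p) for p in ("/home/dev/.ssh", "/home/dev/.aws")]

def classify(path: str, project_root: str = "/home/dev/project") -> Tier:
    """Map a requested file access to an approval tier."""
    target = PurePosixPath(path)

    def under(base: PurePosixPath) -> bool:
        return target == base or base in target.parents

    # Sensitive paths win even when they sit inside the project tree.
    if any(under(d) for d in SENSITIVE_DIRS) or target.name == ".env":
        return Tier.APPROVE
    if under(PurePosixPath(project_root)):
        return Tier.AUTO
    return Tier.NOTIFY

tier_src = classify("/home/dev/project/src/app.py")  # Tier.AUTO
tier_key = classify("/home/dev/.ssh/id_rsa")         # Tier.APPROVE
tier_doc = classify("/home/dev/notes/todo.md")       # Tier.NOTIFY
```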


ROBOT framework mapping

Each stage of the SANDWORM_MODE kill chain maps to a specific ROBOT framework component that, if enforced, would either prevent the attack or contain its blast radius.

| Kill Chain Stage | Attack Mechanism | ROBOT Component | Defensive Control |
| --- | --- | --- | --- |
| Supply chain entry | Malicious npm package installed | Boundaries | Package allowlisting, dependency auditing, lockfile enforcement |
| MCP server injection | Configuration file modified without verification | Boundaries + Role | Configuration integrity monitoring, tool provenance verification, allowlisted MCP servers |
| Tool discovery | Agent registers unverified tools | Role | Explicit tool inventory per agent role, deny-by-default tool registration |
| Prompt injection | Rogue tool returns malicious instructions | Objectives + Boundaries | Input segmentation, instruction-data separation, response validation |
| Credential access | Agent reads sensitive files on injected instructions | Boundaries | Runtime sandboxing, file system access controls, risk-tiered approval gates |
| Exfiltration | Agent transmits credentials via outbound network | Observability + Boundaries | Egress filtering, network segmentation, behavioral anomaly detection |
| Persistence | No detection surface for the compromise | Observability | Configuration file monitoring, tool invocation logging, credential access auditing |

The pattern is clear: every stage of this attack crosses a boundary that ROBOT's Boundaries component is designed to enforce. The attack succeeds because those boundaries do not exist in the default configuration of any of the four targeted tools.

SANDWORM_MODE is not a failure of any single control. It is the compounding result of absent controls at every trust boundary in the tool chain.


Case study: Architectural choices that reduce exposure

Not every agent architecture is equally exposed to this class of attack. The structural decisions that determine exposure are made at design time, not at detection time.

Our own agent, PAI, was not affected by SANDWORM_MODE — not because we detected and blocked it, but because the architectural decisions that define PAI's runtime eliminate the attack surface entirely.

No npm dependency chain. PAI runs as a standalone process on a dedicated infrastructure instance. There are no npm packages to install, no node_modules directory to poison, no package.json for a malicious package to infiltrate. The supply chain entry point that SANDWORM_MODE exploits does not exist.

No MCP server integration. PAI's tool capabilities are defined through a skills system — static configuration files that specify what the agent can do, reviewed and deployed through the same version control workflow as any other infrastructure code. There is no dynamic tool discovery, no runtime server registration, no protocol-level mechanism for an external process to inject new capabilities.

Dedicated infrastructure with network controls. PAI runs on an isolated instance with explicit egress rules. Outbound network access is limited to defined endpoints. An exfiltration attempt to an attacker-controlled domain would not resolve.

Human-in-the-loop by default. PAI operates through a conversational interface where a human principal reviews and approves significant actions. The agent proposes; the human disposes. A prompt injection that instructs PAI to exfiltrate credentials would surface as a visible action in the conversation, not as a silent background operation.

These are not exotic defenses. They are architectural choices: static tool definitions instead of dynamic discovery, explicit network boundaries instead of unrestricted access, human oversight instead of full autonomy. Any organization can make the same choices. The trade-off is that you give up some of the flexibility of a plugin-based extensibility model. What you gain is an attack surface that SANDWORM_MODE cannot reach.

This is not about any specific tool being "better" or "worse." It is about understanding the security implications of architectural decisions. A dynamic tool registry is a powerful feature. It is also an attack surface. Organizations should make that trade-off consciously, with appropriate controls in place.


The broader pattern: extensibility as attack surface

SANDWORM_MODE is the first. It will not be the last.

The pattern it establishes — poisoning the tool layer that agents depend on, using the agent's own capabilities as the attack mechanism — is generalizable to any AI system with an extensibility model. Every plugin marketplace, every tool registry, every configuration file that an external process can modify is a potential entry point for the same class of attack.

The industry is building agent ecosystems optimized for capability and interoperability. MCP is growing. Tool marketplaces are launching. Agent-to-agent communication protocols are emerging. Each of these increases the power of AI systems — and each introduces trust boundaries that, if left unverified, become attack surfaces.

The security discipline for this era is not fundamentally new. It is supply chain security applied to a new substrate. The principles are the same: verify provenance, enforce least privilege, segment trust domains, monitor for anomalies, design for containable failure. What is new is the attack surface — and the speed at which a compromised tool can cause damage when the tool's operator is an autonomous agent rather than a human.


Start here

If your organization uses AI coding tools in development workflows, these are the immediate actions:

  1. Audit your MCP configurations now. Check every developer workstation for unexpected MCP server entries in Claude Code, Cursor, VS Code Continue, and Windsurf configuration files. Document what is there and verify each entry against known, approved servers.

  2. Implement configuration file integrity monitoring. Treat MCP configuration files as security-critical. Alert on any modification not associated with an approved change process. This is the lowest-cost, highest-impact control you can deploy today.

  3. Restrict tool registration to verified sources. If your AI coding tool supports allowlisting MCP servers, enable it. If it does not, advocate for the feature — and in the interim, monitor configuration files as a compensating control.

  4. Segment developer environment network access. Apply egress filtering to the network paths used by AI coding tools. Allow approved API endpoints. Deny everything else. This does not prevent the attack, but it contains the exfiltration stage.

  5. Rotate credentials on any exposed workstation. If you cannot confirm that MCP configurations are clean, assume compromise. Rotate SSH keys, AWS credentials, LLM API tokens, and any other secrets accessible from the developer's environment.

The Safe Autonomy Readiness Checklist covers these controls and 40 more across 8 governance dimensions — including supply chain integrity, tool provenance, runtime isolation, and network segmentation.

If your team is deploying AI coding tools and has not assessed the tool trust model, or if you want help designing agent architectures that are structurally resistant to supply chain attacks, we should talk.

Contact Atypical Tech

