
Vibe Coding's $1.5M Mistake

12 min read · Atypical Tech

A penetration testing firm recently audited 15 applications built using "vibe coding" — the increasingly popular practice of describing what you want to an AI coding assistant and shipping whatever it produces. They found 69 exploitable vulnerabilities. Six were critical: database reads, session hijacks, root escalations. All proven exploitable. Not theoretical. Not "potential." Working exploits.

Fifteen apps. Sixty-nine vulnerabilities. An average of 4.6 security flaws per application.

At industry-standard remediation costs for vulnerabilities found in production — $15,000 to $25,000 per critical finding, $8,000 to $15,000 per high — the bill for cleaning up those 15 applications lands somewhere around $1.5 million.

The AI wrote the code in minutes. The security debt will take months to pay off.


Vibe coding has a quality problem

The term "vibe coding" was coined by Andrej Karpathy in early 2025 to describe a workflow where developers lean heavily on AI assistants — Cursor, Copilot, Claude Code, Windsurf — to generate code from natural language descriptions. The developer provides the intent. The AI provides the implementation. The developer ships it.

The appeal is obvious. A solo founder can build an MVP in a weekend. A three-person startup can output code at the rate of a twenty-person engineering team. Prototyping timelines collapse from months to days. The productivity gains are real, measurable, and increasingly well-documented.

What's not well-documented is what ships alongside that productivity.

Vibe coding optimizes for velocity. Vulnerabilities optimize for exactly the same thing.

The penetration testing audit — conducted against real applications built by real teams using real AI assistants — found the same vulnerability patterns recurring across independent codebases. SQL injection. Broken authentication. Insecure direct object references. Missing rate limiting. Hardcoded secrets. The OWASP Top 10, deployed fresh to production, generated by models that have ingested the OWASP Top 10 in their training data.

The AI knows what SQL injection is. It generates it anyway. Not because the model is broken, but because code generation optimizes for functionality — making the thing work — not for adversarial resilience. When you ask an AI to build a login page, it builds a login page. It doesn't ask: "What happens when someone sends 10,000 requests per second?" or "What if the session token is predictable?" or "What if someone passes a SQL payload in the username field?"

Those questions come from an adversarial mindset. AI coding assistants don't have one.
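The pattern is easy to demonstrate. Below is a minimal sketch (SQLite, with a hypothetical `users` table and function names) of the string-interpolated query an assistant often emits for a login check, alongside the parameterized version that closes the hole:

```python
import sqlite3

def login_vulnerable(conn, username, password):
    # The pattern AI assistants frequently emit: user input interpolated
    # straight into the SQL string. A username of "' OR '1'='1' --"
    # turns the password check into a tautology.
    query = (
        f"SELECT id FROM users "
        f"WHERE name = '{username}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()

def login_parameterized(conn, username, password):
    # Parameterized version: the driver binds inputs as data, so a SQL
    # payload in the username field is matched literally, never executed.
    query = "SELECT id FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()

# Demo against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice', 'hunter2')")
payload = "' OR '1'='1' --"

print(login_vulnerable(conn, payload, "wrong"))     # logs in without the password
print(login_parameterized(conn, payload, "wrong"))  # None: payload treated as data
```

Both functions satisfy the prompt "build a login check." Only one survives an adversary.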


The trust gap that matters

The audit findings don't exist in isolation. They land in an environment where organizations have already decided to trust AI-generated code.

Armis tested code output from all 18 leading AI models across 31 security test scenarios. The result: a 100% failure rate. Every model, across every scenario, produced code with exploitable vulnerabilities. Not most models. Not in some scenarios. All of them. Every time.

100% of AI models generate insecure code. 77% of IT leaders trust it without review. That gap is where your next breach lives.

Meanwhile, survey data shows 77% of IT decision-makers trust AI-generated code without additional security review. The numbers tell the story: the tools produce vulnerable code 100% of the time, and three-quarters of the people deploying it aren't checking.

This is not an AI problem. This is a process problem that AI accelerates.

Before AI coding assistants, a developer who didn't understand secure coding patterns produced vulnerable code at human speed — maybe a few hundred lines per day. That same developer, augmented by an AI assistant, now produces vulnerable code at machine speed. The velocity multiplier that makes vibe coding attractive is the same multiplier that scales the security debt.


The $1.5 million math

The remediation cost for the 15 audited applications deserves scrutiny because it illustrates a dynamic that most organizations haven't modeled: the cost curve of AI-generated security debt.

Traditional software development produces vulnerabilities at a rate that security teams have, over decades, built processes to manage. Code review catches some. Static analysis catches more. Penetration testing catches what's left. The flow rate is manageable because human developers write code at human speed.

AI coding assistants increase the flow rate by an order of magnitude. A developer using Cursor or Copilot can produce functional features 3-10x faster than without assistance. If the vulnerability rate per feature remains constant — and the audit data suggests it does — then the absolute volume of vulnerabilities entering production increases proportionally.

Metric                              Traditional     Vibe coding
Features per month                  2-3             10-30
Vulnerabilities per feature         ~0.5            ~0.5
Vulnerabilities per month           1-1.5           5-15
Remediation cost (in production)    $15-25K each    $15-25K each
Monthly security debt               $15-37K         $75-375K

The cost per vulnerability doesn't change. The volume does. That's the trap.

At the high end, an organization vibing its way through product development can accumulate $375,000 in security debt per month. Over a quarter, that's $1.1 million. Over two quarters — which is how long it typically takes for a startup to go from "MVP shipped" to "first penetration test" — you're looking at $2 million or more in remediation costs that nobody budgeted for.
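The compounding is straightforward to model. A back-of-envelope sketch using the article's high-end figures — 15 vulnerabilities entering production a month, remediated at the top of the $15-25K range:

```python
def security_debt(vulns_per_month, cost_per_vuln, months):
    """Cumulative remediation cost of unreviewed vulnerabilities."""
    return vulns_per_month * cost_per_vuln * months

HIGH_END_VULNS = 15     # vulnerabilities per month, high end
CRITICAL_COST = 25_000  # top of the $15-25K remediation range

print(security_debt(HIGH_END_VULNS, CRITICAL_COST, 1))  # 375000 per month
print(security_debt(HIGH_END_VULNS, CRITICAL_COST, 3))  # 1125000 per quarter
print(security_debt(HIGH_END_VULNS, CRITICAL_COST, 6))  # 2250000 by the first pentest
```

The function is trivial on purpose: the danger isn't the per-vulnerability cost, it's the multiplication.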

And that's just remediation. If one of those six critical vulnerabilities gets exploited before it gets found, the cost shifts from remediation to incident response — and the IBM Cost of a Data Breach Report puts the average breach at $4.88 million.


Why the AI doesn't catch what the AI creates

A reasonable question: if AI can generate code, why can't it also generate secure code? Or at least flag the vulnerabilities it introduces?

The answer is architectural. Code generation and security analysis are fundamentally different tasks that optimize for different objectives.

Code generation optimizes for functionality. The model's objective is to produce code that does what you asked it to do. If you ask for a user registration endpoint, the model generates code that registers users. It satisfies the functional requirement. It does not — cannot, with current architectures — reason about what an adversary might do with that endpoint.

Security analysis optimizes for adversarial resilience. It asks: what happens when the input is malicious? What happens when the user is an attacker? What happens when the system is under load? What happens when the session state is manipulated? These are negation problems — reasoning about what should not happen — and they require a threat model that code generation models don't construct.

The AI builds what you asked for. Security is about what you didn't ask for.

Some AI coding tools have started adding security-focused features — vulnerability scanning, secure code suggestions, automated fixes. These are improvements. They are not solutions. The fundamental dynamic remains: the developer using vibe coding is optimizing for speed, the AI is optimizing for functionality, and security is orthogonal to both objectives.

The Kiro incident illustrated this at the infrastructure level — an AI agent optimized for "fix the bug" and decided to delete production because no constraint prevented it. Vibe coding creates the same dynamic at the code level. The AI optimizes for "build the feature" and produces SQL injection because no constraint prevents it.


The startup trap

Vibe coding's security problem hits startups hardest, and that's by design.

The entire value proposition of vibe coding is that small teams can build faster with fewer people. A solo founder or a two-person team can produce a working product in weeks instead of months. That speed is genuinely transformative for early-stage companies operating with limited capital and tight market windows.

But those same constraints — small team, limited capital, tight timeline — are exactly the conditions that make security review unlikely. The founder who vibed out an MVP over a weekend is not going to spend the next month conducting a security audit. The two-person team shipping features at 10x speed is not going to slow down for threat modeling.

The result is a growing population of production applications — handling real user data, processing real payments, storing real credentials — that have never been reviewed by anyone with a security mindset. Not the AI that generated the code. Not the developer who shipped it. Not a security engineer who was never hired.

The same constraints that make vibe coding attractive make security review improbable.

The penetration testing audit of 15 vibe-coded applications is a small sample. The actual population of unaudited vibe-coded applications in production is orders of magnitude larger. GitHub's 2025 Octoverse report showed AI coding assistant adoption among developers exceeding 90% in some segments. Not all of those developers are vibe coding. But enough are that the aggregate security debt is a systemic risk.


What defense looks like

The answer is not to stop using AI coding assistants. The productivity gains are real. The competitive pressure to adopt them is intense. And the genie is not going back in the bottle.

The answer is to build security controls that match the velocity of AI-assisted development.

1. Treat AI-generated code as untrusted input

Every line of code produced by an AI assistant should be treated with the same scrutiny you'd apply to a pull request from an unknown contractor. It might be excellent. It might contain SQL injection. You won't know until you look.

This is a mindset shift, not a tooling change. The developer using vibe coding needs to understand that "it works" and "it's secure" are completely different assertions.

2. Add security scanning to your CI/CD pipeline

If you're shipping AI-generated code without automated security scanning, you're shipping blind. At minimum, integrate SAST (static analysis) and SCA (software composition analysis) into your build pipeline. These tools won't catch everything — the audit found vulnerabilities that static analysis typically misses — but they'll catch the low-hanging fruit before it reaches production.

The ROI calculation is simple. At the vibe-coding debt rates above, a SAST tool that catches even half of incoming vulnerabilities before deployment saves you anywhere from $450,000 to over $2 million in production remediation costs per year. The tool costs a fraction of that.
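What the pipeline step looks like varies by scanner, but the gating logic is simple. A minimal sketch — the JSON report shape below is hypothetical; adapt the field names to whatever your scanner (Semgrep, Bandit, CodeQL, ...) actually emits:

```python
import json

BLOCKING = {"critical", "high"}  # severities that should fail the build

def blocking_findings(report_json: str) -> list:
    """Extract findings severe enough to block a merge."""
    findings = json.loads(report_json).get("findings", [])
    return [f for f in findings if f.get("severity", "").lower() in BLOCKING]

def gate(report_json: str) -> int:
    """Return a CI exit code: 1 blocks the merge, 0 lets it through."""
    blocking = blocking_findings(report_json)
    for f in blocking:
        print(f"{f['severity'].upper()}: {f.get('rule')} in {f.get('file')}")
    return 1 if blocking else 0

# Example report a scanner might produce.
sample = json.dumps({"findings": [
    {"severity": "high", "rule": "sql-injection", "file": "app/login.py"},
    {"severity": "low",  "rule": "assert-used",   "file": "tests/test_api.py"},
]})
print(gate(sample))  # 1: the high-severity finding blocks the build
```

The point is the failure mode: the build breaks by default when a blocking finding appears, rather than relying on someone remembering to read a report.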

3. Require security review for AI-generated features

Before any AI-generated feature ships to production, someone with security knowledge should review it. This can be a security engineer, a senior developer with security training, or an external review — but it cannot be no one.

If you don't have security expertise on your team, this is the gap that needs filling first. Not more AI tooling. Not faster shipping. A human who thinks about what happens when the input is malicious.

4. Conduct adversarial testing before launch

If you've built an application primarily through vibe coding, invest in a penetration test before you launch or as soon as possible after. The $15,000 to $30,000 cost of a professional security assessment is a rounding error compared to the $1.5 million remediation cost of finding those vulnerabilities later — or the $4.88 million average cost of a breach.

The 15-app audit that uncovered 69 vulnerabilities cost far less to conduct than any single one of those vulnerabilities will cost to remediate in production.

5. Build security into your prompts

When using AI coding assistants, include security requirements in your prompts. Instead of "build a login page," try "build a login page with rate limiting, parameterized queries, secure session management, CSRF protection, and input validation." The output won't be perfect, but it will be materially better.
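A sketch of what that kind of prompt should push the assistant toward — a login check with a sliding-window rate limit and a parameterized lookup. The schema and names are hypothetical, and a real implementation would also hash passwords and manage sessions:

```python
import hmac
import sqlite3
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_ATTEMPTS = 5
_attempts = defaultdict(deque)  # client IP -> timestamps of recent attempts

def allow_attempt(client_ip, now):
    """Sliding-window rate limit: at most MAX_ATTEMPTS per WINDOW_SECONDS."""
    window = _attempts[client_ip]
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop attempts outside the window
    if len(window) >= MAX_ATTEMPTS:
        return False
    window.append(now)
    return True

def check_login(conn, client_ip, username, password, now):
    if not allow_attempt(client_ip, now):
        return False  # throttled: the 10,000-requests-per-second question
    row = conn.execute(
        "SELECT password FROM users WHERE name = ?", (username,)  # parameterized
    ).fetchone()
    # Constant-time comparison; in production, compare password *hashes*.
    return row is not None and hmac.compare_digest(row[0], password)
```

None of this is exotic. It's the difference between asking for a feature and asking for a feature that survives contact with an attacker.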

This is the Boundaries pillar of ROBOT applied to the development workflow: constrain the AI's output space to include security properties, not just functional ones.


The market is moving

The audit of 15 vibe-coded applications is an early signal. As AI coding assistants become standard development tools — and they already are — the frequency and scale of these findings will increase.

Deloitte reports that 75% of organizations plan agentic AI deployments in the next two years, but only 20% have mature governance frameworks. That 55-point gap between adoption intent and governance readiness is where security debt accumulates fastest.

75% are planning to deploy. 20% are ready to deploy safely. The other 55% are about to learn why governance matters.

Regulatory frameworks are catching up. NIST SP 800-218 Rev. 2 now includes AI-specific supply chain requirements. SOC 2 Trust Services Criteria CC9.5 requires AI governance controls. The OWASP Top 10 for Agentic AI ranks insecure code generation as a material risk. Organizations that accumulate security debt from vibe coding today will face compliance pressure to remediate it tomorrow.

The $1.5 million question isn't whether vibe coding produces vulnerabilities. The audit answered that. The question is whether your organization knows how many it has already shipped.

The code shipped fast. The bill arrives slow. But it arrives.


If your team has been building with AI coding assistants and hasn't conducted a security review yet — that's the gap we close. We help engineering teams find what the AI left behind, before someone else finds it first. Let's talk about your codebase.

