
curl's Bug Bounty Is Dead. The Autopsy Tells Two Very Different AI Stories.

Atypical Tech · 12 min read

Updated March 12, 2026


On January 31, 2026, Daniel Stenberg ended curl's bug bounty program. Seven years. Eighty-seven confirmed vulnerabilities. Over a hundred thousand dollars paid out. Done.

The reason was not funding. It was not a policy change at HackerOne. It was AI.

Specifically: a flood of AI-generated security reports — what the industry now calls slop — that overwhelmed a seven-person volunteer team until the program was no longer sustainable. In 2025, curl's confirmation rate collapsed from 15% to below 5%. One in twenty submissions was real. The rest were hallucinated vulnerabilities in functions that do not exist, buffer overflows in code that was never written, and exploit chains against protocols curl does not implement.

Stenberg called it a DDoS attack on the maintenance process. He was not exaggerating.

The cheapest attack in security is the one that wastes the defender's time.

But here is the part most coverage skipped. Four months before the shutdown, an AI-powered static analysis tool called ZeroPath found over 170 verified issues in curl — C footguns, logic bugs, RFC compliance violations across HTTP/3, SMTP, IMAP, TFTP, Telnet, and SSH/SFTP. These were real bugs that established tools like Coverity and CodeQL had not flagged. Stenberg's response: "These new tools are finding problems that none of the old, established tools detect."

Same codebase. Same technology category. Opposite outcomes. The variable was not AI. It was methodology.

AI doesn't have a quality problem. It has a methodology problem.


The anatomy of slop

Stenberg maintains a public catalog of AI slop submissions — 49 documented cases of fabricated vulnerabilities submitted to the curl project. Reading through them reveals a consistent pattern.

Hallucinated functions. Reports describe buffer overflows in curl functions that do not exist. The function names are plausible — they sound like something curl might have — but a five-second search of the codebase confirms they are inventions.
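That five-second search is trivial to automate at intake. A minimal sketch of the check, assuming nothing beyond a local checkout of the repository (the symbol names in the test are illustrative; `curl_easy_fabricated` stands in for a plausible-sounding invented name):

```python
from pathlib import Path

def symbol_exists(repo_root: str, symbol: str) -> bool:
    """Return True if `symbol` appears anywhere in the C sources under repo_root.

    A coarse text search, not a parse -- but even this rejects reports
    that cite functions the codebase has never contained.
    """
    for path in Path(repo_root).rglob("*.[ch]"):
        if symbol in path.read_text(errors="ignore"):
            return True
    return False
```

A real triage pipeline would look for the definition rather than any occurrence, but the point stands: the claim is checkable against ground truth in milliseconds, which is exactly what the slop reports were betting reviewers would not do.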

Fabricated exploit chains. One report described an HTTP/3 "CONTINUATION Flood" attack combined with a "Stream Dependency Cycle" vulnerability. Neither attack applies to curl's HTTP/3 implementation. The report read like an LLM that had ingested general HTTP/2 vulnerability research and projected it onto a different protocol.

Non-compiling code. Proof-of-concept snippets that reference headers, types, and APIs that do not exist in curl's build environment. The code is syntactically valid C but semantically disconnected from the actual project.

Confidence without substance. Every report is written with the authoritative tone of a senior security researcher. Severity ratings are assigned. Impact assessments are provided. CVSS scores are estimated. The packaging is professional. The content is fabricated.

This last point is the most damaging. When a report looks credible, a human reviewer has to invest significant effort to determine that it is not. Stenberg documented the cost: each bogus report engages three to four team members for thirty minutes to several hours. For a seven-person team where most members contribute roughly three hours per week to curl, a single slop report can consume the entire team's available capacity for a day.

"The never-ending slop submissions take a serious mental toll." — Daniel Stenberg

The mental toll is the part that metrics miss. Triage fatigue is cumulative. After months of evaluating reports that are 95% fabrication, the risk is not that maintainers accept a fake vulnerability. The risk is that they reject a real one.

Noise doesn't just waste time — it degrades the ability to hear signal.


What ZeroPath did differently

We've seen this pattern before in other domains: the tool that works is the one built on a methodology, not a prompt. The ZeroPath results were not an accident. They demonstrate what AI-assisted security research looks like when the engineering is serious.

Domain-specific tooling. ZeroPath is a purpose-built static analysis engine, not a general-purpose LLM asked to "find vulnerabilities." It understands C memory semantics, control flow, and the specific patterns that indicate bugs in systems code. The AI component operates within a framework designed by security engineers who understand what classes of bugs exist in C codebases.

Human validation in the loop. Joshua Rogers, the researcher who ran ZeroPath against curl, did not submit raw tool output. He reviewed, validated, and contextualized each finding before reporting it. The AI found candidates. A human with domain expertise confirmed them. This is the step that slop submissions skip entirely.

AI finds candidates. Humans confirm findings. Skip the second step and you have slop.

Verifiable reproduction. Every bug reported by ZeroPath came with evidence that could be independently verified — specific file paths, line numbers, and reproduction steps that map to actual code in the curl repository. There is no hallucination because the tool is grounded in the actual codebase, not in a language model's statistical approximation of what curl might look like.

Scope discipline. ZeroPath analyzed what exists. It did not speculate about what might exist. The bugs it found were in real functions, real protocols, real code paths. This seems obvious, but it is the fundamental failure mode of LLM-generated reports: they operate on a model of the codebase rather than the codebase itself.

Stenberg acknowledged the distinction explicitly: "AI tools when applied with human intelligence by someone with meaningful domain experience, can be quite helpful." The operative clause is not "AI tools." It is "human intelligence" and "domain experience."


The industry is split on what this means

The response from bug bounty platforms has been uneven. And honestly, that unevenness tells you everything.

HackerOne built Hai Triage, an AI-powered intake system designed to filter noise before it reaches human reviewers. Over 85% of HackerOne programs now use managed triage. The platform also clarified in February 2026 that researcher data is not used to train AI models — a direct response to concerns about the feedback loop between submissions and model improvement. CEO Kara Sprague: "You are not inputs to our models."

These are real improvements, but they were not enough for curl. Stenberg's position was that filtering slop at the triage layer still consumes platform resources and still lets some percentage through. The structural problem — that anyone can submit anything, and someone has to look at it — remains.

Filtering noise is necessary. But a better filter doesn't fix a broken incentive structure.

HackerOne's 2025 report tells the other side of the story. A 210% surge in valid AI-related vulnerability reports. A 540% increase in prompt injection vulnerabilities. Over 560 valid reports from autonomous agents. Seventy percent of surveyed researchers use AI tools. Three billion dollars in breach losses avoided through remediation.

These numbers are not contradictory. They are describing two populations using the same technology for opposite purposes. One population is using AI to accelerate genuine research. The other is using AI to mass-produce the appearance of research.

The technology does not distinguish between these use cases. The methodology does.


The observability gap

This is where the curl saga becomes a lesson that extends well beyond bug bounties. We've watched this same pattern emerge everywhere AI generates security-relevant claims.

Every organization adopting AI in security workflows — whether for vulnerability research, code review, threat analysis, or incident response — faces the same structural question: how do you distinguish between AI output that reflects reality and AI output that reflects plausible fabrication?

The slop problem is not unique to bug bounties. It is the same problem that will appear in every context where AI generates security-relevant claims:

  • AI-assisted code review that flags vulnerabilities in functions that were refactored out three versions ago
  • AI-generated threat assessments that describe attack chains against infrastructure you do not operate
  • AI-powered compliance reports that reference controls you have not implemented, citing frameworks that do not apply to your deployment model
  • Automated penetration testing that reports findings based on version fingerprints without confirming exploitability

In each case, the failure mode is identical: confident output that is disconnected from ground truth. And in each case, the cost is the same: human time spent determining whether the AI's claims correspond to reality.

Confidence without grounding is the most expensive kind of wrong.

This is an observability problem. Not observability in the infrastructure-monitoring sense — observability in the epistemic sense. Can you see what your AI system actually analyzed versus what it inferred, hallucinated, or interpolated from training data?

The curl slop reports had zero observability. They presented conclusions with no traceable path from input (the codebase) to output (the vulnerability claim). ZeroPath had full observability. Every finding linked directly to a specific location in the actual source code.


What this means for security teams

The curl bug bounty autopsy is a compressed version of a lesson every security team is going to learn over the next two years. AI tools are entering security workflows at every level — scanning, analysis, reporting, remediation. The teams that treat AI output as a starting point for human verification will accelerate. The teams that treat AI output as a finished product will drown in noise they cannot distinguish from signal.

AI output is a hypothesis. It becomes a finding only after a human says so.

If you operate a bug bounty program:

  1. Require structured evidence — file paths, reproduction steps, version-specific analysis — not narrative descriptions. Slop is fluent prose. Real research produces artifacts.
  2. Implement triage automation, but do not rely on it as the sole filter. HackerOne's Hai Triage is a useful layer. It is not a substitute for report structure requirements that make hallucination structurally difficult.
  3. Consider requiring reporters to disclose AI tool usage — not to ban it, but to set expectations for what validation looks like when AI is involved.
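One way to make hallucination structurally difficult is to reject free-form prose at intake and demand machine-checkable fields. A minimal sketch of such a gate, under assumed field names (this schema is illustrative, not any platform's actual report format):

```python
REQUIRED_FIELDS = {"file_path", "line_number", "affected_version", "reproduction_steps"}

def validate_report(report: dict, repo_files: set) -> list:
    """Return intake errors; an empty list means the report may reach a human."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - report.keys()]
    # The cited file must exist in the actual tree. This single check rejects
    # reports written against a model of the codebase rather than the codebase.
    if "file_path" in report and report["file_path"] not in repo_files:
        errors.append(f"unknown file: {report['file_path']}")
    return errors
```

The design choice matters: the gate is cheap for genuine researchers, who already have file paths and reproduction steps, and expensive for slop, which has neither.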

If you are integrating AI into security workflows:

  1. Treat every AI-generated security claim as a hypothesis, not a finding. The claim becomes a finding when a human with domain expertise has verified it against the actual system.
  2. Require provenance in AI output. Where did this finding come from? What was analyzed? What was the input? If the AI cannot show its work, the output is not actionable.
  3. Build feedback loops that measure false positive rates over time. The curl team tracked their confirmation rate — 15% dropping to 5% — which gave them the data to make a decision. Most teams adopting AI security tools have no equivalent metric.
  4. Distinguish between AI tools that analyze artifacts (code, configs, logs) and AI tools that generate claims from training data. The first category is bounded by reality. The second is bounded by plausibility.
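The confirmation-rate metric the curl team had, and most teams lack, takes only a few lines to maintain. A minimal sketch (the quarterly counts below are illustrative, not curl's actual per-quarter data):

```python
def confirmation_rate(confirmed: int, total: int) -> float:
    """Fraction of submissions that survived human verification."""
    return confirmed / total if total else 0.0

# Illustrative quarterly counts: confirmed reports vs. total submissions.
history = [("2024-Q4", 12, 80), ("2025-Q2", 9, 150), ("2025-Q4", 8, 210)]
for quarter, confirmed, total in history:
    rate = confirmation_rate(confirmed, total)
    print(f"{quarter}: {rate:.0%}")  # a sustained drop is the signal to act
```

Note the shape of the decline: absolute confirmed findings barely move while total volume triples. That is the slop signature, and without the denominator you cannot see it.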

If you are a security researcher:

  1. AI is a force multiplier for researchers who already have methodology. ZeroPath did not replace Joshua Rogers' expertise — it extended his reach across protocols and code paths he could not have audited manually at the same speed. That is the model.
  2. The bar for AI-assisted submissions is going up. Programs will increasingly require evidence of human validation. The researchers who build that into their workflow now will have an advantage when the requirements become universal.

The two AIs

The curl saga is the clearest case study we have for a distinction that matters more than any benchmark or capability announcement: the difference between AI as a research tool and AI as a report generator.

ZeroPath found 170 bugs because it was designed to analyze code, not to produce text that sounds like it analyzed code. The slop reports fooled nobody for long — but they consumed enough human attention to destroy a program that had been running successfully for seven years.

It only takes one person to build a wall. It takes an army of noise to make you stop trying.

Daniel Stenberg did not shut down curl's bug bounty because AI does not work. He shut it down because too many people are using AI without the methodology, domain expertise, and human validation that make AI output trustworthy.

The technology is the same. The methodology makes it constructive or destructive. That is not an AI problem. It is a governance problem. And governance — defining who is accountable, what boundaries exist, and what verification looks like — is the work that most organizations adopting AI have not done yet.

The curl bug bounty is a warning. The ZeroPath results are a proof of concept. The question for every security team is which story describes their own AI adoption.


Atypical Tech helps security teams govern AI adoption — from tool evaluation to workflow integration to organizational accountability. If your team is deploying AI in security workflows and you haven't defined what "verified" means for AI-generated findings, we should talk.
