AI Tools · 11 min read

Mozilla Firefox + Claude Mythos: 271 Bugs Found in 30 Days

How Mozilla's AI-driven vulnerability pipeline used Claude Mythos to find 271 Firefox bugs in April 2026 — methodology, results, lessons.


ComputeLeap Team

Mozilla Firefox hardened by Claude Mythos: 271 bugs found by an AI security pipeline

In April 2026, Mozilla patched 423 security bugs in Firefox. Their 2025 monthly average was 21. The 20x jump wasn't a fuzzing breakthrough or a bug-bounty surge — it was the first full month of an agentic AI security pipeline running Anthropic's Claude Mythos Preview against Firefox source code. Of the 423 fixes, 271 were attributed directly to Mythos: 180 sec-high, 80 sec-moderate, 11 sec-low. They shipped in Firefox 150 (released April 21) plus dot-releases 149.0.2, 150.0.1, and 150.0.2.

Most coverage of Mythos this week has fixated on the offensive side — the UK AI Safety Institute's cyber-capability evaluation, the Trump administration's AI-safety reversal, and the cybersecurity-establishment debate over dual-use risk. That framing buries the more durable story. The same capability that worries regulators is, right now, doing defensive work in production against the browser used by every paranoid security team that doesn't trust Chrome. That's what this piece is about: what Mozilla actually built, what the AI actually found, and what operators should copy.

Anthropic's announcement of the Mozilla Firefox + Claude Mythos collaboration

The numbers, before anything else

Mozilla's own blog post put the headline as bluntly as possible: "the zero-days are numbered." The arithmetic supports the swagger.

Period                         | Bugs patched | Notes
2025, monthly average          | ~21          | Pre-AI-pipeline baseline
Jan 2026 (2-week Opus 4.6 run) | 22           | 14 sec-high; ~20% of all 2025 high-severity Firefox bugs
April 2026                     | 423          | 271 from Mythos Preview; 180 sec-high

The January result is the under-reported part. Anthropic and Mozilla ran a two-week scan with Claude Opus 4.6 before Mythos was ever in the picture. That run alone matched roughly a fifth of all the high-severity Firefox bugs patched in the entire prior year. It's what earned Mozilla early access to Mythos in the first place — and it's the result smaller orgs should look at, because Opus 4.6 is generally available.

The headline is the volume; the lesson is the methodology. Mozilla didn't ship 423 fixes because Mythos is a magic vulnerability oracle. They shipped because they wired an agentic harness with the right interfaces — and the harness can run reproducible test cases to confirm or reject hypotheses dynamically.

The pipeline, end to end

Anthropic's writeup is unusually concrete on the methodology. The agentic scaffold is simple, and the simplicity is the point:

  1. Spin up a container isolated from the Internet, with the project-under-test (Firefox source) loaded inside.
  2. Invoke Claude Code with Mythos Preview and prompt it to find a security vulnerability.
  3. Mythos reads the code to form hypotheses about where vulnerabilities might live.
  4. Mythos runs the actual project inside the container to confirm or reject those hypotheses.
  5. If a hypothesis confirms, Mythos outputs a bug report with a proof-of-concept exploit.
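The loop is simple enough to sketch in code. Below is a minimal Python harness in that shape. The Docker image name, the restricted network, and the prompt text are illustrative assumptions, not Mozilla's actual configuration; the only real interfaces used are docker run and Claude Code's non-interactive print mode (claude -p).

```python
# Minimal sketch of the harness loop, not Mozilla's actual configuration.
# Assumptions: a Docker image named "firefox-audit" containing the build
# toolchain plus Claude Code, and a user-defined "audit-net" network whose
# only egress is the model API.
import subprocess

AUDIT_PROMPT = (
    "Find a security vulnerability in this codebase. Form a hypothesis by "
    "reading the code, then confirm or reject it by building and running a "
    "reproducible test case. If confirmed, emit a bug report with a PoC."
)

def run_audit(source_dir: str, runs: int = 10) -> list[str]:
    """One audit pass per throwaway container; collect non-empty reports."""
    reports = []
    for _ in range(runs):
        result = subprocess.run(
            [
                "docker", "run", "--rm",
                "--network", "audit-net",    # hypothetical: model API only, no general egress
                "-e", "ANTHROPIC_API_KEY",   # pass the key through from the host env
                "-v", f"{source_dir}:/src",  # the agent must build and run the project
                "-w", "/src",
                "firefox-audit",             # hypothetical image
                "claude", "-p", AUDIT_PROMPT,
            ],
            capture_output=True, text=True, timeout=4 * 3600,
        )
        if result.returncode == 0 and result.stdout.strip():
            reports.append(result.stdout)
    return reports
```

Each run is independent and disposable, which is what makes the results reproducible: a finding only counts if the container can demonstrate it from a clean start.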

The third and fourth steps are what make this different from every previous wave of "AI for static analysis." Mozilla's own framing is worth quoting in full:

"The introduction of agentic harnesses that can reliably detect security issues has completely changed this. These can find real bugs and dismiss unreproducible speculation. The key feature of such a harness is that, given the right interfaces and instructions, it can create and run reproducible test cases to dynamically test hypotheses about bugs in code."

This is the verification shift. YouTuber Nate B Jones framed it the same way in his Mozilla deep-dive: the move "from AI writes code to AI audits code." AI LABS showed the same pattern with Vercel DeepSec catching bugs pre-ship. Once a model can run the project it's analyzing, the entire static-vs-dynamic gap collapses.
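What "dismiss unreproducible speculation" looks like mechanically is easy to sketch. A minimal example, under one assumption not taken from Mozilla's writeup: the project under test is built with AddressSanitizer, so genuine memory-safety bugs announce themselves on stderr and speculation simply exits clean.

```python
# Illustration of the verification step, not Mozilla's code.
import subprocess

def verify_hypothesis(repro_cmd: list[str]) -> str:
    """Run a candidate reproducer and let the program be the judge."""
    proc = subprocess.run(repro_cmd, capture_output=True, text=True, timeout=300)
    sanitizer_hit = "ERROR: AddressSanitizer" in proc.stderr
    crashed = proc.returncode < 0        # negative: killed by a signal (SIGSEGV, ...)
    if sanitizer_hit or crashed:
        return "confirmed"               # reproducible: escalate with the PoC attached
    return "dismissed"                   # speculation: dropped, zero human triage cost
```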

Simon Willison's amplifier post on the Mozilla + Mythos result

Three bugs that explain why this works

Volume metrics are easy to inflate. The shape of the bugs is harder to fake. Three of Mozilla's named findings show why a model that can both read and run the code is qualitatively different from one that can do only one or the other.

Bug 2024437 — the 15-year-old <legend> flaw

This was a parser-level vulnerability in the <legend> HTML element that had been latent in Firefox for fifteen years. Triggering it required meticulous orchestration of recursion stack depth and cycle-collection edge cases simultaneously. Fuzzers couldn't reach it because they don't reason about call-stack interactions across systems — they generate inputs and watch for crashes. Mythos hypothesized the interaction by reading the code, then constructed the trigger sequence by running it.
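The actual trigger is not public, so nothing below reproduces the bug. What a short sketch can show is the shape of the input: structurally deliberate rather than random, built to a hypothesized depth. Everything here, including the depth constant, is invented for illustration.

```python
# Hypothetical illustration only: generates deeply nested <legend> markup at
# a chosen depth. The real trigger also had to coordinate cycle-collection
# timing, which this does not attempt. The point is that a fuzzer rarely
# stumbles onto input this structured, while a code-reading agent constructs
# it on purpose once it has hypothesized a recursion limit.
DEPTH = 1000  # invented value; the actual threshold is not in the writeups

def nested_legend_page(depth: int = DEPTH) -> str:
    return (
        "<!doctype html><body>"
        + "<fieldset><legend>" * depth
        + "x"
        + "</legend></fieldset>" * depth
        + "</body>"
    )

with open("trigger.html", "w") as f:
    f.write(nested_legend_page())
```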

Bug 2025977 — the 20-year-old XSLT use-after-free

Even older. Inside Firefox's XSLT engine, reentrant key() calls caused a hash table to free its backing store while a raw pointer remained live elsewhere. This is the classic class of bug that humans miss when reviewing because the call graph is non-obvious — key() calling key() calling key() traverses code paths that look unrelated until you trace the actual execution. The model traced the actual execution.
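The pattern is easier to see in miniature. Below is a toy Python analogy of the bug class, not Firefox's XSLT code: a lookup holds a direct reference to the table's backing store across a reentrant callback that rehashes it. In C++ the final write lands in freed memory; in Python it lands in a silently orphaned list.

```python
# Toy analogy of the reentrancy bug class (not Firefox code).
class KeyTable:
    def __init__(self):
        self._slots = [None] * 4                       # the "backing store"

    def _rehash(self):
        self._slots = [None] * (len(self._slots) * 2)  # old store is "freed"

    def lookup(self, key, evaluate):
        slots = self._slots            # raw "pointer" held across the call
        value = evaluate(key)          # may reenter and trigger a rehash
        slots[hash(key) % len(slots)] = value  # use-after-free in the C++ version
        return value

table = KeyTable()
table.lookup("a", lambda k: table._rehash() or "v")  # reentrant rehash mid-lookup
print(table._slots)  # all None: the write above went to the dead store
```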

Bug 2021894 — IPC race → sandbox escape

The most operationally consequential of the three. A race condition over IPC allowed a compromised content process to manipulate IndexedDB refcounts, trigger a use-after-free, and use it as a primitive for sandbox escape. Sandbox-escape bugs are notoriously difficult to surface via traditional fuzzing, and Help Net Security's analysis flagged this category as "where AI coverage is particularly valuable." Compromised-content-process → parent-process attack chains are the threat model real Firefox users actually care about, and this is exactly the layer where Mythos contributed most.

The bug ages — 15 years, 20 years, multi-year — are the operator signal. These weren't recent regressions. They survived a decade-plus of human review, fuzzing, and adversarial bug bounties. The methodology change is what surfaced them.

Hacker News discussion of the Mozilla Hacks hardening Firefox post

What Mythos failed to exploit — and why that's the real validation

This is the part most coverage skipped, and it's the part operators should copy.

Help Net Security's writeup noted that Mozilla's audit logs revealed numerous AI-driven attempts to exploit prototype pollution for sandbox escapes — and all of them failed. They failed because Mozilla had made an architectural decision earlier to freeze JavaScript prototypes by default in privileged contexts. That hardening had been deployed in response to clever human-researcher reports years prior, where prototype pollution in the privileged parent process had been shown to enable sandbox escape. Mozilla took the lesson and froze the surface.

When Mythos came at the same surface, the audit logs captured every attempt and every failure. That's a measurable, falsifiable validation of prior defense-in-depth work. The AI tried, the architecture held.

For operators, this is the actual takeaway. A sufficiently capable AI-audit pipeline doesn't just find bugs — it measures the value of past hardening decisions by trying to defeat them and failing. If your team has shipped defense-in-depth work that couldn't easily be tested, an agentic harness can now produce that test data.
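Mechanically, that test data is just the harness's audit log, kept and tallied instead of discarded. A sketch under one loud assumption: the JSONL schema below is invented for illustration, not Mozilla's actual log format.

```python
# Tally every exploit attempt by attack surface and outcome. A surface that
# shows many attempts and zero successes (e.g. prototype pollution against
# frozen prototypes) is measurable evidence that past hardening held.
import json
from collections import Counter

def tally_attempts(audit_log_path: str) -> dict[str, Counter]:
    by_surface: dict[str, Counter] = {}
    with open(audit_log_path) as f:
        for line in f:
            entry = json.loads(line)  # assumed shape: {"surface": ..., "outcome": ...}
            by_surface.setdefault(entry["surface"], Counter())[entry["outcome"]] += 1
    return by_surface

# e.g. {"prototype-pollution": Counter({"failed": 37})}: many attempts,
# zero successes, which is the falsifiable validation described above.
```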

How January's Opus 4.6 run set this up

The Mozilla–Anthropic relationship didn't start with Mythos. It started in January with Claude Opus 4.6. Anthropic's security researchers ran a two-week scan with Opus 4.6 against Firefox: 22 vulnerabilities found, 14 of them sec-high. That run, by itself, matched roughly a fifth of all high-severity Firefox bugs patched throughout 2025. It's the result that earned Mozilla early access to Mythos Preview — and it's the result that's most relevant to anyone reading this who isn't running a flagship browser.

Opus 4.6 is generally available. The agentic-harness pattern is reproducible. You can pull the methodology out of Anthropic's writeup, set up an isolated container, point Claude Code at your codebase, and start running the same loop on smaller projects today. The Mozilla–Mythos result is what 423-bugs-in-a-month looks like with the latest model on a flagship-scale codebase. The Mozilla–Opus-4.6 result is what 22-bugs-in-two-weeks looks like with a generally-available model on the same target. The methodology generalizes downward.

The pattern is not Mozilla-specific

Anthropic disclosed alongside the Mozilla post that Project Glasswing — an industry consortium — has granted monitored Mythos access to more than 40 organizations maintaining critical software. The Mozilla writeup is the first public, technical case study, but it's not the only run. Expect more results from the consortium over the next quarter.

The market has already noticed. Polymarket prices Anthropic at 94% on having the best coding AI model at the end of May, up 9% on the week. The same exchange has Anthropic at 76% on best AI model overall and 64% on being first to ship a #1 model by June 30. Pricing this confident this fast doesn't usually move on a demo alone — it moves when concrete deployments validate the underlying capability claim. The Mozilla numbers are the most recent push.

r/ClaudeAI weekly top — community sentiment around the Claude Mythos launch

Community sentiment is tracking the same direction. r/ClaudeAI's "If the EU had built Claude" meme image went from 1,716 points on day one to 5,201 on day two — a 3x expansion, not the typical Reddit decay curve. That's persistent post-launch mindshare, the kind that only sticks when the underlying product is doing visible work. Hardening Firefox is visible work.

What operators should take from this

If you're running a security team smaller than Mozilla's — which is most teams — the playbook from this writeup is:

1. Set up the container. The methodology Anthropic published is operationally simple. Isolate from the Internet, mount the project source, expose a runtime. The point of isolation is not security paranoia (although that helps) — it's reproducibility. Hypotheses need to be confirmed against the actual program, not against a description of the program.

2. Use the model that's available. You probably don't have Mythos Preview access. You don't need it. Opus 4.6 produced 22 vulnerabilities in two weeks against Firefox — a project of the largest possible scale. Claude Code, generally available today, can run the same agentic loop. If you're starting from "we've never run an AI audit on this codebase," any of the current generation will dramatically outperform what you've been doing.

3. Capture the negative results. Mozilla's prototype-pollution insight is the operator-grade lesson buried in the writeup. The AI's failed exploit attempts are evidence that prior architectural hardening worked. Most teams don't have a way to test their defense-in-depth decisions empirically — an agentic harness gives you that test data, for free, every time you run it.

4. Read the supply-chain side of the AI-security ledger too. The Mozilla story is the defensive-AI lane. The supply-chain side — LiteLLM-class compromise, dependency-poisoning, model-runtime injection — is the offensive-AI lane operating against you. The two have to be planned together.

5. Compare with Anthropic's own production-grade security pattern — Claude Code's post-mortem on its first 50 production fixes is the methodological prequel to Mythos's Mozilla deployment. The harness pattern is the same; only the model changes.

The verification shift, in one sentence

Mozilla's blog post called it: the zero-days are numbered. What changed in 2026 is not that AI got smarter at writing code — that change has been arriving in installments since 2023. What changed is that AI got good enough at running code, and reasoning about what it observes when it does, that it can audit a codebase the way a senior security engineer does: form hypotheses, test them dynamically, dismiss the speculation, escalate the real findings, and produce reproducible PoCs.

The "Mythos discourse" of the last seventy-two hours has been about the offensive case — what happens when the same capability is pointed at your infrastructure. That conversation is real and the regulators are right to have it. But while it's happening, Firefox just shipped 423 fixes, validated a decade of architectural hardening with measurable negative results, and gave forty more organizations a working playbook. The defensive lane isn't theoretical anymore. It's just deployed.


Primary sources: Mozilla Hacks, Mozilla Blog, Anthropic news, Anthropic Red, Help Net Security, SecurityWeek, AISI, Simon Willison, Nate B Jones, AI LABS, Polymarket, r/ClaudeAI.


About ComputeLeap Team

The ComputeLeap editorial team covers AI tools, agents, and products — helping readers discover and use artificial intelligence to work smarter.
