Project Glasswing and Mythos Preview: What 10,000+ AI-Found Vulnerabilities Mean for Software Security

In the weeks since Anthropic unveiled Project Glasswing and the Mythos Preview model, a startling new reality has emerged: AI can now find critical flaws across the software stack at an unprecedented scale. Early collaborators and independent testers report thousands of high- and critical-severity findings across essential infrastructure and widely used open-source projects. That rapid discovery is a boon for defenders, provided we can clear the bottleneck that remains: human triage, disclosure, and patching.

What Project Glasswing has achieved

Project Glasswing brought together Anthropic and roughly 50 partners who maintain systemically important software. Using Mythos Preview, teams collectively identified more than ten thousand vulnerabilities in a matter of weeks, with individual partners finding hundreds apiece. Some organizations report a tenfold (or greater) increase in bug-finding rates versus prior approaches. The practical upshot is clear: AI substantially accelerates discovery of realistic, high-impact vulnerabilities across critical systems.

Evidence from partners and external evaluations

Independent assessments line up with partners’ experiences. Examples from early testing include:

  • Cloudflare: ~2,000 bugs found across critical systems, ~400 of them rated high or critical, with a false positive rate their team judged to be lower than that of human testers.
  • Mozilla: using Mythos Preview, testers identified and fixed 271 vulnerabilities in Firefox 150, more than ten times the number found with an earlier model iteration.
  • Third-party platforms and academic benchmarks (XBOW, ExploitBench, ExploitGym) have rated Mythos Preview a leading performer on exploit-development and web-exploitation tasks.

Taken together, these results suggest Mythos-class models can routinely find real, actionable vulnerabilities and, in some cases, construct exploit proofs-of-concept.

Open-source scanning: scale, validation, and an example

Anthropic used Mythos Preview to scan over 1,000 open-source projects. Of 23,019 total findings, the model estimated 6,202 to be high or critical severity. A sample of 1,752 of those high/critical estimates was independently assessed: 90.6% were true positives, and 62.4% of the sample were confirmed as high or critical. Projected across all 6,202 estimates at that confirmation rate, Mythos Preview could surface nearly 3,900 high- or critical-severity open-source vulnerabilities in addition to those found for Glasswing partners.
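The "nearly 3,900" projection can be reproduced from the figures above. The following is a minimal sketch, not Anthropic's published method: it assumes the projection simply applies the sampled confirmation rate to all 6,202 estimates, and it adds an approximate 95% range under the further assumption (not stated in the source) that the 1,752 sampled findings were drawn at random. The function names are illustrative.

```python
import math

# Figures quoted in the paragraph above.
HIGH_CRITICAL_ESTIMATES = 6202  # findings the model rated high or critical
SAMPLE_SIZE = 1752              # estimates that were independently assessed
CONFIRMED_RATE = 0.624          # share of the sample confirmed high or critical


def project_confirmed(total: int, rate: float) -> float:
    """Scale a sampled confirmation rate up to the full set of estimates."""
    return total * rate


def normal_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for a proportion (normal approximation).

    Assumption: the sample was drawn at random, which the source does not state.
    """
    half_width = z * math.sqrt(rate * (1.0 - rate) / n)
    return rate - half_width, rate + half_width


point = project_confirmed(HIGH_CRITICAL_ESTIMATES, CONFIRMED_RATE)
low_rate, high_rate = normal_interval(CONFIRMED_RATE, SAMPLE_SIZE)
low = project_confirmed(HIGH_CRITICAL_ESTIMATES, low_rate)
high = project_confirmed(HIGH_CRITICAL_ESTIMATES, high_rate)

print(f"point estimate: {point:.0f}")
print(f"approximate range: {low:.0f} to {high:.0f}")
```

The point estimate rounds to 3,870, which matches the "nearly 3,900" figure; under the random-sampling assumption the range runs from roughly 3,730 to 4,010.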

One concrete example was a now-patched vulnerability in wolfSSL (tracked as CVE-2026-5194), for which Mythos Preview constructed an exploit capable of forging certificates and enabling convincing fake sites. It illustrates both the model’s power and the real-world stakes.

The triage and disclosure bottleneck

Finding vulnerabilities is no longer the dominant constraint; verifying, reporting, and patching them is. Anthropic’s coordinated disclosure process follows the conventional 90-day window to allow maintainers and users time to patch, but maintainers are struggling to keep up with the volume of AI-produced findings. Some have even asked for a slower disclosure rate so they can catch up. Key statistics underscore the friction:

  • 1,129 unvetted bugs were disclosed at maintainers’ request, of which the model estimated 175 to be high or critical.
  • Anthropic estimates that it has disclosed around 530 high- or critical-severity bugs so far; 75 of those have been patched and 65 have received public advisories.
  • The average time to patch a high- or critical-severity bug reported by Mythos Preview is about two weeks once triage begins, but the overall visibility of patches remains constrained by the disclosure cadence and maintainer capacity.

These patterns reveal a mismatch: automated discovery outpaces human capacity to validate and remediate, creating an interim period of heightened risk.
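The size of that mismatch is easier to see as rates. The sketch below only restates the counts quoted in this section as percentages; the 530 figure is itself an estimate, so the results are approximate, and the variable and function names are illustrative.

```python
# Counts quoted in this section; the 530 figure is an approximate estimate.
DISCLOSED_HIGH_CRITICAL = 530   # high/critical bugs disclosed so far
PATCHED = 75                    # of those, already patched
ADVISORIES = 65                 # of those, with a public advisory
UNVETTED_DISCLOSED = 1129       # unvetted bugs disclosed at maintainers' request
UNVETTED_HIGH_CRITICAL = 175    # of those, estimated high/critical by the model


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction, rejecting an empty denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


patched_share = share(PATCHED, DISCLOSED_HIGH_CRITICAL)
advisory_share = share(ADVISORIES, DISCLOSED_HIGH_CRITICAL)
unvetted_hc_share = share(UNVETTED_HIGH_CRITICAL, UNVETTED_DISCLOSED)
still_open = DISCLOSED_HIGH_CRITICAL - PATCHED

print(f"patched so far:        {patched_share:.1%}")
print(f"with public advisory:  {advisory_share:.1%}")
print(f"unvetted est. high/critical: {unvetted_hc_share:.1%}")
print(f"disclosed but not yet patched: {still_open}")
```

On these figures, about 14% of the disclosed high- or critical-severity bugs have been patched, leaving roughly 455 disclosed but not yet fixed.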

How defenders should adapt right now

Organizations and open-source maintainers can take pragmatic steps to reduce exposure while longer-term solutions evolve:

  • Shorten patch cycles: prioritize and accelerate testing and deployment of security fixes. Where possible, streamline the approval path for high-severity patches.
  • Make updates frictionless for end users: improve auto-update mechanisms, prompt more strongly for critical updates, and adopt persistent upgrade strategies for noncompliant devices.
  • Harden defaults and reduce attack surface: apply configuration hardening, minimize unnecessary services, and enforce least privilege.
  • Enforce strong identity controls: use multi-factor authentication, robust key management, and monitoring for unusual access patterns.
  • Improve detection and logging: comprehensive telemetry shortens time-to-detect and time-to-respond when exploits appear.
  • Prioritize risk using threat modeling: map which components, versions, or configurations are most likely to be targeted and focus resources there.

These are not novel recommendations, but the urgency of executing them has risen sharply, because AI compresses the time between discovery and potential automated exploit development.
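As one illustration of the risk-prioritization step, here is a minimal sketch that ranks components by a simple score. Everything in it is an assumption made for illustration: the `Component` fields, the weights, the 90-day staleness cap, and the example inventory are hypothetical and would need to be replaced by an organization's own threat model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical inventory record; field names are illustrative."""
    name: str
    internet_facing: bool
    open_high_critical: int       # known, unpatched high/critical findings
    days_since_last_patch: int


def risk_score(c: Component) -> float:
    """Toy score: exposure x staleness x open findings (weights are assumptions)."""
    exposure = 2.0 if c.internet_facing else 1.0
    # Staleness grows from 1.0 to 2.0 over 90 days, then stops growing.
    staleness = 1.0 + min(c.days_since_last_patch, 90) / 90
    return exposure * staleness * (1 + c.open_high_critical)


def prioritize(components: list[Component]) -> list[Component]:
    """Return components ordered from highest to lowest score."""
    return sorted(components, key=risk_score, reverse=True)


# Hypothetical inventory, for illustration only.
inventory = [
    Component("build-tool", internet_facing=False, open_high_critical=0, days_since_last_patch=10),
    Component("internal-batch", internet_facing=False, open_high_critical=4, days_since_last_patch=30),
    Component("edge-proxy", internet_facing=True, open_high_critical=3, days_since_last_patch=45),
]

ranked = prioritize(inventory)
for component in ranked:
    print(f"{component.name}: {risk_score(component):.2f}")
```

The multiplicative form is a design choice: an internet-facing component with a few open findings outranks an internal one with slightly more, which reflects the advice to focus on what is most likely to be targeted.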

Tools, programs, and ecosystem responses

To help defenders operationalize AI-based scanning, Anthropic has already made tools and programs available:

  • Claude Security (public beta for Claude Enterprise): a tool that scans codebases and can propose fixes; an early report showed over 2,100 vulnerabilities patched using Claude Opus 4.7.
  • Cyber Verification Program: enables qualified security professionals to use models for legitimate testing without certain default misuse safeguards.
  • Shared “skills” and harnesses: reusable instructions and orchestrations to map codebases, spin up scanning subagents, triage findings, and produce reports.
  • Partnerships with maintainers and foundations: collaborations with entities like the Open Source Security Foundation’s Alpha-Omega project aim to assist maintainers in triage and response.

Industry benchmarking and research efforts (ExploitBench, ExploitGym, other metrics) are being supported to track capabilities and defense progress over time.

Longer-term challenges and opportunities

Mythos-class capabilities point toward two key long-term shifts:

  • Positive: if developers integrate AI-assisted tools into secure development lifecycles, many classes of bugs can be caught earlier, reducing shipped vulnerabilities and making software intrinsically safer.
  • Risk: during the transitional window in which discovery outpaces remediation, there is a larger pool of known-but-unfixed vulnerabilities that could be abused by actors who also gain access to powerful models.

Managing that transition requires coordinated action across vendors, open-source maintainers, security teams, and the wider research community to scale triage and patching capacity, share high-quality vulnerability data, and invest in automation for safe disclosure workflows.

Practical next steps for readers

If you manage software or infrastructure today:

  • Inventory and prioritize: focus on the most critical, externally facing components.
  • Increase cadence: shorten the loop from vulnerability report to tested patch and deployment.
  • Adopt vetted AI scanning tools where appropriate, but combine them with human triage workflows to reduce false positives and ensure safe disclosure.
  • Coordinate: participate in or build collaborations with other vendors and maintainers to pool triage resources for shared dependencies.
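To know whether the report-to-deployment loop is actually getting shorter, it helps to measure it. The following is a minimal sketch under stated assumptions: the dates are made-up illustrative records, and the 14-day target is a hypothetical internal goal rather than a figure any program mandates.

```python
from datetime import date
from statistics import mean

TARGET_DAYS = 14  # hypothetical internal target, not a mandated figure

# Illustrative (reported, deployed) date pairs; not real data.
records = [
    (date(2026, 1, 5), date(2026, 1, 19)),
    (date(2026, 1, 10), date(2026, 1, 31)),
    (date(2026, 2, 1), date(2026, 2, 8)),
]


def days_to_patch(reported: date, deployed: date) -> int:
    """Whole days between a vulnerability report and the deployed fix."""
    if deployed < reported:
        raise ValueError("deployed date precedes reported date")
    return (deployed - reported).days


durations = [days_to_patch(reported, deployed) for reported, deployed in records]
average_days = mean(durations)
over_target = [d for d in durations if d > TARGET_DAYS]

print(f"durations (days): {durations}")
print(f"mean days to patch: {average_days}")
print(f"fixes over the {TARGET_DAYS}-day target: {len(over_target)}")
```

Tracking the slow outliers alongside the mean matters, because a single long-running fix can stay hidden behind a healthy average.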

For security researchers and tool builders:

  • Invest in automation around triage, reproducible proof-of-concept generation, and patch suggestion workflows. These reduce human bottlenecks while preserving quality.
  • Contribute to standards and tooling for coordinated disclosure that scale with higher reporting volumes.
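One small piece of triage automation is collapsing duplicate reports and ordering what remains by severity before a human sees it. The sketch below is a hypothetical illustration, not the behavior of any tool named in this article: the `Finding` fields, the deduplication key, and the sample findings are all assumptions.

```python
from dataclasses import dataclass

# Lower rank means higher priority in the queue.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    """Hypothetical scanner output record; field names are illustrative."""
    project: str
    location: str   # e.g. "path/to/file.c:120"
    category: str   # e.g. "buffer-overflow"
    severity: str   # one of the SEVERITY_RANK keys


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Keep one finding per (project, location, category), preferring the most severe."""
    best: dict[tuple[str, str, str], Finding] = {}
    for finding in findings:
        key = (finding.project, finding.location, finding.category)
        current = best.get(key)
        if current is None or SEVERITY_RANK[finding.severity] < SEVERITY_RANK[current.severity]:
            best[key] = finding
    return list(best.values())


def triage_queue(findings: list[Finding]) -> list[Finding]:
    """Deduplicate, then order by severity with a stable tie-break on project and location."""
    return sorted(
        dedupe(findings),
        key=lambda f: (SEVERITY_RANK[f.severity], f.project, f.location),
    )


# Made-up findings for illustration only.
raw = [
    Finding("libexample", "src/parse.c:120", "buffer-overflow", "high"),
    Finding("libexample", "src/parse.c:120", "buffer-overflow", "critical"),
    Finding("webapp", "api/auth.py:45", "auth-bypass", "high"),
]

queue = triage_queue(raw)
for item in queue:
    print(item.severity, item.project, item.location, item.category)
```

Here the two reports against the same location collapse into one critical entry, so the reviewer sees two items instead of three. A real pipeline would likely need fuzzier matching than an exact key.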

Conclusion

Project Glasswing’s early results are a watershed moment: AI can now find high-impact software vulnerabilities at a scale that outpaces traditional triage and patching processes. That capability is ultimately a net positive for security, but only if defenders act urgently to scale verification and remediation, shorten patch cycles, and harden systems against exploitation. In the meantime, pragmatic operational changes, better tooling for triage, and broader collaboration across the ecosystem are essential to turn AI’s vulnerability-discovery power into durable improvements in software safety.
