Anthropic's April: Mythos, Glasswing, and Pentagon Policy

Claude Mythos demonstrated AI's vulnerability-discovery capability at industrial scale. Project Glasswing organized defender access. The Pentagon designated Anthropic a supply chain risk. The NSA kept using Mythos. Here is what connects these threads — and what readers can do.

By Sam O'Brien, Insightful AI Desk

Anthropic spent April demonstrating that AI can autonomously discover thousands of previously unknown software vulnerabilities — including a flaw in OpenBSD that had not been identified in 27 years of expert human review, and a vulnerability in FFmpeg present for 16 years across video-capable applications. In the same month, the Department of Defense designated Anthropic a “supply chain risk,” a category historically applied to companies with foreign-adversary ties. Anthropic has filed suit challenging the designation. The National Security Agency, separately, has continued using Claude Mythos through its own procurement channel.

The two threads — Claude Mythos as a notable cybersecurity AI development, and Anthropic’s current standing under the supply chain risk framework — are connected. What links them is a pair of provisions in Anthropic’s usage policy that the company has retained, and a federal government that has objected to their scope in defense contexts.

What Mythos can do

Anthropic announced Claude Mythos Preview on April 7, 2026. The model sits above the Claude Opus tier in Anthropic’s product hierarchy and is not generally available. Its core documented capability is autonomous discovery of unknown software vulnerabilities at industrial scale.

The UK AI Safety Institute’s (AISI) independent evaluation is the most rigorous outside assessment published to date. Per AISI’s summary findings, Mythos achieves approximately 83% accuracy on test corpora of real-world code; can execute multi-stage attack chains when given network access; completes in hours what experienced human practitioners would need days to do; and finds memory-safety, injection, and race-condition flaws more reliably than logic flaws or business-logic bypasses.

Anthropic’s own internal benchmarks describe Mythos Preview as roughly 100 times more effective than the production Claude Opus 4.6 at generating working exploits from discovered vulnerabilities. This is a vendor benchmark comparing Anthropic’s own models, and it is consistent with the directional findings of AISI’s assessment.

What is materially different about Mythos is integration. AI-assisted vulnerability discovery has been used by cybersecurity researchers for several years. Mythos performs the full discovery-to-exploit pipeline autonomously, on codebases the size of major operating systems, in hours rather than weeks, according to AISI’s evaluation.

The 27-year and 16-year findings

The OpenBSD discovery is worth dwelling on. OpenBSD is the security-focused operating system used in firewalls, VPN appliances, embedded high-assurance systems, and large portions of internet routing infrastructure. The project is known for its multi-decade audit culture, with code reviewed by some of the most experienced security practitioners in the field.

A 27-year-old issue surfacing in that environment suggests AI-assisted discovery now reaches code patterns that thorough human review can miss.

The FFmpeg finding has broader practical reach. FFmpeg is the multimedia library that ships inside web browsers, video conferencing software, social media apps, and most mobile applications that handle video. The flaw Anthropic disclosed had been present for 16 years.

Project Glasswing

Anthropic has not made Mythos Preview publicly available. Instead, on April 11 the company launched Project Glasswing, a closed partner program that gives major infrastructure operators and software vendors early access to Mythos for vulnerability discovery and remediation in their own systems.

Launch partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Per CyberScoop, more than 40 additional organizations responsible for critical software infrastructure have been added since launch.

Anthropic’s direct financial commitments to the program: $100 million in Mythos usage credits for Glasswing partners and $4 million in donations to open-source security organizations.

The Glasswing structure is unusual. Most frontier-model labs sell access; Glasswing partners receive it. Anthropic has stated in public communications that broad commercial sale of Mythos would equip both defenders and adversaries simultaneously, and that the defender-first window benefits from selective distribution.

Coverage of the program has been mixed. Some cybersecurity practitioners have noted the Glasswing list skews toward U.S. corporate giants and that the proportionality of $4 million in OSS security funding to the broader open-source maintenance need is a contested question. The Linux Foundation’s inclusion as a launch partner does direct some Mythos capacity at the open-source layer.

Bain & Company described Glasswing as a useful prompt for sectors that have under-invested in security tooling. PwC noted that the program’s signal value matters as much as its direct impact, because it surfaces the question of whether other organizations have equivalent vulnerability-finding capability. The IMF, in a public statement, suggested Mythos-class capability could affect systemic cyber risk if defender adoption lags broader access.

Alternative readings of Mythos’s capability

Not all observers read Mythos the same way. Cybersecurity researchers interviewed by CNBC in early May described the capability as continuous with existing AI-assisted tooling rather than a distinct step change. In their reading, Mythos is a continuation of a capability curve that already enables industrial vulnerability discovery in expert hands; what Anthropic added is integration and convenience, not capability that was previously unreachable.

That reading has merit. Many of the techniques Mythos uses (cross-referencing source against compiled binaries, chaining small flaws, automated code review) have been demonstrated in academic and industry research. Mythos’s contribution is performing them autonomously, at scale, without requiring expert direction at each step.

The practical takeaway is that the access floor for industrial-grade vulnerability discovery has dropped meaningfully. The capability ceiling may or may not have shifted; what has shifted is the resource level at which competent defense becomes available.

The dispute timeline

Mythos arrived during an ongoing dispute between Anthropic and the U.S. Department of Defense that predates the model’s announcement. Tech Policy Press maintains a public timeline; the documented sequence:

  • July 2025: Anthropic signs a $200 million contract with DOD, becoming the first AI lab to operate on Pentagon classified systems.
  • Late 2025 to early 2026: Per Tech Policy Press, conversations develop around the application of Claude in particular use cases.
  • February 24, 2026: Per Tech Policy Press, Defense Secretary Pete Hegseth communicates to Anthropic CEO Dario Amodei a request to permit unrestricted use of Claude “for all lawful purposes” by 5:01 p.m. Friday, February 27.
  • February 27, 2026: President Trump directs federal agencies to cease using Anthropic products; Hegseth designates the firm a “supply chain risk.”
  • March 24, 2026: Anthropic files suit in San Francisco federal court.
  • Late March: Judge Rita Lin issues a preliminary ruling finding the government’s actions likely violated the law.
  • April 2026: The DC Circuit Court of Appeals denies Anthropic’s request for a stay, citing “weighty governmental and public interests on the other side of the ledger” while leaving the underlying merits unresolved.
  • May 1, 2026: Pentagon finalizes AI contracts with eight other vendors — OpenAI, Google, Microsoft, AWS, Nvidia, SpaceX, Reflection AI, and Oracle.

The operative term in the discussion is “all lawful purposes.” Anthropic’s usage policy, in place since the company’s 2021 founding, contains two specific provisions: it does not permit Claude to be used in fully autonomous lethal weapons systems that select and engage targets without a human in the loop, or for mass surveillance of American citizens. These provisions predate the Pentagon contract and had governed the existing contractual relationship.

Anthropic’s position

Anthropic’s public statement in response to the supply chain risk designation noted that frontier AI systems are not yet reliable enough to power fully autonomous weapons, and that the company will not knowingly provide a product that creates risks for U.S. service members or civilians. The company has characterized the request to remove its usage-policy provisions as broader than a clarification.

The Pentagon’s position, based on its public statements and the government’s litigation filings, is that operational flexibility for national security work benefits from unfettered access to AI models, and that vendor-imposed restrictions on otherwise lawful uses can limit legitimate applications. The supply chain risk designation reflects that view.

The legal merits are an open question. A Lawfare analysis published after Judge Lin’s preliminary ruling argued that the designation will face significant judicial scrutiny on the merits. The DC Circuit, ruling preliminarily, identified substantial government interests on the other side. The case is expected to take months to resolve.

The NSA observation

The most informative fact about the current state of the dispute is the NSA’s continued use of Mythos.

While the Pentagon has designated Anthropic a supply chain risk, the National Security Agency has continued to use Mythos through a separate procurement channel since at least April, per Axios reporting. The reported use case is defensive: identifying vulnerabilities in U.S. government systems before they can be discovered by adversaries.

The NSA’s use suggests that federal agencies, in practice, continue to value Anthropic’s technology for cybersecurity work. The distinction in the Pentagon discussion is not about technology quality or safety but about the scope of permitted use cases under Anthropic’s product policy. That narrower framing helps clarify what the litigation will ultimately resolve.

OpenAI launches GPT-5.5-cyber

On May 7, OpenAI launched GPT-5.5-cyber, a cybersecurity-oriented variant available to vetted defensive security teams. The release follows Mythos by approximately one month and the Pentagon’s eight-vendor contracts by a week.

GPT-5.5-cyber’s specific capabilities have not yet been independently evaluated to a standard comparable with AISI’s Mythos assessment. OpenAI has not published the benchmark methodology, vetting criteria, or capability ceiling at the time of writing. As more information becomes available, direct comparison will be possible.

OpenAI’s and Anthropic’s policies in defense contexts are structured differently. OpenAI’s policy framework as currently published supports a broader range of defense-customer applications than Anthropic’s does. Both approaches have addressable enterprise markets, and which set of policy terms better fits a given customer is likely to depend on the customer’s specific use case and procurement context.

What this connects to

The headline framing of this story emphasizes national security and AI safety. That framing captures part of the picture. The NSA’s continued use of Mythos indicates that federal agencies retain confidence in Anthropic’s technology for cybersecurity work; the question in the litigation is narrower than overall trust in the vendor.

The supply chain risk designation, originally focused on foreign-supplier integrity concerns, is being applied here to a usage-policy disagreement with a domestic firm. Whether the framework extends to this context is the legal question now in front of the courts. The case is expected to take months to resolve, and its outcome will inform how vendor usage policies and government contracting interact going forward.

The more immediate effect of the discussion is industry-wide. Several frontier AI labs are watching closely. The visible pattern across labs is that few have published similarly specific usage-policy provisions in the months since the dispute became public — an observation reflected in industry analysis. Whatever the eventual ruling, this pattern is part of the current environment.

The story behind the story is the development of distinct positioning strategies in frontier AI. Anthropic emphasizes clarity in usage policy as part of its appeal to enterprise customers in regulated and trust-sensitive contexts. OpenAI emphasizes broader access and operational flexibility, including in defense applications. Both approaches have addressable markets, and the market is differentiating along the policy axis. Tracking how that differentiation evolves is more useful than predicting which approach “wins.”

Where the leverage is

For investors, the differentiation between policy-restricted and policy-flexible AI security tooling is a useful lens on the current quarter. Anthropic’s Glasswing access creates defender advantages for partner-list enterprises (CrowdStrike, Palo Alto Networks, JPMorgan, the hyperscalers); OpenAI’s positioning supports defense-tech firms with FedRAMP-rated relationships. The constructive read is exposure to integrators and product layers benefiting from each path, and to security insurance carriers as they develop pricing models for this differentiation.

For builders, Mythos-equivalent capability is now demonstrably possible. Glasswing’s partner-only access is a near-term arrangement; within 12 to 18 months, equivalent capabilities are likely to exist in open-weights variants and in additional commercial products. The defensible value over that horizon sits in the orchestration and remediation layer — tools that triage Mythos-class vulnerability output, prioritize fixes, automate patch deployment, and verify closure across production environments. Discovery is the easy part; production closure is the open product category, and the startup market addressing it is still forming.
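To make the triage-and-remediation layer concrete, here is a minimal sketch of what prioritizing Mythos-class vulnerability output could look like. All names, fields, and scoring weights below are hypothetical illustrations, not any vendor's actual schema or API:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One vulnerability report from an autonomous discovery tool (hypothetical schema)."""
    flaw_id: str
    severity: float      # 0-10, CVSS-style base score
    exposed: bool        # reachable from untrusted input?
    has_exploit: bool    # did the tool produce a working exploit?
    patched: bool = False

def priority(f: Finding) -> float:
    """Hypothetical scoring: base severity, amplified when the flaw is
    exposed to untrusted input or already has a working exploit."""
    score = f.severity
    if f.exposed:
        score *= 1.5
    if f.has_exploit:
        score *= 2.0
    return score

def remediation_queue(findings: list[Finding]) -> list[Finding]:
    """Unpatched findings only, highest priority first — the ordering a
    patch-deployment pipeline would consume."""
    return sorted(
        (f for f in findings if not f.patched),
        key=priority,
        reverse=True,
    )
```

The design point is that discovery tools emit volume; the defensible product sits in the scoring, queueing, and verification logic layered on top, which is what this sketch gestures at.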

For enterprise CISOs whose organizations are outside the Glasswing partner list, the practical procurement question has three near-term answers: wait for broader Anthropic availability (timing uncertain), build equivalent internal capability (impractical for most), or partner with security vendors that integrate Mythos-class tools (CrowdStrike and Palo Alto Networks are obvious channels), with security insurance carrier partnerships emerging as a fourth path. The most useful read is that capable defense is becoming a procurable feature rather than an in-house build.

For policy researchers and state-level regulators, the case is likely to produce binding precedent on vendor usage-policy enforceability within 12 to 24 months. Comment letters, amicus filings, and state-level legislative initiatives are the windows of influence while the case is open. The natural experiment in supply-chain-risk framework scope and the observable patterns of competitor usage-policy publication are publishable research questions.

What is worth doing, and what is worth watching

For readers in defensive security roles who do not have Glasswing access, Mythos-class autonomous discovery is not yet available, but a meaningful fraction of the practical value is reachable with existing tools. A pattern that works today: combine Semgrep’s free community rules for static analysis with an LLM pass on the highest-risk files — authentication, payment, parsing untrusted input, anything touching cryptography. Feed the Semgrep findings plus the relevant source into Claude or GPT with a structured triage prompt: classify exploitability, recommend remediation, flag false positives with reasoning. The result is comparable in noise reduction to commercial SAST products, and it is one of the highest-leverage LLM uses in security operations.

The same pattern adapts to pull-request review (catching subtle injection and authorization mistakes) and SAST/DAST alert triage (typically reducing alert volume by 60 to 80 percent without losing exploitable findings). For developers building intuition for automated vulnerability discovery, HackerOne’s public disclosure feed and OSS-Fuzz bug reports are free corpora worth reading alongside LLM explanations.
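As a concrete sketch of the Semgrep-plus-LLM pattern above: the snippet below runs Semgrep's community rules and formats the findings into a structured triage prompt. The `--config auto` and `--json` flags and the result fields (`check_id`, `path`, `start.line`, `extra.message`) follow Semgrep's documented CLI output; the prompt wording is illustrative, and the actual LLM call is left to the reader's API of choice:

```python
import json
import subprocess

def run_semgrep(target_dir: str) -> list[dict]:
    """Run Semgrep with its free community rules; return the findings list.

    Assumes Semgrep is installed locally; returns [] if nothing is found.
    """
    out = subprocess.run(
        ["semgrep", "--config", "auto", "--json", target_dir],
        capture_output=True, text=True, check=False,
    )
    return json.loads(out.stdout).get("results", [])

def build_triage_prompt(findings: list[dict]) -> str:
    """Format Semgrep findings into a structured triage prompt for an LLM."""
    lines = [
        "You are a security triage assistant. For each finding below:",
        "1. Classify exploitability (high / medium / low / false positive).",
        "2. Recommend a remediation.",
        "3. If you flag a false positive, explain your reasoning.",
        "",
    ]
    for f in findings:
        path = f.get("path", "?")
        line = f.get("start", {}).get("line", "?")
        msg = f.get("extra", {}).get("message", "")
        lines.append(f"- {f.get('check_id', 'unknown-rule')} at {path}:{line}: {msg}")
    return "\n".join(lines)
```

In practice one would append the relevant source files (authentication, payment, and parsing code first) to the prompt before sending it to Claude or GPT, then review the model's classifications rather than acting on them automatically.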

Several open questions remain useful to track. OpenAI has not yet published GPT-5.5-cyber’s benchmark methodology, false-positive rates, or vetting criteria; an independent evaluation comparable to UK AISI’s Mythos work would enable direct comparison and inform procurement decisions. Mythos’s performance on non-English codebases is publicly unstudied, which matters for defensive coverage of international software supply chains. The capability diffusion timeline for autonomous vulnerability discovery — historically 12 to 24 months for frontier model capabilities to reach open weights — is unmodeled; published analysis would shape regulatory and partner planning. And the economics of usage-policy choices — whether enterprises measurably value policy clarity in their procurement — would benefit from empirical study.

The litigation itself is the thing to watch on the merits. The DC Circuit’s stay denial preserved the designation but did not address the underlying merits, which are likely to resolve over the coming months. Intermediate signals worth tracking: whether other labs publish more specific usage-policy provisions, whether the Pentagon adjusts procurement, whether Mythos reaches broader availability, and whether U.S. states or allied governments adopt particular positions on usage-policy frameworks in their own contracting. Each is independently observable.


How we use AI and review our work: About Insightful AI Desk.