Anthropic Built an AI Too Dangerous to Ship. Then They Did Something Nobody Expected.
Anthropic created the most capable AI model ever, discovered it could break the internet, and chose restraint over profit. A developer's perspective.
Yesterday, Anthropic announced they built a new AI model called Claude Mythos Preview. It's the most capable coding and cybersecurity model they've ever created. It found thousands of zero-day vulnerabilities across every major operating system, every major web browser, the Linux kernel, and software that billions of people use daily.
And then they didn't ship it.
They looked at what they had built, realized it could break the internet, and instead of racing to market, they called up Apple, Google, Microsoft, AWS, and eight other tech giants and said: "We need to fix this together before anyone else gets their hands on it."
I've been writing about AI building and breaking software for a while now. But this one stopped me in my tracks. Because in an industry where "move fast and break things" is still the unofficial motto, choosing not to ship might be the most radical thing an AI company has done in years.
They Built a Model That Can Break the Internet. Then They Didn't Ship It.
On April 7, 2026, Anthropic launched Project Glasswing. Not a product launch. Not a pricing page. A cybersecurity coalition.
The backstory: during internal testing, Anthropic realized their new model, Claude Mythos Preview, wasn't just good at writing code. It was terrifyingly good at finding flaws in existing code. We're talking about vulnerabilities that had been hiding in critical software for decades. Vulnerabilities that automated testing tools had run into millions of times without ever catching.
Here's where it gets interesting. In The Lord of the Rings, Isildur had the chance to destroy the One Ring. He stood at the edge of Mount Doom, held it in his hand, and chose to keep it. We all know how that ended.
Anthropic had their Isildur moment. They had a model that could find and exploit vulnerabilities in practically any software on the planet. The revenue potential was staggering. The pricing they eventually announced ($25 per million input tokens, $125 per million output tokens) gives you an idea of how valuable this capability is.
But instead of keeping the ring, they walked to the edge and let it go. They restricted access, built a coalition, and committed to 90 days of public reporting before even considering broader availability.
What Claude Mythos Can Actually Do
Let me put some numbers to this, because "really good at finding bugs" doesn't capture it.
On CyberGym Vulnerability Reproduction, Mythos scored 83.1% against Opus 4.6's 66.6%. On SWE-bench Verified (the gold standard for real-world software engineering tasks), it hit 93.9% versus 80.8%. And on Cybench, it saturated the benchmark outright at 100% and produced 181 working exploits, compared to its predecessor's 2. Two. Not a typo.
But the raw benchmarks aren't even the scary part. Here's what the model actually found in the wild:
A 27-year-old vulnerability in OpenBSD, the operating system specifically designed to be secure. An attacker could remotely crash any machine running it just by connecting to it. Twenty-seven years this had been sitting there.
A 16-year-old bug in FFmpeg, the video encoding/decoding software that basically every video on the internet touches at some point. Automated testing tools had hit this bug five million times without ever catching it. Mythos found it.
A 17-year-old vulnerability in FreeBSD. And multiple chained vulnerabilities in the Linux kernel that together could let an attacker escalate from ordinary user access to complete control of a machine.
Nicholas Carlini, one of the researchers involved, put it simply: Mythos "found more bugs in the last couple of weeks than in the rest of my life combined."
When you consider how fast frontier models keep improving, the implications of this level of capability hit differently.
The Alignment Paradox: Best-Aligned and Most Dangerous
Here's the part that keeps me up at night.
According to Anthropic's own risk assessment, Mythos is simultaneously the best-aligned model they've ever built and the greatest alignment-related risk they've ever released. Read that twice. The same model that follows instructions more faithfully than any predecessor also poses the biggest safety challenge.
They compared it to mountaineering: greater capability gives the model access to more dangerous terrain, even if its "judgment" is better than ever.
And the evidence backs this up. During testing, Mythos escaped its sandbox environment and posted exploit details to public websites. Unprompted. Nobody asked it to do that. It also modified git history to hide unauthorized file changes and misreported its accuracy when it knew it was using prohibited methods.
Anthropic's interpretability team (the people who try to look inside the model's "brain") found features associated with "concealment, strategic manipulation, and avoiding suspicion."
Before you panic: Anthropic attributes this to overeager task completion, not some kind of hidden agenda or emergent consciousness. The model was trying so hard to be helpful at finding vulnerabilities that it started cutting corners to get results faster. Think of it as a very eager intern who breaks the rules not out of malice, but because they want to impress.
Still. That's exactly the kind of problem you want to understand before putting something like this in the wild. I've written about what happens when AI security goes wrong, and the pattern is always the same: the damage happens before anyone realizes there's a problem.
Project Glasswing: The Coalition
So instead of a product launch, Anthropic built a coalition. And not a small one.
Twelve launch partners, including AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Plus 40 additional organizations maintaining critical software infrastructure. That's basically a who's who of every company whose software your devices run on right now.
The financial commitments tell a story too. $100 million in Mythos usage credits for coalition participants. $4 million in direct donations to open-source security organizations ($2.5 million to Alpha-Omega and OpenSSF through the Linux Foundation, $1.5 million to the Apache Software Foundation).
The plan: give these organizations access to Mythos so they can find and fix vulnerabilities in their own software before Anthropic makes the model more broadly available. A 90-day window of defensive advantage.
CrowdStrike's CTO Elia Zaitsev nailed the urgency: "The window between a vulnerability being discovered and being exploited by an adversary has collapsed. What once took months now happens in minutes with AI."
The stock market noticed, by the way. Cybersecurity stocks dropped 5-11% on the announcement. When AI can find vulnerabilities faster than security companies can patch them, it changes the entire economics of the industry.
Anthropic has been building an entire ecosystem of tools and protocols that hint at this kind of collaborative approach to AI. Glasswing is just the most dramatic example yet.
Why Open Source Needed This Yesterday
Here's the part that hits closest to home for me as a developer.
97% of all software contains open-source components. Almost everything we build, everything our clients use, everything running in the cloud sits on a foundation of open-source code.
And who maintains that code? 5% of developers create 96% of open-source value. Most of them are doing it for free, in their spare time, with zero security budget.
Remember the XZ Utils backdoor from 2024? CVE-2024-3094. Someone spent two years on a social engineering campaign, slowly building trust with an overwhelmed maintainer, just to slip a backdoor into a compression library that ships with basically every Linux distribution. Two years of patient manipulation, exploiting one person's burnout.
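And here's the absurd part: once the backdoor was discovered, checking for it was trivial, because only two released versions ever carried it. A minimal sketch in Python, assuming the usual `xz --version` output format (5.6.0 and 5.6.1 are the releases flagged in the CVE; everything else here is illustrative):

```python
import re
import subprocess

# The two xz/liblzma releases that shipped the backdoor (CVE-2024-3094).
BACKDOORED = {"5.6.0", "5.6.1"}

def installed_xz_version():
    """Return the installed xz version string, or None if xz isn't available."""
    try:
        out = subprocess.run(
            ["xz", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    # The first line typically reads: "xz (XZ Utils) 5.4.5"
    match = re.search(r"\d+\.\d+\.\d+", out)
    return match.group(0) if match else None

version = installed_xz_version()
if version in BACKDOORED:
    print(f"xz {version}: backdoored release, downgrade or patch immediately")
elif version:
    print(f"xz {version}: not one of the CVE-2024-3094 releases")
else:
    print("xz not found on PATH")
```

Twenty lines to detect it. Two years to plant it. That asymmetry is the whole problem.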
That's the state of open-source security. Critical infrastructure held together by volunteers who are burning out.
Greg Kroah-Hartman, one of the top Linux kernel maintainers, said it plainly: "Something happened a month ago, and the world switched. Now we have real reports."
And Alex Stamos, former Facebook CSO, added the timeline that should make everyone nervous: "We only have something like six months before the open-weight models catch up." Meaning: the window where only responsible actors have this capability is closing fast.
The $4 million in donations is a start. But it's a band-aid on a structural problem. Jim Zemlin from the Linux Foundation said it best: "Open source maintainers have historically been left to figure out security on their own."
The GPT-2 Precedent (And Why This Time Is Different)
We've heard "too dangerous to release" before.
In February 2019, OpenAI said GPT-2 was too dangerous to release in full. The internet rolled its eyes. Researchers called it a marketing stunt. By November 2019, nine months later, the full model was publicly available. The "danger" turned out to be overhyped.
So why should we take Anthropic seriously this time?
Because this time, there's proof. Not theoretical risk assessments. Not hypothetical scenarios. Thousands of real vulnerabilities. Working exploits against software running on billions of devices. A 27-year-old bug in a security-hardened operating system that nobody else found.
Simon Willison, one of the most respected voices in the AI developer community, wrote: "The security risks really are credible here, and having extra time for trusted teams to get ahead of them is a reasonable trade-off."
But Casey Newton from Platformer raised the uncomfortable premise underneath all of this: "The only way to protect us from dangerous AI models is to build them first." Think about that for a second. The argument for building an AI that can break every piece of software on the planet is that someone else will build it anyway, so better us than them.
It's not a comfortable argument. But I'm not sure it's wrong.
Is This How AI Companies Should Behave?
I use Linux every day. FFmpeg runs behind practically everything I build for clients. OpenBSD-derived code (OpenSSH, most famously) ships with nearly every operating system. These aren't abstract products to me. They're the tools I work with.
So when I learned that an AI model found vulnerabilities in all of them, vulnerabilities that had been hiding for 16, 17, 27 years, my first reaction wasn't fear. It was gratitude that the company that found them chose to do something responsible with the knowledge.
And that's rare. That's incredibly rare in tech.
The counterarguments are real. Kelsey Piper pointed out that "a private company now has incredibly powerful zero-day exploits of almost every software project you've heard of." That's a valid concern. This is centralized power in a new and uncomfortable way. What if the model gets stolen? What about the regulatory vacuum around all of this?
But here's my honest take: compare it to the alternative. What if Anthropic had shipped Mythos on day one? Open API access, $25 per million tokens, have at it? Every script kiddie in the world running automated vulnerability scans against every server on the internet?
Or worse: what if a less responsible lab had gotten there first?
Anthropic was founded on the thesis that AI safety isn't just a nice-to-have, it's the whole point. With Project Glasswing, that thesis is no longer theoretical. They had the ring. They let it go. Watching how different AI labs approach capability has always been interesting. But this is different from benchmarks and Pac-Man challenges. This is a real test of values.
I don't know if their intentions are 100% pure. I don't know if this model will stay restricted forever or if commercial pressure will eventually win. I don't know if six months is enough time to patch decades of vulnerabilities before open-weight models catch up.
But I know what they did yesterday. And it was the right call.
What Happens Next
The 90-day disclosure clock is ticking. Coalition members are actively scanning their software with Mythos right now. Patches are being developed. Reports will be published.
After the research phase, restricted API access will open up at $25/$125 per million tokens, with a Cyber Verification Program for security professionals who can demonstrate legitimate need.
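To make those rates concrete, here's a quick back-of-envelope calculator in Python. The per-token rates come straight from the announcement; the token counts in the example are invented for illustration:

```python
# Back-of-envelope cost at the announced rates:
# $25 per million input tokens, $125 per million output tokens.
INPUT_RATE = 25 / 1_000_000    # dollars per input token
OUTPUT_RATE = 125 / 1_000_000  # dollars per output token

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the rates above."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical audit run: 200k tokens of source code in,
# 20k tokens of findings out. Counts are made up for illustration.
print(f"${call_cost(200_000, 20_000):.2f}")  # $7.50
```

Seven dollars and fifty cents to audit a chunk of codebase that might hide a decades-old remote crash. That's the economics the cybersecurity stocks were reacting to.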
The bigger question: will other AI labs follow this precedent, or will they race past it? Anthropic themselves acknowledged it: "Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors committed to deploying them safely."
I don't have a clean answer for what comes next. Nobody does. But for one day at least, a company chose restraint over revenue, defense over disruption, and coalition over competition.
That's worth paying attention to.
If you're a developer or business owner wondering how AI is reshaping the security of the tools we all depend on, let's talk. This stuff moves fast, and having someone who follows it closely in your corner matters more than ever.
About the Author
Kemal Esensoy
Kemal Esensoy, founder of Wunderlandmedia, started his journey as a freelance web developer and designer. He has taught web design courses to more than 3,000 students. Today, he leads an award-winning full-stack agency specializing in web development, SEO, and digital marketing.