I Stayed on Opus 4.6 While Everyone Chased 4.7 and 4.8. Then Fable 5 Showed Up.
I stayed on Opus 4.6 while everyone upgraded. Now Fable 5 is here. A practitioner's take on which Claude model is actually best for daily coding work.
Kemal Esensoy · Modified on June 11, 2026
"Did you try the new model yet? It's insane."
I've heard some version of this sentence four times this year. Opus 4.7 in April. Opus 4.8 in May. And now Fable 5, twice in the same week.
My answer, every single time, has been the same: I'm still on 4.6. And every time I say it, I get a look like I just admitted I'm still running Windows XP. But here's the thing: when your entire agency runs through Claude Code, skills, agents, hooks, the whole setup, "insane" is not a compliment. Boring is the compliment. I want boring. Boring pays my invoices.
Then Fable 5 showed up and made things complicated. Let me walk you through the whole journey, because I think the question "what's the best Claude model for coding" has a much messier answer than any benchmark chart will tell you.
Why I Never Left Opus 4.6
My Claude Code setup is not a chat window. It's around 30 custom skills, a handful of subagents, hooks, and slash commands that handle everything from SEO audits to client proposals to Directus content operations. I wrote about that whole stack in The AI Tools That Actually Run My One-Person Agency, and the short version is: Claude Code is closer to an employee than a tool at this point.
And Opus 4.6 runs all of it without surprises. Skills fire when they should. Agents stay in their lanes. Output formats stay stable from Monday to Friday. When I start a migration for a client at 9am, I know roughly what the model will do, because it did the same thing last week.
That predictability is worth more to me than five extra points on SWE-bench. Nobody benchmarks "does my hooks-and-skills setup still work on Tuesday." But that's the only benchmark that matters when a client is paying you for the result.
What Broke When I Tried 4.7 and 4.8
I gave 4.7 two days. I gave 4.8 about a week. Both times I went back.
With 4.7, the model was clearly smarter on paper but kept reinterpreting my skill instructions in creative ways. Instructions that 4.6 followed to the letter suddenly became suggestions. A content skill that always returned clean markdown started adding its own formatting opinions. Small things, but small things multiply when you run 30 of them.
4.8 was weirder. It developed what I can only describe as a personality problem: it pushed back on me as if disagreeing were the goal itself. The community even has a name for it now: negative sycophancy. And I wasn't imagining the agent chaos either. 4.8 shipped with dynamic workflows that could trigger just because the word "workflow" appeared in your prompt, spawning subagents I never asked for. There's a keyboard shortcut to disable it. The fact that a keyboard shortcut exists tells you everything.
Here's the kicker: Anthropic's own migration docs say Opus 4.8 "isn't a drop-in replacement" for 4.7, and that agents with strict output formats need revalidation. They're not hiding it. We just don't read the fine print because the launch benchmarks look so good. My favorite data point from the community testing: on Vending-Bench, 4.8 fell for scams 30 times more often than 4.7. Smarter and more gullible at the same time. I've already written about how Claude is great at building software and equally great at breaking it, and model upgrades are exactly the same story one level up: the thing that breaks your system is the upgrade you didn't actually need.
To be fair: plenty of developers run 4.8 happily. If your workflow is mostly interactive coding without much automation on top, you'd probably never notice any of this. My setup is the opposite of that. The more automation you stack on a model, the more every behavioral change costs you.
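If you want to catch that kind of format drift before a client does, a small regression check per skill goes a long way. What follows is only a minimal sketch in Python, not something taken from my actual setup: the function name `format_violations` and the three rules are hypothetical stand-ins for whatever "clean markdown" means for your own skill.

```python
import re


def format_violations(markdown: str) -> list[str]:
    """Return rule violations for one skill output; an empty list means OK.

    The rules are illustrative assumptions, not an official definition
    of "clean markdown".
    """
    problems = []
    lines = markdown.strip().splitlines()

    # Rule 1: output starts with a level-1 heading, not a chatty preamble.
    if not lines or not lines[0].startswith("# "):
        problems.append("first line is not a level-1 heading")

    # Rule 2: no raw HTML tags mixed into the markdown.
    if re.search(r"<[a-zA-Z][^>]*>", markdown):
        problems.append("contains raw HTML")

    # Rule 3: no bold-only lines used in place of real headings.
    if any(line.startswith("**") and line.endswith("**") for line in lines):
        problems.append("uses bold text as a pseudo-heading")

    return problems


if __name__ == "__main__":
    sample = "Sure! Here is your post:\n\n**Intro**\n\n<div>hi</div>"
    for problem in format_violations(sample):
        print("FAIL:", problem)
```

Run the same saved prompts through the old and the new model, feed both outputs into a check like this, and you have a rough version of the "does it still work on Tuesday" benchmark.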
The Benchmark Trap
Every model release since 2024 has topped some chart. If you upgraded every time a benchmark improved, you'd have changed models eleven times in two years and spent half your working hours re-tuning prompts.
Here's the detail that convinced me the version number is the least interesting part: Anthropic's docs recommend specific effort levels per model. For most coding work on 4.7 and 4.8, the recommendation is xhigh. That means the same prompt, on the same task, needs different tuning depending on the model version. The model is not a drop-in component. It's a dependency with breaking changes, except the changelog is vibes and Reddit threads.
So my rule became simple: I don't upgrade because the new model is better. I upgrade when my current model is the bottleneck. 4.6 was never the bottleneck.
Until maybe now.
Then Anthropic Shipped Fable 5
On June 9, Anthropic released Claude Fable 5, and this one is genuinely different. It's the first publicly available Mythos-class model, the same weights as Mythos 5, just with safety classifiers for cybersecurity and bio on top. If you remember the Glasswing story about the model Anthropic considered too dangerous to ship, this is that lineage, now sitting in my terminal.
The numbers are not subtle. 80.3% on SWE-bench Pro, against GPT-5.5's 58.6%. Pricing at $10 per million input tokens and $50 per million output, exactly double Opus 4.8. Stripe reportedly ran a codebase-wide migration on a 50-million-line Ruby codebase in one day during early access, something they'd estimated at two months for a team. And it's free on Pro and Max plans until June 22; after that, it needs usage credits billed at API rates.
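To make that pricing concrete, here is a back-of-the-envelope calculation in Python. It is a sketch under stated assumptions: the Fable 5 prices are the ones quoted above, the Opus 4.8 prices are simply derived as half of them, the token counts are invented for illustration, and caching or plan discounts are ignored entirely.

```python
# USD per million tokens. Fable 5 prices as quoted above; Opus 4.8 prices
# are derived from "exactly double" and are therefore an assumption.
FABLE_5 = {"input": 10.0, "output": 50.0}
OPUS_4_8 = {"input": 5.0, "output": 25.0}


def estimate_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Estimate the API cost in USD for one session, ignoring any discounts."""
    input_cost = input_tokens / 1_000_000 * prices["input"]
    output_cost = output_tokens / 1_000_000 * prices["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical session: 2M input tokens, 200k output tokens.
    for name, prices in (("Fable 5", FABLE_5), ("Opus 4.8", OPUS_4_8)):
        cost = estimate_cost(2_000_000, 200_000, prices)
        print(f"{name}: ${cost:.2f}")
```

For that made-up session the estimate is about $30 on Fable 5 against $15 on Opus 4.8, before accounting for the fact that the newer model may also use more tokens on the same task.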
After everything I just told you about skipping 4.7 and 4.8, I obviously tried it on day one. Consistency is for models, not for me.
My First Days With Fable 5: What Genuinely Impressed Me
Yesterday I gave Fable 5 a task on a client project that I'd normally babysit: restructuring a Next.js dashboard's data layer while keeping the Directus integration intact. The kind of task where 4.6 does good work but needs me checking in every twenty minutes.
Fable 5 just... did it. No narration, no "Great question, let me explain my approach!" It explored the codebase first, found an architectural problem in the caching layer that wasn't part of the task, fixed it, and mentioned it afterwards in one sentence. Then it opened Playwright, took screenshots at three viewport sizes, and verified its own work before telling me it was done.
That self-verification habit is the thing. When I migrated three client sites from WordPress to Astro, the reason nothing broke was that I made Claude check everything. Fable 5 does the checking without being told. The longer and messier the task, the bigger its lead over 4.6 gets. On short tasks, honestly, I barely notice a difference. On four-hour autonomous runs, it's not close.
And crucially, for my setup: my skills work. It follows instructions more like 4.6 than like 4.8, which surprised me the most. After two skipped generations, this is the first model since 4.6 where I haven't felt the urge to go back within 48 hours.
So, the search for the best Claude model for coding is over? Not quite. There are two problems, and they're not small.
The Two Problems: Token Burn and Thinking Loops
First problem: Fable 5 eats tokens like nothing I've used before. On my Max plan, it burns through quota at roughly twice the rate of Opus 4.8, and 4.8 was already hungrier than 4.6. I watched a single afternoon session consume what 4.6 needs for two days of work.
I'm not alone in this. Simon Willison, who is about as level-headed as AI commentary gets, burned $110.42 in a single day on Fable 5, including 78.2 million tokens for one feature. His verdict was "something of a beast" and "slow, expensive." Both things are true at once. Part of the burn is structural: the model is built to keep enormous context and notes across long tasks, which is exactly what makes it good. I dug into what huge context actually costs in the 1 million token context window post, and Fable 5 is that tradeoff pushed to its logical end.
Second problem: thinking loops. Fable 5's extended thinking is always on. You cannot disable it, only let the model decide how much to think. Usually that's fine. Sometimes it isn't. Twice now I've watched it think itself in circles on a problem, reconsidering the same approach from slightly different angles, burning output tokens the entire time. The community has documented worse: an eight-hour autonomous run that cost around $105 and produced a confidently wrong result, because the success criteria were vague. Someone summarized it perfectly: garbage rubric plus great model equals a confidently wrong loop.
That sentence should be printed on the box. The model amplifies whatever you give it, including your sloppy task definitions.
My Decision Framework: When to Upgrade, When to Stay Put
After three model evaluations this year, I've stopped improvising. This is the checklist I actually use now:
- Never switch mid-project. A model change is a dependency change. You don't swap your database the week before a client launch either.
- Test on throwaway tasks for a full week. Internal tools, my own site, experiments. Client work stays on the proven model until the new one has earned it.
- Pin the model in agent-heavy setups. If you run skills and subagents, the auto-updated "latest" setting is a time bomb. Pick a version, pin it, change it deliberately.
- Upgrade when the old model is the bottleneck, not when the new one tops a chart. I didn't switch models once in 18 months because 4.6 never blocked me. Fable 5 is the first model that does things 4.6 simply cannot.
- Set budget guardrails before long autonomous runs. With Fable 5 specifically: clear success criteria, a token ceiling, and a hard stop. The thinking loops are rare, but at $50 per million output tokens, one bad night is real money.
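The last point on that list, a token ceiling plus a hard stop, can be sketched in a few lines of Python. Treat this as an illustration only: `RunBudget` and `BudgetExceeded` are hypothetical names, not part of Claude Code or any Anthropic SDK, and the sketch assumes your own agent loop can report the output tokens used after each step.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses its token or cost ceiling."""


@dataclass
class RunBudget:
    max_output_tokens: int             # token ceiling for the whole run
    max_cost_usd: float                # hard stop in dollars
    output_price_per_m: float = 50.0   # USD per million output tokens, as quoted above
    output_tokens_used: int = 0

    @property
    def cost_usd(self) -> float:
        """Output-token cost so far; input tokens are ignored in this sketch."""
        return self.output_tokens_used / 1_000_000 * self.output_price_per_m

    def record(self, output_tokens: int) -> None:
        """Add one step's usage and stop the run if a limit is crossed."""
        self.output_tokens_used += output_tokens
        if self.output_tokens_used > self.max_output_tokens:
            raise BudgetExceeded(
                f"token ceiling hit: {self.output_tokens_used} > {self.max_output_tokens}"
            )
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(
                f"cost ceiling hit: ${self.cost_usd:.2f} > ${self.max_cost_usd:.2f}"
            )


if __name__ == "__main__":
    budget = RunBudget(max_output_tokens=100_000, max_cost_usd=10.0)
    try:
        for step_tokens in (40_000, 40_000, 40_000):  # invented per-step usage
            budget.record(step_tokens)
            print(f"ok so far: ${budget.cost_usd:.2f}")
    except BudgetExceeded as stop:
        print("stopping run:", stop)
```

A guard like this does not prevent a thinking loop; it only caps what one costs. The clear success criteria still have to come from you.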
The deeper insurance policy is writing better task definitions in the first place. That's the same lesson from spec-driven development: the clearer your spec, the less it matters which model executes it, and the less any model swap can hurt you.
So What's the Best Claude Model for Coding Right Now?
Honest answer: for me, today, it's still Opus 4.6 for production client work, and Fable 5 for everything where its extra capability actually changes the outcome. Long refactorings, big migrations, gnarly debugging sessions. That split won't survive long, and I know it.
Because here's what I can't ignore: Fable 5 feels like 4.6 did when I first settled on it. Instructions stick. Work gets verified. The model does what it says. If Anthropic smooths out the thinking loops, and if the economics after June 22 don't get absurd, this is my next daily driver. I'm roughly 80% there. The remaining 20% is the memory of how 4.7 felt great on day three and unbearable on day five.
Ask me again in a month. The honest version of model advice always has an expiration date, and anyone who tells you otherwise is selling a course.
I can't promise you the perfect model choice. What I can offer: 8+ years of building websites and workflows that survive tool churn, including the AI kind. Let's talk if your business is trying to figure out which AI setup actually fits, instead of chasing every release.
About the Author
Kemal Esensoy
Kemal Esensoy, founder of Wunderlandmedia, started his journey as a freelance web developer and designer. He conducted web design courses with over 3,000 students. Today, he leads an award-winning full-stack agency specializing in web development, SEO, and digital marketing.