When Anthropic launched Claude Fable 5 on June 9, the company led its benchmark presentation with a case study from Stripe. The payments company had handed the model a production Ruby codebase running to 50 million lines and asked it to complete a codebase-wide migration. A team doing the work manually would have needed more than two months. The model finished in a day.
That result, more than any benchmark number, is what drew attention from engineering teams across the industry. The migration task sits at the center of a type of work that has resisted automation for a long time: it requires understanding the full codebase, tracing every dependency, generating working replacement code, and doing it at a scale where the chance of missing something is very high. The fact that Fable 5 completed it without human intervention, and in a commercially practical time window, resets what enterprise software teams can reasonably expect from AI assistance.
The Fable 5 Launch
Claude Fable 5 is the first publicly available model from Anthropic's Mythos class, the same architecture that powered the security research in Project Glasswing. The model Anthropic is shipping to the public is not identical to the research version: queries touching cybersecurity, biology, chemistry, and distillation fall back to Claude Opus 4.8 rather than receiving a Fable 5 response. Anthropic says these fallbacks trigger in fewer than 5% of sessions on average, meaning the vast majority of interactions run fully on the new model.
On SWE-Bench Pro, the standard benchmark for agentic software engineering, Fable 5 scores 80.3%. The next-best model sits 11 percentage points behind, at 69.3%. On SWE-Bench Verified, the score rises to 93.9%. On Terminal-Bench 2.1, which measures sustained command-line autonomy, Fable 5 hits 88.0%. On Humanity's Last Exam, a broad expert-level knowledge benchmark, the model reaches 59.0%.
Key Facts
- Stripe codebase size: 50 million lines of Ruby
- Time for manual migration: 2+ months (full team)
- Time with Fable 5: 1 day
- SWE-Bench Pro score: 80.3% (next-best: 69.3%)
- FrontierCode Diamond: 29.3% vs 13.4% (Opus 4.8)
- Pricing: $10/M input, $50/M output
What the Stripe Test Reveals
The Stripe case study is significant because of what it is not. It is not a synthetic benchmark designed to favor Fable 5. It is a production codebase, with real dependencies, real idiosyncrasies, and real consequences if the output is wrong. Stripe's estimate of two-plus months presumably reflects the complexity of its actual system rather than a padded guess.
The practical implication for engineering organizations is not that software teams disappear. It is that the bottleneck in large-scale code changes shifts. A team moving a 50-million-line codebase no longer spends most of its time writing the migration. The time goes into validation: reviewing the model's output, testing against live environments, deciding what to ship and in what order. The ratio of implementation time to review time inverts.
Fable 5's performance on Cognition's FrontierCode Diamond benchmark provides useful context. That test measures the quality and maintainability of agentic code, not just whether a task completes. Fable 5 scores 29.3%, compared to 13.4% for Claude Opus 4.8 and 5.7% for GPT-5.5. The gap suggests the model is not just producing more code, but producing code that holds together under scrutiny. That distinction matters when the output needs to pass a production code review, not just a functional test.
"Fable 5 is state-of-the-art on nearly all tested benchmarks and delivers exceptional performance in software engineering, knowledge work, and vision, built for ambitious, long-running work." (Anthropic, Claude Fable 5 launch announcement, June 9, 2026)
Pricing and Access
Anthropic is pricing Fable 5 at $10 per million input tokens and $50 per million output tokens. That is roughly half the price that Glasswing partners paid for Mythos Preview access, making it substantially cheaper to test against production workloads. Through June 22, Fable 5 is included at no extra cost on Pro, Max, Team, and seat-based Enterprise plans. On June 23, it moves behind a usage-credits gate on those subscription tiers until Anthropic says capacity allows it to return as a standard feature.
On the API and consumption-based enterprise plans, Fable 5 is fully available from launch without a free-access window. Organizations that want to build agentic pipelines around the model can start immediately. The pricing structure positions it as a serious tool for enterprise engineering workflows rather than a general-purpose chat upgrade.
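To make the per-token rates concrete, here is a minimal sketch of a cost estimate for an API workload. Only the two prices ($10 and $50 per million tokens) come from the figures above; the function name and the token counts in the example are hypothetical, and the calculation ignores any discounts, caching, or fallback routing, none of which are detailed here.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD for a given number of tokens.

    Hypothetical helper for illustration; it applies the flat list
    prices only and does not model discounts or other billing rules.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MTOK
             + output_tokens * OUTPUT_USD_PER_MTOK)
    return total / 1_000_000


# Assumed workload (not from the article): 200M tokens read, 40M written.
example = estimate_cost_usd(200_000_000, 40_000_000)
print(f"${example:,.2f}")  # prints $4,000.00
```

Because output tokens cost five times as much as input tokens, a workload that mostly reads code and emits comparatively small diffs will be priced very differently from one that regenerates whole files.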
The model is also available on Amazon Bedrock from day one, in the US East (N. Virginia) and Europe (Stockholm) regions. AWS customers must opt into Anthropic's data-sharing requirement before invoking Fable 5 or Mythos 5, a policy that has drawn its own attention from compliance teams.
A New Baseline for Agentic Coding
The Stripe result lands in the middle of a broader shift. Claude Code, Anthropic's coding agent, already handled a meaningful fraction of Anthropic's own production code before this launch. Fable 5 raises the capability floor for what that work can include. Large-scale migrations, multi-file refactors, dependency upgrades across monorepos: these are precisely the categories that enterprise engineering teams have avoided automating because the failure modes were too unpredictable.
The practical question for most teams is no longer whether AI can assist with large-scale code changes. It is how to build the review and validation infrastructure around an AI that can produce months of work in a day. The tools for that infrastructure are still catching up. But the capability gap that made the question feel distant has, in the space of a single product launch, closed considerably.