Claude Opus 4.8, released on May 28, 2026, has moved to the top of the Artificial Analysis Intelligence Index, the most widely cited third-party aggregator of frontier AI performance, with a score of 61.4. OpenAI's GPT-5.5 sits at 60.2. Google's Gemini 3.1 Pro trails at 57. xAI's Grok 4.3 clocks in at 53. The gap between Claude and its nearest competitor is narrow on the index, but it is consistent across task categories in a way that previous Claude generations were not.

A Lead That Extends Across Task Types

The most striking number is on coding. SWE-bench Verified, the industry's standard test for autonomous software engineering, shows Claude Opus 4.8 at 88.6%. GPT-5.5 scores 58.6%. Gemini 3.1 Pro sits at 54.2%. That is not a rounding difference. It reflects a structural advantage in the ability to read existing codebases, diagnose faults, and write corrective patches without human intervention. Tom's Guide, which runs its own head-to-head evaluations for general-purpose tasks, concluded after the Opus 4.8 release that "Google Gemini's dominance is over," a verdict that would have seemed excessive twelve months ago when Gemini Ultra held the top position across most benchmarks.

The coding margin matters commercially because Claude Code's dynamic workflows allow it to orchestrate up to 1,000 parallel subagents on large engineering tasks. A model that scores 88% at the single-agent level gains compound advantages when scaled across those parallel sessions. Enterprise buyers running code migration projects across hundreds of thousands of lines of code are making model selection decisions partly on this basis, and Claude's margin over competitors has grown at exactly the moment that agentic coding has become a standard enterprise procurement category rather than an experiment.
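One way to read the "compound advantages" claim is as simple probability arithmetic. The sketch below is illustrative only: it assumes that subtasks succeed independently and that the SWE-bench Verified rate carries over unchanged to each subtask, neither of which the reported figures establish. The function names and the five-step chain are hypothetical.

```python
# Toy arithmetic, not a measurement: treats each benchmark pass rate as an
# independent per-subtask success probability (an assumption).

def expected_successes(p: float, n_tasks: int) -> float:
    """Expected number of subtasks solved when n_tasks run in parallel."""
    return p * n_tasks

def chain_success(p: float, n_steps: int) -> float:
    """Probability that n_steps dependent steps all succeed in a row."""
    return p ** n_steps

# Single-agent rates taken from the SWE-bench Verified figures in the article.
rates = {"Claude Opus 4.8": 0.886, "GPT-5.5": 0.586}

for model, p in rates.items():
    solved = round(expected_successes(p, 1000))  # 1,000 parallel subtasks
    chained = chain_success(p, 5)                # hypothetical 5-step chain
    print(f"{model}: ~{solved} of 1000 subtasks, 5-step chain {chained:.3f}")
```

Under these assumptions the gap widens with chain length, because the ratio of the two success probabilities is raised to the number of steps.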

Key Benchmark Comparisons

  • Artificial Analysis Intelligence Index, Claude 4.8: 61.4
  • Artificial Analysis Intelligence Index, GPT-5.5: 60.2
  • Artificial Analysis Intelligence Index, Gemini 3.1 Pro: 57.0
  • SWE-bench Verified (coding), Claude 4.8: 88.6%
  • SWE-bench Verified (coding), GPT-5.5: 58.6%
  • SWE-bench Verified (coding), Gemini 3.1 Pro: 54.2%
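The margins implied by these figures can be checked with a few lines of Python. This is a minimal sketch that uses only the numbers listed above; the dictionary layout and the helper name lead_over_runner_up are illustrative choices, not part of any benchmark tooling.

```python
# Scores copied from the comparison list; structure and names are illustrative.
scores = {
    "Artificial Analysis Intelligence Index": {
        "Claude Opus 4.8": 61.4,
        "GPT-5.5": 60.2,
        "Gemini 3.1 Pro": 57.0,
    },
    "SWE-bench Verified (%)": {
        "Claude Opus 4.8": 88.6,
        "GPT-5.5": 58.6,
        "Gemini 3.1 Pro": 54.2,
    },
}

def lead_over_runner_up(results: dict) -> tuple:
    """Return (leader, runner_up, margin in points) for one benchmark."""
    ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
    (leader, top), (runner_up, second) = ranked[0], ranked[1]
    return leader, runner_up, round(top - second, 1)

for benchmark, results in scores.items():
    leader, runner_up, margin = lead_over_runner_up(results)
    print(f"{benchmark}: {leader} leads {runner_up} by {margin} points")
```

The index margin works out to 1.2 points and the SWE-bench Verified margin to 30.0 percentage points, which is the contrast between a narrow top-line lead and a wide coding lead.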

What Changed Under the Hood

Opus 4.8 introduced adaptive thinking, a mechanism that triggers extended reasoning only when the difficulty of a turn warrants it. This differs from earlier approaches that allocated a fixed token budget for thinking on every request. The practical effect is that inference costs fall on routine tasks while complex multi-step problems still receive full reasoning depth. Anthropic says the model is approximately four times less likely than Opus 4.7 to let flaws in its own code pass without flagging them, a property measured on internal evaluation suites where the model is asked to review code it has just written.
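The cost effect of that design can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how Anthropic's mechanism works internally: the scalar difficulty score, the threshold, and the token budget are all hypothetical values chosen for illustration.

```python
# Toy comparison of fixed versus adaptive thinking budgets.
# All constants are hypothetical; the real decision logic is not public here.
FIXED_BUDGET = 8000   # thinking tokens spent per request under a fixed policy
THRESHOLD = 0.6       # difficulty at or above which extended reasoning runs

def fixed_thinking_tokens(difficulty: float) -> int:
    """Fixed policy: every request spends the full budget."""
    return FIXED_BUDGET

def adaptive_thinking_tokens(difficulty: float) -> int:
    """Adaptive policy: spend the budget only on sufficiently hard turns."""
    return FIXED_BUDGET if difficulty >= THRESHOLD else 0

# A hypothetical session: three routine turns and two hard ones.
difficulties = [0.1, 0.2, 0.9, 0.3, 0.7]

fixed_total = sum(fixed_thinking_tokens(d) for d in difficulties)
adaptive_total = sum(adaptive_thinking_tokens(d) for d in difficulties)

print(f"fixed: {fixed_total} tokens, adaptive: {adaptive_total} tokens")
```

In this made-up session the hard turns receive the same reasoning budget under both policies, while the routine turns cost nothing extra under the adaptive one.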

The context window is 1 million tokens by default, with up to 128,000 tokens of output per turn. Mid-conversation system messages, which allow operators to update instructions partway through a long session without breaking prompt cache hits, are now available without a beta header. These are developer-facing features, not benchmark numbers, but they compound the practical advantage for enterprise deployments that run multi-hour agentic sessions. The full Opus 4.8 release notes detail the performance improvements and migration guidance for teams moving from earlier versions.

"Google Gemini's dominance is over. Anthropic's new Claude is now the best AI for real work." (Tom's Guide, May 2026 model evaluation)

Enterprise Market Share and What the Benchmarks Do Not Show

Benchmark scores are a useful proxy but an incomplete one. The AI market in mid-2026 is divided along two axes that do not always point in the same direction: consumer reach and enterprise depth. On the consumer side, ChatGPT holds roughly 64.5% of the global AI chatbot user base. Claude sits at around 4.5% of consumer users. That gap reflects ChatGPT's earlier launch, its broader platform distribution, and its integration with Microsoft's consumer products.

Enterprise tells a different story. According to survey data from Menlo Ventures, Claude holds approximately 29% of the enterprise large language model market and more than half of the AI coding segment specifically. Eight of the Fortune 10 have active Claude deployments. That concentration in high-value enterprise accounts, rather than consumer volume, is the metric Anthropic optimizes for, and it is the metric that flows most directly into the $47 billion annualized revenue figure the company has disclosed ahead of its IPO filing.

The benchmark lead over GPT-5.5 and Gemini is narrow at the top-line index level. In the task categories most relevant to the customers Anthropic actually serves (coding, agentic multi-step workflows, and long-context document analysis), Claude's advantage in enterprise adoption is measurably larger. Whether that translates into durable pricing power once OpenAI and Google respond with their own next model releases is the open question for the second half of 2026.
