Anthropic's Claude model family spans the full spectrum of AI capability and efficiency: the lightning-fast Haiku for high-volume tasks, the balanced Sonnet for everyday intelligence, and the frontier-setting Opus for the most demanding reasoning and analysis. Here's everything you need to know to choose the right model.
## Current Models

### Benchmark Comparison
| Model | GPQA Diamond | HumanEval | MATH | SWE-bench |
|---|---|---|---|---|
| Claude Opus 4 | 94.2% | 91.3% | 88.4% | 48.9% |
| Claude Sonnet 4.5 | 81.7% | 85.6% | 79.2% | 38.4% |
| Claude Haiku 4.5 | 63.4% | 71.8% | 64.1% | 18.2% |
| GPT-4o (competitor) | 83.1% | 87.2% | 76.6% | 33.2% |
*Benchmarks as of May 2026. GPQA Diamond = graduate-level scientific reasoning. HumanEval = Python code generation. MATH = competition math. SWE-bench = real software engineering tasks.*
## Extended Thinking

### How Extended Thinking Works
Extended Thinking allows Claude Sonnet 4.5 and Opus 4 to reason through complex problems step-by-step before delivering a final answer. Rather than responding immediately, the model allocates a configurable "thinking budget" (a maximum number of tokens it can use for internal reasoning) before producing the output you see.
This mode dramatically improves performance on multi-step reasoning tasks, hard math problems, code challenges requiring planning, and ambiguous instructions where working through edge cases first produces better results. In our testing, enabling Extended Thinking on MATH benchmark tasks improved Sonnet 4.5's score by nearly 8 percentage points.
You control the thinking budget via the API. A budget of 10,000 tokens is sufficient for most tasks; complex research or architectural decisions may benefit from 32,000 or more. Thinking tokens are billed as output tokens.
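As a minimal sketch, the request below shows how a thinking budget might be set with the Anthropic Python SDK's `thinking` parameter. The model ID string and the prompt are assumptions for illustration; check the current API reference for the exact identifiers available to your account.

```python
# Sketch: enabling Extended Thinking on a Messages API request.
# The model ID below is an assumption for illustration.
request = {
    "model": "claude-sonnet-4-5",      # hypothetical model ID
    "max_tokens": 16000,               # must exceed the thinking budget
    "thinking": {
        "type": "enabled",
        "budget_tokens": 10000,        # the 10k budget suggested above
    },
    "messages": [
        {"role": "user", "content": "Plan a migration from REST to gRPC."}
    ],
}

# With the SDK installed and an API key configured, you would send it as:
# import anthropic
# client = anthropic.Anthropic()
# response = client.messages.create(**request)
```

Note that the thinking budget counts against `max_tokens`, so `max_tokens` must be set higher than `budget_tokens` to leave room for the visible answer.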
## Start building with Claude today
Access all models, prompt caching, and the full Anthropic API from the developer console.