Claude Opus 4.7 Sets a New Software Engineering Record and Brings Sharper Vision

Anthropic released Claude Opus 4.7 on April 16, making the model generally available across the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The release brings three substantive advances over Opus 4.6: a new "xhigh effort" deep reasoning mode, a vision resolution upgrade that more than triples the pixel budget per input image, and benchmark scores that improve on the prior generation across software engineering, scientific reasoning, and agentic tool use, all at the same token price.

Where Opus 4.7 Raises the Bar

The software engineering improvement is the headline figure. On SWE-bench Pro, a benchmark targeting complex professional coding scenarios drawn from real repositories, Opus 4.7 scores 64.3%. That is 6.6 percentage points above GPT-5.4's 57.7% on the same test and 10.9 points above Opus 4.6's 53.4%, the largest single-generation gain Anthropic has reported on this benchmark. On SWE-bench Verified, the more widely cited coding evaluation, Opus 4.7 scores 87.6%. Scores at this level translate to agents that can handle multi-file refactors, extended debugging sessions, and professional-grade code review with far fewer handoffs to a human engineer.

Scientific reasoning, measured on GPQA Diamond, comes in at 94.2%. On MCP-Atlas, which evaluates how accurately a model invokes tools in agentic workflows, Opus 4.7 leads GPT-5.4 by 9.2 percentage points. For production deployments built on multi-step agents, that margin is significant: a model that calls tools incorrectly or unnecessarily adds latency and errors to every downstream step. A nine-point lead on tool invocation compounds quickly when agents run hundreds of steps unattended.
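The compounding argument can be made concrete with a little arithmetic. The sketch below assumes that each tool call succeeds independently with a fixed probability, which is a simplification, and the per-step accuracies it uses are hypothetical values chosen for illustration. They are not MCP-Atlas scores, which measure something different from per-call accuracy.

```python
def run_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` tool calls are correct.

    Assumes every call succeeds independently with the same probability,
    so the chance of a fully clean run is per_step_accuracy ** steps.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Hypothetical per-step accuracies for illustration only (not benchmark data).
baseline = 0.98
improved = 0.99

for steps in (10, 100, 300):
    p_base = run_success_probability(baseline, steps)
    p_new = run_success_probability(improved, steps)
    print(f"{steps:>4} steps: baseline {p_base:.3f}, improved {p_new:.3f}")
```

Under these assumptions, a one-point difference per step that is barely visible over ten calls becomes a large gap in clean-run probability over a few hundred.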

Claude Opus 4.7: Key Figures

SWE-bench Verified score: 87.6%
SWE-bench Pro score: 64.3% (vs. GPT-5.4's 57.7%; vs. Opus 4.6's 53.4%)
GPQA Diamond score: 94.2%
MCP-Atlas lead over GPT-5.4: +9.2 points
Vision input resolution: 2,576 px / 3.75 MP (up from 1,568 px / 1.15 MP)
Pricing: $5/M input, $25/M output (unchanged from 4.6)

New Reasoning Mode and the Vision Upgrade

Opus 4.7 introduces "xhigh effort," a deep reasoning mode positioned above the existing "high effort" tier. The mode is intended for problems that benefit from extended internal deliberation before the model produces an answer: complex multi-step proofs, long-horizon planning scenarios, and tasks where reasoning errors compound across many steps. Anthropic has not published latency figures for xhigh effort, but the expected trade-off is slower responses in exchange for higher accuracy on the hardest problem types. That trade-off suits overnight agentic runs, batch scientific workflows, or any use case where correctness matters more than speed.

The vision upgrade is more directly quantifiable. Input image resolution rises from 1,568 pixels to 2,576 pixels, or from roughly 1.15 megapixels to 3.75 megapixels, an increase of about 3.3x in pixel count. For reading dense financial charts, interpreting detailed engineering diagrams, or analyzing high-resolution medical images, the difference is practical: where the old resolution ceiling clipped fine detail, the new one preserves it. It extends the multimodal vision capabilities Claude has been building out since early 2026 into tasks that demand higher-fidelity visual input.
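The pixel-budget arithmetic, and what it means for preparing images, can be sketched in a few lines. The megapixel and pixel figures come from the numbers above. The assumption that the pixel figure is a cap on an image's longer edge is ours, and the helper `fit_long_edge` is a hypothetical illustration, not part of any Anthropic SDK.

```python
OLD_MEGAPIXELS = 1.15   # prior budget, from the article
NEW_MEGAPIXELS = 3.75   # Opus 4.7 budget, from the article
OLD_EDGE_PX = 1568
NEW_EDGE_PX = 2576

# Ratio of the two pixel budgets (a bit over 3x).
budget_ratio = NEW_MEGAPIXELS / OLD_MEGAPIXELS


def fit_long_edge(width: int, height: int, max_long_edge: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side fits max_long_edge.

    Hypothetical helper: assumes the published pixel figure is a limit on
    the longer edge. Aspect ratio is preserved; smaller images are unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    long_edge = max(width, height)
    if long_edge <= max_long_edge:
        return width, height
    scale = max_long_edge / long_edge
    return max(1, round(width * scale)), max(1, round(height * scale))


print(f"Pixel budget ratio: {budget_ratio:.2f}x")
print("4000x3000 under old cap:", fit_long_edge(4000, 3000, OLD_EDGE_PX))
print("4000x3000 under new cap:", fit_long_edge(4000, 3000, NEW_EDGE_PX))
```

If the long-edge reading is right, a scan that previously had to be shrunk to well under half its width now keeps close to two-thirds of it, which is where fine print and thin chart lines survive.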

"Opus 4.7 shows improvements in software engineering and complex, long-running coding tasks, as well as better vision, allowing it to see images in higher resolution." Anthropic, Claude Opus 4.7 announcement, April 2026

Availability and Pricing

Opus 4.7 is generally available as of April 16 on Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, as well as directly through the Anthropic API. Pricing holds at $5 per million input tokens and $25 per million output tokens, unchanged from Opus 4.6. That continuity follows the pattern Anthropic has maintained through its 2026 release cycle: substantive capability gains without corresponding price increases, a combination that improves the cost-per-task economics for any team running complex workloads.
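To see why flat pricing plus higher accuracy improves cost per task, consider a rough sketch. The per-token prices are the ones quoted above. The token counts and success rates are hypothetical, and the retry model (independent attempts repeated until one succeeds) is a simplifying assumption.

```python
from decimal import Decimal

# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MTOK = Decimal("5")
OUTPUT_PRICE_PER_MTOK = Decimal("25")
TOKENS_PER_MTOK = Decimal(1_000_000)


def attempt_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of a single attempt at a task, in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_PRICE_PER_MTOK
             + Decimal(output_tokens) * OUTPUT_PRICE_PER_MTOK)
    return total / TOKENS_PER_MTOK


def cost_per_success_usd(attempt_cost: Decimal, success_rate: Decimal) -> Decimal:
    """Expected spend per completed task when failed attempts are retried.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate.
    """
    if not Decimal(0) < success_rate <= Decimal(1):
        raise ValueError("success_rate must be in (0, 1]")
    return attempt_cost / success_rate


# Hypothetical workload: 200k input tokens and 20k output tokens per attempt.
per_attempt = attempt_cost_usd(200_000, 20_000)
for rate in (Decimal("0.5"), Decimal("0.6")):  # hypothetical success rates
    print(f"success rate {rate}: ${cost_per_success_usd(per_attempt, rate):.2f} per completed task")
```

With the price per attempt fixed, any gain in task success rate lowers the expected spend per completed task directly.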

The April release also continues a pace Anthropic has kept for most of the year: a major model update roughly every four to six weeks. Opus 4.6, Sonnet 4.6, and Haiku 4.5 all shipped in the first quarter, alongside tooling releases for Claude Code 2.0 and Claude Cowork. For organizations tracking model capabilities against deployment roadmaps, the cadence means planning assumptions go stale quickly. Enterprises that standardized on Opus 4.6 for complex agentic tasks should run benchmark comparisons on 4.7 before committing to multi-quarter roadmaps.

Where Opus 4.7 Fits in the Stack

Opus 4.7 sits at the top of the Claude model stack and is designed for tasks where capability justifies higher token cost: complex agentic pipelines, advanced coding agents, and scientific reasoning at the GPQA level. For production applications where the balance of capability and cost matters more, Claude Sonnet 4.6 remains the recommended option. The line between the two is practical: if a task fits Sonnet, use Sonnet; if it requires the full depth of the frontier model, Opus 4.7 is now where that depth lives.

Developers building on Claude Code will find the software-engineering improvements directly relevant: the underlying model powering agentic coding tasks is now meaningfully more capable at the real-world problem set SWE-bench measures. The Claude 4 Opus benchmark results from earlier in the year established a high baseline. Opus 4.7 raises that baseline on the metrics that matter most for complex deployments, without changing the pricing math for teams already running at scale.

