Anthropic Releases Claude Opus 4 and Claude Sonnet 4
Summary
Anthropic released Claude Opus 4 and Claude Sonnet 4, with Opus 4 claiming the top position on major coding and reasoning benchmarks. The models emphasized agentic capabilities — the ability to work autonomously on extended tasks — positioning Anthropic at the forefront of the shift from AI assistants to AI agents.
What Happened
On May 22, 2025, Anthropic released the Claude 4 model family, led by Claude Opus 4 — its most capable model. Opus 4 achieved state-of-the-art results on SWE-bench, demonstrating exceptional ability to autonomously resolve real software engineering issues. Both models featured extended thinking capabilities, building on the hybrid reasoning approach introduced in Claude 3.7 Sonnet.
The models were designed with particular emphasis on agentic workflows — sustained, autonomous operation across multi-step tasks. Anthropic highlighted capabilities for long-running coding tasks, complex research, and multi-tool orchestration. The Claude Code tool, which had launched alongside Claude 3.7 Sonnet, was updated to take full advantage of the new models.
Claude Sonnet 4 replaced Claude 3.7 Sonnet as the default model for most API usage, offering improved performance at competitive pricing.
Why It Matters
Claude Opus 4 represented Anthropic's strongest claim yet to frontier model leadership, particularly in the agentic capabilities that were increasingly seen as the next major frontier in AI applications. The shift from "AI as conversational partner" to "AI as autonomous agent" had significant implications for how AI would be deployed in practice — from asking questions to delegating tasks.
The model's emphasis on sustained autonomous work also raised the stakes for AI safety in a new way. An AI that operates autonomously for extended periods, making decisions and taking actions without constant human oversight, requires a different safety framework than one that responds to individual queries. Anthropic's reputation as a safety-focused lab gave its agentic releases particular credibility, but also subjected them to greater scrutiny about whether safety had kept pace with capability.