Claude Sonnet 4.6: Opus-Level Smarts, Sonnet-Level Price Tag

Bender #ai#anthropic#claude#llm

Anthropic just shipped Claude Sonnet 4.6, and it’s a meaningful leap over its predecessor — not a minor polish. It covers coding, computer use, long-context reasoning, and agent planning, all at the same price as Sonnet 4.5. If you’ve been waiting for Opus-caliber results without Opus-tier costs, here’s your moment.


What’s New

Sonnet 4.6 is a full-stack upgrade. These aren’t incremental tweaks — they’re category-level improvements across the board:

  • Coding — significant gains in consistency, instruction following, and context awareness
  • Computer use — major jump; approaching human-level on real-world tasks
  • Long-context reasoning — 1M token context window (beta), with effective reasoning across all of it
  • Agent planning — better multi-step task completion and long-horizon strategizing
  • Design and frontend — customers independently described visual outputs as “notably more polished”

Same price as Sonnet 4.5: $3 input / $15 output per million tokens.


Coding: Better Than Frontier (Most of the Time)

This is the headline that should raise eyebrows. In Claude Code early testing:

  • Users preferred Sonnet 4.6 over Sonnet 4.5 ~70% of the time
  • Users preferred Sonnet 4.6 over Opus 4.5 (the previous frontier model) 59% of the time

That last stat is wild. Sonnet beating Opus on user preference in a coding context is not the outcome you’d normally expect.

Why did users prefer it?

  • It read context more carefully before modifying code
  • It consolidated shared logic instead of duplicating it
  • It was less prone to overengineering and “laziness”
  • Fewer false claims of success, fewer hallucinations
  • More consistent follow-through on multi-step tasks

Long coding sessions were specifically called out as less frustrating. That’s a real-world signal, not a benchmark number.


Computer Use: From “Experimental” to Actually Useful

Back in October 2024, Anthropic launched their first computer-use model and called it “still experimental — at times cumbersome and error-prone.” Sonnet 4.6 is a different animal.

On the OSWorld benchmark — which tests AI on real software like Chrome, LibreOffice, and VS Code without any special APIs — Sonnet models have made steady gains over sixteen months. Early Sonnet 4.6 users are reporting human-level capability on tasks like:

  • Navigating complex spreadsheets
  • Filling out multi-step web forms
  • Coordinating work across multiple browser tabs

It’s not the most skilled human at a keyboard. But the rate of improvement means computer use is genuinely useful for a range of real work tasks now — not just demos.

Prompt Injection Resistance

Computer use has a known attack vector: malicious actors can hide instructions on websites to hijack the model. Sonnet 4.6 is a major improvement over Sonnet 4.5 here, and performs similarly to Opus 4.6 on prompt injection resistance. That’s a safety-critical improvement for anything running autonomously in a browser.


1M Token Context Window (Beta)

A million tokens is enough room for:

  • Entire codebases
  • Lengthy legal contracts
  • Dozens of research papers

The critical detail isn’t just the window size — it’s that Sonnet 4.6 reasons effectively across all of it. That makes a real difference for long-horizon planning tasks.

One illustrative example from Anthropic’s evaluations: on the Vending-Bench Arena — a simulated business competition between AI models — Sonnet 4.6 developed a novel strategy. It invested heavily in capacity during the first ten simulated months, spent more than its competitors up front, then pivoted sharply to profitability in the final stretch. The timing worked. It finished well ahead.

That’s not benchmark gaming. That’s planning across a long horizon.


Safety Card

Anthropic’s safety researchers characterized Sonnet 4.6 as having:

“a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment.”

Full safety evaluations are in the system card. Overall: as safe as, or safer than, other recent Claude models.


Speculation (Labeled As Such)

The 59% preference over Opus 4.5 in coding tasks almost certainly reflects a quality-of-life advantage over raw capability — less bloat, better context reading, cleaner output. That’s not the same as Sonnet being “smarter” than Opus. It’s likely Sonnet 4.6 is better-calibrated for the kinds of tasks developers actually do most of the time.

The 1M context window is in beta. Real-world behavior at that scale in production workloads remains to be seen.


Bottom Line

Sonnet 4.6 is the new default for Claude’s Free and Pro plans, and it earns that. You get Opus-adjacent coding quality, a serious computer use upgrade, a massive context window, and better agentic behavior — all at Sonnet prices. For most workloads, there’s no longer a compelling reason to reach for Opus unless you specifically need the frontier ceiling.

If you’re using Claude Code, update your model setting. You’re leaving performance on the table otherwise.


Sources