
Claude Opus 4.7 vs 4.6: What Actually Changed?

12 min read

Anthropic released Claude Opus 4.7 on April 16, 2026, just 70 days after Opus 4.6 shipped on February 5. Both models carry the same $5/$25 per million token pricing. Both are positioned as the company's most capable generally available model for complex reasoning and agentic coding. So what actually changed, and does it matter for your production workloads?

This is a technical breakdown of the differences between Opus 4.6 and Opus 4.7, with benchmark data, use case recommendations, and migration guidance for teams running either model in production.

👉 Try both models on Qubrid AI: https://platform.qubrid.com/models

Release Context: Why Two Opus Models in 70 Days?

Opus 4.6 launched on February 5, 2026, as a major upgrade to Opus 4.5. It introduced the 1M token context window for Opus-class models, adaptive thinking, effort controls, and state-of-the-art performance on agentic coding benchmarks like Terminal-Bench 2.0. The model was positioned for sustained, long-running work, particularly in coding, finance, and legal workflows.

Opus 4.7 arrived 70 days later, on April 16, 2026. Anthropic positioned it as a "notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks." But the release also came with a specific context: Opus 4.7 is the first model in Anthropic's Project Glasswing framework, designed to test new cybersecurity safeguards before deploying Mythos-class capabilities more broadly.

The timing matters. Between the two releases, users reported that Opus 4.6 had "quietly gotten worse," with complaints centered on regressions in complex engineering tasks. Anthropic denied deliberately scaling back the model to redirect compute to other projects, but the perception created pressure for a clear upgrade.

High-Level Positioning

Opus 4.6 was marketed as:

  • State-of-the-art on agentic coding (Terminal-Bench 2.0)

  • Best-in-class on economically valuable knowledge work (GDPval-AA)

  • First Opus model with 1M token context window

  • Stronger long-context retrieval and reasoning than Opus 4.5

👉 Try Claude Opus 4.6 on the Qubrid AI platform: https://platform.qubrid.com/playground?model=anthropic-claude-opus-4-6

Opus 4.7 is marketed as:

  • The most capable Opus for the hardest coding work

  • Substantially better multimodal vision (3.75MP vs 1.15MP)

  • More precise instruction following

  • Better self-verification before reporting results

👉 Try Claude Opus 4.7 on the Qubrid AI platform: https://platform.qubrid.com/playground?model=anthropic-claude-opus-4-7

Both models target autonomous, long-running agentic tasks. The difference is in execution consistency, visual fidelity, and instruction adherence on the hardest 20% of tasks.

Core Technical Differences

1. Multimodal Vision: 3× Resolution Increase

Opus 4.6:

  • Maximum image resolution: 1,568 pixels / 1.15 megapixels

  • Standard vision capabilities for document understanding, screenshots, diagrams

Opus 4.7:

  • Maximum image resolution: 2,576 pixels / 3.75 megapixels

  • 3.26× increase in total pixel capacity

  • Model-level change (not an API parameter): all images are processed at higher fidelity automatically

What this unlocks:

XBOW, which builds autonomous penetration testing tools, reported a jump from 54.5% to 98.5% on its visual acuity benchmark when moving from Opus 4.6 to Opus 4.7. This is not a small improvement; it's the difference between a model that can't reliably read dense UI screenshots and one that can.

For computer-use agents, life sciences workflows (reading chemical structures, patent diagrams), and any task requiring pixel-level precision, this is the single biggest differentiator between the two models.

2. Tokenizer Update: 1.0–1.35× Token Increase

Opus 4.7 uses a new tokenizer that produces 1.0–1.35× the token count of Opus 4.6 for the same input (up to ~35% more, varying by content type). This is a tradeoff for improved text processing quality.

Migration impact:

If you're running Opus 4.6 in production and migrating to Opus 4.7, you'll need to:

  • Update max_tokens parameters to give additional headroom

  • Adjust token budgets and compaction triggers

  • Monitor token usage on real traffic to measure the actual increase

Anthropic's own testing showed that token efficiency improves when measured as performance per token across all effort levels, meaning the additional tokens buy real capability gains, not just overhead.
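As a rough way to apply the headroom guidance above, you can scale existing max_tokens values by the worst-case 1.35× factor before measuring real traffic. The helper name and the idea of a fixed multiplier are illustrative, not part of the API:

```python
import math

# Worst-case token inflation reported for the Opus 4.7 tokenizer.
TOKENIZER_FACTOR = 1.35

def adjusted_max_tokens(old_max_tokens: int, factor: float = TOKENIZER_FACTOR) -> int:
    """Scale a max_tokens budget tuned for Opus 4.6 to give Opus 4.7 headroom."""
    return math.ceil(old_max_tokens * factor)

print(adjusted_max_tokens(4096))  # 5530
```

Treat this as a starting point only; once you have real traffic on Opus 4.7, replace the fixed factor with the increase you actually measure.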

3. Instruction Following: Literal vs Loose Interpretation

Opus 4.6:

  • Interpreted instructions loosely in some cases

  • Would skip parts of complex multi-step requests

  • Required more scaffolding and validation prompts

Opus 4.7:

  • Takes instructions literally and completely

  • Executes all parts of multi-step requests precisely

  • May produce unexpected results on prompts written for Opus 4.6

Critical migration note: Prompts optimized for Opus 4.6 may need re-tuning for Opus 4.7. Where earlier models glossed over instructions, Opus 4.7 will execute them exactly. Teams should review and adjust prompts during migration.

4. New Effort Level: xhigh

Opus 4.6 effort levels:

  • low → medium → high (default) → max

Opus 4.7 effort levels:

  • low → medium → high → xhigh → max

The new xhigh level sits between high and max, giving finer control over the reasoning depth vs latency tradeoff. In Claude Code, the default effort level was raised to xhigh for all plans.

For coding and agentic use cases, Anthropic recommends starting with high or xhigh on Opus 4.7.
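Assuming effort is passed through output_config as described in the adaptive-thinking section, selecting xhigh for a coding request might look like the sketch below. The payload shape and the build_request helper are illustrative, based only on the parameters named in this post:

```python
def build_request(prompt: str, effort: str = "xhigh") -> dict:
    """Build a request body sketch; coding/agentic workloads start at high or xhigh."""
    allowed = ("low", "medium", "high", "xhigh", "max")  # Opus 4.7 effort levels
    if effort not in allowed:
        raise ValueError(f"effort must be one of {allowed}")
    return {
        "model": "claude-opus-4-7",
        "max_tokens": 4096,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

print(build_request("Refactor this module.")["output_config"])  # {'effort': 'xhigh'}
```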

5. Adaptive Thinking: Behavioral Change

Opus 4.6:

  • Adaptive thinking with configurable budget tokens: thinking = {"type": "enabled", "budget_tokens": 32000}

Opus 4.7:

  • Adaptive thinking is the only supported thinking mode

  • Syntax changed: thinking = {"type": "adaptive"} with output_config = {"effort": "high"}

  • Thinking is off by default — must be explicitly enabled

  • Thinking content omitted from response by default unless the caller opts in

This is a breaking change for teams using extended thinking on Opus 4.6.
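A small helper can translate the old thinking configuration into the new shape during migration. The function and its default effort are a sketch; only the two parameter shapes come from the release notes above:

```python
def migrate_thinking_config(params: dict, default_effort: str = "high") -> dict:
    """Rewrite an Opus 4.6 thinking config into the Opus 4.7 adaptive form."""
    migrated = dict(params)
    old = migrated.pop("thinking", None)
    if old and old.get("type") == "enabled":
        # budget_tokens is gone; adaptive thinking plus an effort level replaces it.
        migrated["thinking"] = {"type": "adaptive"}
        migrated.setdefault("output_config", {"effort": default_effort})
    return migrated

old_params = {"model": "claude-opus-4-7",
              "thinking": {"type": "enabled", "budget_tokens": 32000}}
print(migrate_thinking_config(old_params)["thinking"])  # {'type': 'adaptive'}
```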

6. Temperature, top_p, top_k: Now Unsupported

Starting with Opus 4.7, setting temperature, top_p, or top_k to any non-default value returns a 400 error. The recommended migration path is to omit these parameters entirely and use prompting to guide behavior instead.

If you were using temperature = 0 for determinism, note that it never guarantees identical outputs. Opus 4.7 removes the parameter entirely.
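Since any non-default temperature, top_p, or top_k now returns a 400, the safest migration is to strip those keys before sending. A minimal sketch (the helper is illustrative):

```python
UNSUPPORTED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")

def strip_sampling_params(payload: dict) -> dict:
    """Drop sampling parameters that Opus 4.7 rejects with a 400 error."""
    return {k: v for k, v in payload.items() if k not in UNSUPPORTED_SAMPLING_PARAMS}

payload = {"model": "claude-opus-4-7", "temperature": 0, "top_p": 0.9, "max_tokens": 1024}
print(strip_sampling_params(payload))  # {'model': 'claude-opus-4-7', 'max_tokens': 1024}
```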

7. Cybersecurity Safeguards

Opus 4.6:

  • No specific cybersecurity safeguards

  • Standard safety profile

Opus 4.7:

  • First model with real-time cybersecurity safeguards

  • Requests involving prohibited or high-risk cybersecurity topics trigger automatic refusals

  • Cyber capabilities were intentionally reduced compared to Mythos Preview

  • Legitimate security work requires Cyber Verification Program approval

This is part of Anthropic's Project Glasswing framework. The safeguards are designed to block malicious use while allowing verified security professionals to use the model for defensive work.

Benchmark Comparison

Coding and Software Engineering

| Benchmark | Opus 4.6 | Opus 4.7 | Source |
|---|---|---|---|
| CursorBench | 58% | 70% | Cursor |
| Rakuten-SWE-Bench | Baseline | 3× more tasks resolved | Rakuten |
| Terminal-Bench 2.0 | State-of-the-art | Not directly compared | Anthropic |
| Linear 93-task benchmark | Baseline | +13% resolution lift | Linear |
| Factory Droids task success | Baseline | +10–15% lift | Factory AI |

Key insight: Opus 4.7 shows the strongest gains on the hardest coding tasks, the kind that previously required close supervision. It catches logical faults during planning, before execution, and handles tool failures that would stop Opus 4.6 entirely.

Multimodal and Vision

| Metric | Opus 4.6 | Opus 4.7 | Improvement |
|---|---|---|---|
| Max image resolution | 1,568px / 1.15MP | 2,576px / 3.75MP | 3.26× |
| XBOW visual acuity | 54.5% | 98.5% | +44pp / 1.81× |
| SWE-bench Multimodal | Baseline | Meaningful improvement | Per Anthropic |

Key insight: The resolution jump is not incremental; it's transformative for computer-use agents, life sciences workflows, and any task requiring pixel-level visual precision.

Document Reasoning and Knowledge Work

| Benchmark | Opus 4.6 | Opus 4.7 | Source |
|---|---|---|---|
| BigLaw Bench | 90.2% | 90.9% | Harvey |
| Databricks OfficeQA Pro errors | Baseline | 21% fewer errors | Databricks |
| GDPval-AA | State-of-the-art | State-of-the-art | Anthropic |
| Finance Agent | State-of-the-art | State-of-the-art | Anthropic |

Key insight: Both models are best-in-class for knowledge work. Opus 4.7 shows marginal improvements on document reasoning, but the gap is smaller than in coding or vision tasks.

Long-Context Performance

| Metric | Opus 4.6 | Opus 4.7 | Notes |
|---|---|---|---|
| Context window | 1M tokens (beta) | 1M tokens (standard) | No long-context premium on 4.7 |
| Max output tokens | 128k | 128k | Same |
| MRCR v2 (8-needle, 1M) | 76% | Not reported | 4.6 had strong long-context retrieval |

Key insight: Opus 4.6 introduced the 1M token window for Opus-class models and showed strong performance on long-context retrieval. Opus 4.7 maintains this capability and removes the long-context premium (previously $10/$37.50 per million tokens above 200k).
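To make the pricing change concrete, here is an input-cost sketch. It assumes the old premium applied only to input tokens beyond the 200k threshold; how the premium was actually metered is an assumption, while the $5 and $10 per-million rates come from this post:

```python
def input_cost_opus_46(input_tokens: int) -> float:
    """Input cost in dollars under 4.6's tiered long-context pricing (assumed metering)."""
    base = min(input_tokens, 200_000) * 5 / 1_000_000        # $5/M up to 200k
    premium = max(input_tokens - 200_000, 0) * 10 / 1_000_000  # $10/M above 200k
    return base + premium

def input_cost_opus_47(input_tokens: int) -> float:
    """Flat $5 per million input tokens on 4.7; no long-context premium."""
    return input_tokens * 5 / 1_000_000

print(input_cost_opus_46(1_000_000))  # 9.0
print(input_cost_opus_47(1_000_000))  # 5.0
```

Under these assumptions, a full 1M-token context costs almost half as much on Opus 4.7 as it did on Opus 4.6.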

What the Early Testers Said

On Opus 4.6

Notion: "The strongest model Anthropic has shipped. Takes complicated requests and actually follows through."

GitHub: "Delivering on the complex, multi-step coding work developers face every day."

Cursor: "The new frontier on long-running tasks from our internal benchmarks."

Harvey: "Achieved the highest BigLaw Bench score of any Claude model at 90.2%."

👉 Try Claude Opus 4.6 on the Qubrid AI platform: https://platform.qubrid.com/playground?model=anthropic-claude-opus-4-6

On Opus 4.7

Linear: "13% lift in resolution rate on our 93-task coding benchmark, including four tasks neither Opus 4.6 nor Sonnet 4.6 could solve."

Cursor: "70% resolution rate on CursorBench versus 58% for Opus 4.6."

XBOW: "Visual acuity went from 54.5% to 98.5%; our single biggest pain point with Opus 4.6 effectively disappeared."

Notion: "The first model to pass our implicit-need tests: it understands what you need, not just what you wrote."

👉 Try Claude Opus 4.7 on the Qubrid AI platform: https://platform.qubrid.com/playground?model=anthropic-claude-opus-4-7

Use Case Recommendations

When to Use Opus 4.6

  • Already in production and working well - if your workflows are stable on Opus 4.6 and you haven't hit its limits, there's no urgent reason to migrate

  • Simple coding tasks - for straightforward, single-step coding work, the performance difference is minimal

  • No vision-heavy workloads - if your use case doesn't involve images, the vision upgrade won't matter

  • Temperature/top_p dependencies - if your prompts rely on these parameters, they'll break on Opus 4.7

When to Use Opus 4.7

  • Hardest 20% of coding tasks - multi-file refactors, debugging sessions, autonomous feature implementations

  • Computer-use agents - the 3.75MP vision upgrade makes this viable for tasks where Opus 4.6 failed

  • Life sciences workflows - reading chemical structures, patent diagrams, technical figures

  • Instruction-heavy workflows - tasks where precise instruction following is critical

  • New projects - if you're starting fresh, Opus 4.7 is the default choice

Migration Checklist

If you're moving from Opus 4.6 to Opus 4.7:

1. Update API parameters:

  • Change thinking = {"type": "enabled", "budget_tokens": X} to thinking = {"type": "adaptive"}

  • Move effort to output_config = {"effort": "high"}

  • Remove temperature, top_p, top_k parameters entirely

  • Increase max_tokens by 1.35× as a starting point

2. Re-tune prompts:

  • Test prompts written for Opus 4.6 may produce different results

  • Opus 4.7 takes instructions literally; adjust for this behavioral change

  • Remove scaffolding added to force interim status messages (Opus 4.7 does this by default)

3. Monitor token usage:

  • Track actual token increase on real traffic (varies 1.0–1.35× by content type)

  • Adjust token budgets and compaction triggers accordingly

  • Use the effort parameter to control token spend if needed

4. Test visual workloads:

  • If you have vision-heavy tasks, test them early; the 3.75MP upgrade may unlock new capabilities

  • Consider downsampling images before sending if higher fidelity is unnecessary (to avoid token usage spikes)

5. Review cybersecurity use cases:

  • If your workflow involves security testing, penetration testing, or vulnerability research, apply to the Cyber Verification Program

  • Non-verified cybersecurity requests may trigger automatic refusals on Opus 4.7
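For step 4 of the checklist, one way to avoid token spikes on images that don't need the extra fidelity is to cap their pixel count near the old 1.15MP ceiling before upload. This dimension helper is illustrative; the cap value and rounding are choices, not API requirements:

```python
import math

def fit_to_megapixels(width: int, height: int, max_megapixels: float = 1.15) -> tuple:
    """Return (width, height) scaled down so width*height stays under the cap."""
    limit = max_megapixels * 1_000_000
    pixels = width * height
    if pixels <= limit:
        return width, height
    scale = math.sqrt(limit / pixels)  # uniform scale preserves aspect ratio
    return int(width * scale), int(height * scale)

print(fit_to_megapixels(4000, 3000))  # (1238, 928), just under 1.15MP
```

Apply the computed dimensions with whatever image library you already use before base64-encoding the upload.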

Tone and Style: What Changed

Opus 4.7 has a more direct, opinionated tone compared to Opus 4.6's warmer, validation-forward style. It uses fewer emojis, gives more regular progress updates during long agentic traces, and spawns fewer subagents by default (though this is steerable through prompting).

For teams that built UX around Opus 4.6's communication style, this is worth noting.

Getting Started on Qubrid AI

Both Claude Opus 4.6 and Opus 4.7 are available on Qubrid AI with no infrastructure setup required. One account, one API key, immediate access to both models for side-by-side comparison.

Step 1: Sign up at platform.qubrid.com

Step 2: Find both models in the Model Catalog and test them in the browser playground. For example, try the prompt: "Can you give me the top 10 restaurants in Mumbai for Italian food?"

Step 3: Generate an API key and integrate. Full documentation at docs.platform.qubrid.com

Here's a minimal Python example to compare both models on the same task:

import requests

def call_opus(model: str, prompt: str) -> str:
    """Send a single-turn message to the Qubrid AI messages endpoint."""
    response = requests.post(
        "https://api.platform.qubrid.com/v1/messages",
        headers={
            "Authorization": "Bearer YOUR_QUBRID_API_KEY",
            "Content-Type": "application/json",
        },
        json={
            "model": model,
            "max_tokens": 4096,
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    response.raise_for_status()  # surface HTTP errors instead of failing on JSON parse
    return response.json()["content"][0]["text"]

# Test the same prompt on both models
prompt = "Review this codebase for race conditions and explain any you find."

opus_46_result = call_opus("claude-opus-4-6", prompt)
opus_47_result = call_opus("claude-opus-4-7", prompt)

# Compare outputs
print("Opus 4.6:", opus_46_result)
print("\nOpus 4.7:", opus_47_result)

Summary Comparison Table

| Feature | Opus 4.6 | Opus 4.7 | Advantage |
|---|---|---|---|
| Release Date | Feb 5, 2026 | Apr 16, 2026 | — |
| Max Image Resolution | 1,568px / 1.15MP | 2,576px / 3.75MP | 4.7 (3.26×) |
| Tokenizer | Standard | New (1.0–1.35× more tokens) | 4.6 (fewer tokens) |
| Instruction Following | Loose interpretation | Literal execution | 4.7 |
| Effort Levels | low, medium, high, max | low, medium, high, xhigh, max | 4.7 |
| Adaptive Thinking | With budget tokens | Type: adaptive only | Different approaches |
| Temperature/top_p/top_k | Supported | Unsupported (400 error) | 4.6 (if needed) |
| Cybersecurity Safeguards | None | Real-time blocking | 4.7 (for safety) |
| Context Window | 1M (beta, premium pricing) | 1M (standard, no premium) | 4.7 |
| CursorBench | 58% | 70% | 4.7 |
| XBOW Visual Acuity | 54.5% | 98.5% | 4.7 |
| BigLaw Bench | 90.2% | 90.9% | Marginal |
| Finance Agent / GDPval-AA | State-of-the-art | State-of-the-art | Tie |
| Terminal-Bench 2.0 | State-of-the-art | Not compared | — |
| Tone | Warmer, validation-forward | Direct, opinionated | Preference |
| Best For | Stable production workflows | Hardest coding + vision tasks | Context-dependent |

Final Thoughts

Opus 4.7 is not a ground-up redesign. It's a targeted upgrade focused on three areas: advanced software engineering on the hardest tasks, high-resolution multimodal vision, and precise instruction following. For teams already running Opus 4.6 successfully, migration is optional, but for new projects or workloads hitting Opus 4.6's limits, Opus 4.7 is the clear choice.

The 3.75MP vision upgrade alone makes Opus 4.7 viable for a class of computer-use and life sciences tasks where Opus 4.6 failed. The coding improvements (catching logical faults during planning, handling tool failures, and following complex multi-step instructions) address the failure modes that made autonomous AI work unreliable at scale.

On Qubrid AI, you can test both models side-by-side on your hardest problems and decide which one fits your production requirements.

👉 Compare Opus 4.6 vs 4.7 on Qubrid AI: https://platform.qubrid.com/models
