Model profile · Anthropic

claude-opus-4-8-thinking

Closed weights · mid · registry tag 2026 · benchmark-derived

Thin verified coverage

This model currently reads as thin verified coverage across the resolved source data.

Visible coverage: 3.4%
Verified coverage: 3.4%
Spread: n/a
Last verified: Jun 20, 2026

document · 1 alias · 1 official source link


Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.

Source-linked scores by benchmark

Each row lists the benchmark source, source type, raw metric, and the model's percentile within its fair comparison set.
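One plausible reading of "percentile inside its fair comparison set" is the share of the other eligible models in that set that score below this one. The page does not state its exact formula, tie rule, or how set membership is decided. The sketch below is therefore a hypothetical definition: `percentile_in_set` and the demo scores are illustrative, ties count as "not below", and the set is assumed to be filtered before it is passed in.

```python
def percentile_in_set(target: str, scores: dict[str, float]) -> float:
    """Percent of the *other* models in the set that score strictly below `target`.

    Hypothetical definition for illustration only: ties are not counted as
    below, and `scores` is assumed to already be the fair comparison set.
    """
    if target not in scores:
        raise KeyError(f"{target!r} is not in the comparison set")
    others = [s for name, s in scores.items() if name != target]
    if not others:
        return 100.0  # a set of one: nothing to compare against
    below = sum(1 for s in others if s < scores[target])
    return 100.0 * below / len(others)


# Made-up five-model set; "a" and "e" are tied.
demo_scores = {"a": 1483.0, "b": 1500.0, "c": 1450.0, "d": 1400.0, "e": 1483.0}
print(percentile_in_set("a", demo_scores))  # 50.0
```

Under this definition, a model ranked r-th in a tie-free set of N models gets 100 · (N - r) / (N - 1), so the top model always scores 100% and the bottom model 0%.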


Chat / text · 18 benchmarks · 97.8%

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,483
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #9. Votes: 12963. Organization: anthropic. License: Proprietary.

98.2% percentile inside its fair comparison set

Raw benchmark value: 1,483 (CI 1,477 - 1,490)
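Arena raw values such as 1,483 behave like ratings on an Elo-style scale, so a rating gap can be read as an expected preference rate between two models. The sketch below assumes the conventional Elo parameterisation (base-10 logistic, 400-point scale factor). The page does not describe how Arena fits or anchors its scores, and the function name and the 1,400-point opponent are hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under a standard Elo-style model.

    Assumption: base-10 logistic with a 400-point scale factor; the
    leaderboard's own fitting method may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# 1,483 is the Text Arena score shown above; 1,400 is a made-up opponent.
print(f"{expected_win_rate(1483, 1400):.3f}")  # 0.617
# Equal ratings imply a coin flip.
print(expected_win_rate(1483, 1483))  # 0.5
```

Because the curve is symmetric, `expected_win_rate(a, b) + expected_win_rate(b, a)` is always 1. A few points of rating difference inside overlapping CIs therefore implies a preference rate very close to 50%.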

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #6

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,473
Percentile: 98.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #8. Votes: 2314. Organization: anthropic. License: Proprietary.

98.5% percentile inside its fair comparison set

Raw benchmark value: 1,473 (CI 1,461 - 1,486)
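Each card's "Parsed from Arena leaderboard dataset row ..." sentence follows a fixed pattern: row id, category, source rank, votes, organization, and license. If you need those fields from the rendered text, a regular expression is enough. This is a hypothetical helper inferred only from the wording on this page, not a documented registry or Arena API. If the structured source data is available, prefer it.

```python
import re
from dataclasses import dataclass


@dataclass
class ArenaRowMeta:
    row_id: str
    category: str
    source_rank: int
    votes: int
    organization: str
    license: str


# Pattern inferred from this page's provenance sentences (hypothetical format).
# Category/organization/license are assumed not to contain a literal ".".
_PROVENANCE = re.compile(
    r"row `(?P<row_id>[^`]+)`\. "
    r"Category: (?P<category>[^.]+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>[\d,]+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>[^.]+)\."
)


def parse_provenance(line: str) -> ArenaRowMeta:
    """Extract the provenance fields from one rendered card sentence."""
    m = _PROVENANCE.search(line)
    if m is None:
        raise ValueError(f"unrecognised provenance line: {line!r}")
    return ArenaRowMeta(
        row_id=m["row_id"],
        category=m["category"],
        source_rank=int(m["source_rank"]),
        votes=int(m["votes"].replace(",", "")),  # tolerate "12,963" or "12963"
        organization=m["organization"],
        license=m["license"],
    )


line = ("Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. "
        "Category: creative_writing. Source rank: #8. Votes: 2314. "
        "Organization: anthropic. License: Proprietary.")
meta = parse_provenance(line)
print(meta.category, meta.source_rank, meta.votes)  # creative_writing 8 2314
```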

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,489
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #7. Votes: 6174. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

Raw benchmark value: 1,489 (CI 1,481 - 1,498)

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,496
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #9. Votes: 9685. Organization: anthropic. License: Proprietary.

98.2% percentile inside its fair comparison set

Raw benchmark value: 1,496 (CI 1,488 - 1,504)

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,514
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #6. Votes: 8404. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,514 (CI 1,506 - 1,521)

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,511
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #7. Votes: 4233. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

Raw benchmark value: 1,511 (CI 1,501 - 1,521)

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,498
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #5. Votes: 4258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,498 (CI 1,488 - 1,508)

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,506
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #7. Votes: 5625. Organization: anthropic. License: Proprietary.

98.7% percentile inside its fair comparison set

Raw benchmark value: 1,506 (CI 1,497 - 1,515)

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,508
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #6. Votes: 2258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,508 (CI 1,495 - 1,522)

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,462
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #19. Votes: 12963. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

Raw benchmark value: 1,462 (CI 1,456 - 1,468)

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #11

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,456
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #13. Votes: 2314. Organization: anthropic. License: Proprietary.

96.9% percentile inside its fair comparison set

Raw benchmark value: 1,456 (CI 1,443 - 1,469)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #16

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,467
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #19. Votes: 6174. Organization: anthropic. License: Proprietary.

95.4% percentile inside its fair comparison set

Raw benchmark value: 1,467 (CI 1,458 - 1,475)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,464
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #19. Votes: 9685. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

Raw benchmark value: 1,464 (CI 1,456 - 1,473)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #12

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,481
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #15. Votes: 8404. Organization: anthropic. License: Proprietary.

96.6% percentile inside its fair comparison set

Raw benchmark value: 1,481 (CI 1,474 - 1,489)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,479
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #18. Votes: 4233. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

Raw benchmark value: 1,479 (CI 1,470 - 1,489)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,482
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #6. Votes: 4258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,482 (CI 1,473 - 1,492)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,485
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #8. Votes: 5625. Organization: anthropic. License: Proprietary.

98.7% percentile inside its fair comparison set

Raw benchmark value: 1,485 (CI 1,476 - 1,493)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,486
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #7. Votes: 2258. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

Raw benchmark value: 1,486 (CI 1,473 - 1,499)

Coding · 6 benchmarks · 97.2%

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,565
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #3. Votes: 3005. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

Raw benchmark value: 1,565 (CI 1,553 - 1,577)

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,565
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev. Source rank: #3. Votes: 3005. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

Raw benchmark value: 1,565 (CI 1,553 - 1,577)

Code Arena · WebDev HTML

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,557
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev-html. Source rank: #3. Votes: 416. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

Raw benchmark value: 1,557 (CI 1,526 - 1,589)

Code Arena · WebDev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,559
Percentile: 94.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev-react. Source rank: #4. Votes: 2582. Organization: anthropic. License: Proprietary.

94.9% percentile inside its fair comparison set

Raw benchmark value: 1,559 (CI 1,546 - 1,572)

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,541
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: coding. Source rank: #6. Votes: 3501. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,541 (CI 1,530 - 1,551)

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #9

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,497
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: coding. Source rank: #13. Votes: 3501. Organization: anthropic. License: Proprietary.

97.5% percentile inside its fair comparison set

Raw benchmark value: 1,497 (CI 1,486 - 1,507)
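The page does not say how the category figure next to each section header is computed. The coding header's 97.2% does match a plain arithmetic mean of the six coding row percentiles above, rounded to one decimal. The chat/text (97.8%) and professional-reasoning (97.6%) headers likewise match the means of their 18 listed rows. Treat this as an observed consistency, not the registry's documented method. The sketch below simply reproduces the check.

```python
from statistics import mean

# Row percentiles copied from the six coding cards on this page.
coding_percentiles = [
    97.3,  # Code Arena
    97.3,  # WebDev Arena
    97.3,  # Code Arena, webdev-html
    94.9,  # Code Arena, webdev-react
    99.1,  # Text Arena, coding
    97.5,  # Text Arena, coding, no style control
]

# Assumption: the header figure is an unweighted mean of row percentiles.
print(round(mean(coding_percentiles), 1))  # 97.2
```

An unweighted mean treats a 416-vote row the same as a 3,501-vote row. A vote-weighted or CI-aware aggregate would be a different, equally hypothetical design choice.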

Reasoning / math / science · 2 benchmarks · 98.2%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #6

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,496
Percentile: 98.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: math. Source rank: #8. Votes: 648. Organization: anthropic. License: Proprietary.

98.4% percentile inside its fair comparison set

Raw benchmark value: 1,496 (CI 1,474 - 1,519)

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,489
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: math. Source rank: #10. Votes: 648. Organization: anthropic. License: Proprietary.

98.1% percentile inside its fair comparison set

Raw benchmark value: 1,489 (CI 1,466 - 1,511)

Professional reasoning · 18 benchmarks · 97.6%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,521
Percentile: 97.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: expert. Source rank: #9. Votes: 1222. Organization: anthropic. License: Proprietary.

97.8% percentile inside its fair comparison set

Raw benchmark value: 1,521 (CI 1,505 - 1,538)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,497
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #6. Votes: 2559. Organization: anthropic. License: Proprietary.

98.7% percentile inside its fair comparison set

Raw benchmark value: 1,497 (CI 1,484 - 1,509)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,457
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #9. Votes: 2981. Organization: anthropic. License: Proprietary.

98.1% percentile inside its fair comparison set

Raw benchmark value: 1,457 (CI 1,446 - 1,469)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,500
Percentile: 98%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_legal_and_government. Source rank: #8. Votes: 980. Organization: anthropic. License: Proprietary.

98% percentile inside its fair comparison set

Raw benchmark value: 1,500 (CI 1,481 - 1,520)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #5

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,509
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #7. Votes: 2100. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

Raw benchmark value: 1,509 (CI 1,496 - 1,522)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,509
Percentile: 99%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_mathematical. Source rank: #5. Votes: 652. Organization: anthropic. License: Proprietary.

99% percentile inside its fair comparison set

Raw benchmark value: 1,509 (CI 1,486 - 1,532)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #3

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,516
Percentile: 99.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_medicine_and_healthcare. Source rank: #4. Votes: 995. Organization: anthropic. License: Proprietary.

99.3% percentile inside its fair comparison set

Raw benchmark value: 1,516 (CI 1,497 - 1,535)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #4

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,526
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_software_and_it_services. Source rank: #6. Votes: 5000. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

Raw benchmark value: 1,526 (CI 1,517 - 1,535)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #6

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,478
Percentile: 98.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #8. Votes: 3326. Organization: anthropic. License: Proprietary.

98.5% percentile inside its fair comparison set

Raw benchmark value: 1,478 (CI 1,468 - 1,489)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #11

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,495
Percentile: 96.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: expert. Source rank: #14. Votes: 1222. Organization: anthropic. License: Proprietary.

96.4% percentile inside its fair comparison set

Raw benchmark value: 1,495 (CI 1,478 - 1,512)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #7

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,465
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #10. Votes: 2559. Organization: anthropic. License: Proprietary.

98.1% percentile inside its fair comparison set

Raw benchmark value: 1,465 (CI 1,453 - 1,477)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #17

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 95%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #21. Votes: 2981. Organization: anthropic. License: Proprietary.

95% percentile inside its fair comparison set

Raw benchmark value: 1,436 (CI 1,425 - 1,448)

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #14

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,477
Percentile: 95.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_legal_and_government. Source rank: #16. Votes: 980. Organization: anthropic. License: Proprietary.

95.6% percentile inside its fair comparison set

Raw benchmark value: 1,477 (CI 1,458 - 1,497)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #15

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,478
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #19. Votes: 2100. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

Raw benchmark value: 1,478 (CI 1,465 - 1,491)

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #6

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,498
Percentile: 98.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_mathematical. Source rank: #7. Votes: 652. Organization: anthropic. License: Proprietary.

98.4% percentile inside its fair comparison set

Raw benchmark value: 1,498 (CI 1,475 - 1,521)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #12

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,475
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_medicine_and_healthcare. Source rank: #14. Votes: 995. Organization: anthropic. License: Proprietary.

96.3% percentile inside its fair comparison set

Raw benchmark value: 1,475 (CI 1,455 - 1,494)

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #12

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,489
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_software_and_it_services. Source rank: #15. Votes: 5000. Organization: anthropic. License: Proprietary.

96.6% percentile inside its fair comparison set

Raw benchmark value: 1,489 (CI 1,480 - 1,498)

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #10

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,464
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #12. Votes: 3326. Organization: anthropic. License: Proprietary.

97.2% percentile inside its fair comparison set

Raw benchmark value: 1,464 (CI 1,453 - 1,474)

Vision understanding · 12 benchmarks · 92.7%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,289
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #8. Votes: 3701. Organization: anthropic. License: Proprietary.

95.4% percentile inside its fair comparison set

Raw benchmark value: 1,289 (CI 1,277 - 1,300)

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #9

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,276
Percentile: 85.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing_vision. Source rank: #13. Votes: 188. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,276 (CI 1,233 to 1,319)

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,315
Percentile: 92.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: diagram. Source rank: #8. Votes: 1044. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,315 (CI 1,296 to 1,335)

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,283
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #8. Votes: 1625. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,283 (CI 1,266 to 1,299)

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #4

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,326
Percentile: 95.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: homework. Source rank: #6. Votes: 517. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,326 (CI 1,299 to 1,352)

Vision Arena · OCR

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #4

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,305
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: ocr. Source rank: #6. Votes: 2692. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,305 (CI 1,293 to 1,318)
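Rows with few votes carry much wider confidence intervals, so their rank position is softer. The small Python sketch below uses only the vote counts and CI bounds printed in the six Vision Arena rows above (the ones without the "No Style Control" suffix); the `Row` type and `by_votes` helper are illustrative names, not part of the registry. Sorted by votes, the half-width shrinks from ±43 points at 188 votes to ±11.5 points at 3,701 votes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One leaderboard row: category, vote count, and reported CI bounds."""
    category: str
    votes: int
    ci_low: int
    ci_high: int

    @property
    def half_width(self) -> float:
        # Half the reported interval, i.e. the "± points" to allow around the score.
        return (self.ci_high - self.ci_low) / 2


# Vision Arena rows (without the "No Style Control" suffix), copied from this page.
VISION_ROWS = [
    Row("overall", 3701, 1277, 1300),
    Row("creative_writing_vision", 188, 1233, 1319),
    Row("diagram", 1044, 1296, 1335),
    Row("english", 1625, 1266, 1299),
    Row("homework", 517, 1299, 1352),
    Row("ocr", 2692, 1293, 1318),
]


def by_votes(rows):
    """Return rows ordered from fewest to most votes."""
    return sorted(rows, key=lambda r: r.votes)


for row in by_votes(VISION_ROWS):
    print(f"{row.category:<24} votes={row.votes:>5}  ±{row.half_width:.1f}")
```

In these six rows the ordering by votes and the ordering by interval width line up exactly, which is a useful reminder to read the Creative Writing Vision rank (188 votes) more loosely than the overall rank.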

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,297
Percentile: 93.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #10. Votes: 3701. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,297 (CI 1,285 to 1,308)

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #9

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,287
Percentile: 85.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing_vision. Source rank: #12. Votes: 188. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,287 (CI 1,244 to 1,331)

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,313
Percentile: 91.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: diagram. Source rank: #10. Votes: 1044. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,313 (CI 1,294 to 1,332)

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,293
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #9. Votes: 1625. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,293 (CI 1,277 to 1,309)

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,330
Percentile: 94.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: homework. Source rank: #6. Votes: 517. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,330 (CI 1,303 to 1,356)

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,310
Percentile: 92.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: ocr. Source rank: #8. Votes: 2692. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,310 (CI 1,297 to 1,322)
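Each Vision category appears twice, once plain and once as "No Style Control". Treating the unsuffixed rows as the style-controlled view (an assumption; the page only labels the other set), the sketch below pairs the two ratings per category and takes the difference. With the figures above, every category except Diagram rates 4 to 11 points higher without style control. The `VISION_PAIRS` table and `style_deltas` helper are illustrative names; the numbers are copied from this page.

```python
# (category, rating in the unsuffixed row, rating in the "No Style Control" row)
VISION_PAIRS = [
    ("overall", 1289, 1297),
    ("creative_writing_vision", 1276, 1287),
    ("diagram", 1315, 1313),
    ("english", 1283, 1293),
    ("homework", 1326, 1330),
    ("ocr", 1305, 1310),
]


def style_deltas(pairs):
    """Map each category to (No Style Control rating minus unsuffixed rating)."""
    return {name: raw - controlled for name, controlled, raw in pairs}


deltas = style_deltas(VISION_PAIRS)
for name, delta in deltas.items():
    print(f"{name:<24} {delta:+d}")
```

All of these differences sit well inside the rows' own confidence intervals, so they describe a direction, not a demonstrated effect.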

Document understanding · 2 benchmarks · 81.3%

Document Arena

AR · Document understanding · Human

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #4

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,485
Percentile: 87.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #6. Votes: 3431. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,485 (CI 1,474 to 1,495)

Document Arena · No Style Control

AR · Document understanding · Human

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #7

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,473
Percentile: 75%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #10. Votes: 3431. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,473 (CI 1,462 to 1,483)

Multilingual · 14 benchmarks · 93.9%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #13

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,513
Percentile: 95.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: chinese. Source rank: #17. Votes: 587. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,513 (CI 1,489 to 1,538)

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #1

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,545
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: french. Source rank: #1. Votes: 436. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,545 (CI 1,515 to 1,575)
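A rank of #1 on 436 votes with an interval of ±30 points is a narrower lead than the rank alone suggests. Arena-style scores are commonly reported on an Elo-like scale; assuming the usual base-10, 400-point logistic convention (an assumption, not something this page states), a rating gap converts into an expected head-to-head preference rate as sketched below. The opponent rating of 1,515 is hypothetical: it simply reuses the lower CI bound of the French row to show what a 30-point gap means.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model.

    Assumes ratings on a base-10 scale where `scale` points is a 10:1 odds ratio;
    ties are ignored.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# 1,545 is the French row above; 1,515 is a hypothetical opponent rating.
p = expected_win_rate(1545, 1515)
print(f"{p:.3f}")
```

Under that assumption a 30-point gap corresponds to roughly a 54% expected preference rate, so a model anywhere inside this row's interval would be close to a coin flip against it.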

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #20

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,461
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: german. Source rank: #26. Votes: 216. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,461 (CI 1,423 to 1,500)

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #25

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,425
Percentile: 88.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: japanese. Source rank: #35. Votes: 155. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,425 (CI 1,376 to 1,475)

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #15

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,436
Percentile: 93.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: korean. Source rank: #18. Votes: 186. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,436 (CI 1,392 to 1,480)

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #4

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,504
Percentile: 99%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: russian. Source rank: #5. Votes: 1421. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,504 (CI 1,488 to 1,521)

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #7

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,479
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: spanish. Source rank: #8. Votes: 347. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,479 (CI 1,447 to 1,511)

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #25

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,494
Percentile: 91.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: chinese. Source rank: #30. Votes: 587. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,494 (CI 1,469 to 1,518)

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #1

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,522
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: french. Source rank: #1. Votes: 436. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,522 (CI 1,492 to 1,552)

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #28

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,439
Percentile: 88.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: german. Source rank: #35. Votes: 216. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,439 (CI 1,400 to 1,478)

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #26

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,406
Percentile: 87.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: japanese. Source rank: #35. Votes: 155. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,406 (CI 1,357 to 1,456)

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #22

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,416
Percentile: 89.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: korean. Source rank: #27. Votes: 186. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,416 (CI 1,371 to 1,460)

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #9

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,483
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: russian. Source rank: #12. Votes: 1421. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,483 (CI 1,466 to 1,499)

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #14

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,463
Percentile: 93.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: spanish. Source rank: #16. Votes: 347. Organization: anthropic. License: Proprietary.

Raw benchmark value: 1,463 (CI 1,431 to 1,495)
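The percentage on each category line appears to be the plain average of that category's row percentiles: averaging the 12 Vision understanding, 2 Document understanding, and 14 Multilingual percentiles listed above reproduces 92.7%, 81.3%, and 93.9% when rounded half-up to one decimal. That is an inference from the numbers on this page, not a documented formula. The sketch below checks it with Python's standard `decimal` module so that the Document case (exactly 81.25) rounds half-up rather than to even; `CATEGORIES`, `HEADLINES`, and `mean_percentile` are illustrative names.

```python
from decimal import ROUND_HALF_UP, Decimal

# Row percentiles copied from the rows above, grouped by category.
CATEGORIES = {
    "Vision understanding": ["95.4", "85.5", "92.9", "95.4", "95.6", "95.7",
                             "93.6", "85.5", "91.4", "94.5", "94.1", "92.9"],
    "Document understanding": ["87.5", "75"],
    "Multilingual": ["95.9", "100", "92", "88.2", "93.3", "99", "97.2",
                     "91.9", "100", "88.6", "87.7", "89.9", "97.2", "93.9"],
}

# Percentages shown on each category line.
HEADLINES = {
    "Vision understanding": "92.7",
    "Document understanding": "81.3",
    "Multilingual": "93.9",
}


def mean_percentile(values):
    """Arithmetic mean of percentile strings, rounded half-up to one decimal."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for name, rows in CATEGORIES.items():
    print(name, len(rows), mean_percentile(rows), HEADLINES[name])
```

If the registry uses this aggregation, a category with only two rows (Document understanding) can swing by several points when a single row changes, which is worth keeping in mind when comparing category headlines.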

Source links and registry checks

Arena · official · Jun 20, 2026

Model profile · Anthropic

claude-opus-4-8-thinking

Closed weightsmid · registry tag 2026 benchmark-derived

Thin verified coverage

Reads as thin verified coverage across the resolved source data.

Visible coverage: 3.4%
Verified coverage: 3.4%
Spread: n/a
Last verified: Jun 20, 2026

document1 aliases1 official source links

Open compare

Data version

Current snapshot.

Data version Jun 20, 2026Model list checked9 providers · 1081 tracked modelsPage refreshed Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.

Thin verified coverageThis model currently reads as thin verified coverage across the resolved source data.

Chat / text18 benchmarks97.8%

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #7

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,483
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #9. Votes: 12963. Organization: anthropic. License: Proprietary.

98.2% percentile inside its fair comparison set

1,483Raw benchmark valueCI 1,477 - 1,490

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #6

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,473
Percentile: 98.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #8. Votes: 2314. Organization: anthropic. License: Proprietary.

98.5% percentile inside its fair comparison set

1,473Raw benchmark valueCI 1,461 - 1,486

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,489
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #7. Votes: 6174. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

1,489Raw benchmark valueCI 1,481 - 1,498

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #7

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,496
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #9. Votes: 9685. Organization: anthropic. License: Proprietary.

98.2% percentile inside its fair comparison set

1,496Raw benchmark valueCI 1,488 - 1,504

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,514
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #6. Votes: 8404. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,514Raw benchmark valueCI 1,506 - 1,521

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,511
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #7. Votes: 4233. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

1,511Raw benchmark valueCI 1,501 - 1,521

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,498
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #5. Votes: 4258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,498Raw benchmark valueCI 1,488 - 1,508

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,506
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #7. Votes: 5625. Organization: anthropic. License: Proprietary.

98.7% percentile inside its fair comparison set

1,506Raw benchmark valueCI 1,497 - 1,515

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,508
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #6. Votes: 2258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,508Raw benchmark valueCI 1,495 - 1,522

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,462
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #19. Votes: 12963. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

1,462Raw benchmark valueCI 1,456 - 1,468

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #11

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,456
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #13. Votes: 2314. Organization: anthropic. License: Proprietary.

96.9% percentile inside its fair comparison set

1,456Raw benchmark valueCI 1,443 - 1,469

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #16

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,467
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #19. Votes: 6174. Organization: anthropic. License: Proprietary.

95.4% percentile inside its fair comparison set

1,467Raw benchmark valueCI 1,458 - 1,475

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,464
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #19. Votes: 9685. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

1,464Raw benchmark valueCI 1,456 - 1,473

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #12

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,481
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #15. Votes: 8404. Organization: anthropic. License: Proprietary.

96.6% percentile inside its fair comparison set

1,481Raw benchmark valueCI 1,474 - 1,489

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,479
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #18. Votes: 4233. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

1,479Raw benchmark valueCI 1,470 - 1,489

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #4

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,482
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #6. Votes: 4258. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,482Raw benchmark valueCI 1,473 - 1,492

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,485
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #8. Votes: 5625. Organization: anthropic. License: Proprietary.

98.7% percentile inside its fair comparison set

1,485Raw benchmark valueCI 1,476 - 1,493

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #5

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,486
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #7. Votes: 2258. Organization: anthropic. License: Proprietary.

98.8% percentile inside its fair comparison set

1,486Raw benchmark valueCI 1,473 - 1,499

Coding6 benchmarks97.2%

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,565
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #3. Votes: 3005. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

1,565Raw benchmark valueCI 1,553 - 1,577

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,565
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev. Source rank: #3. Votes: 3005. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

1,565Raw benchmark valueCI 1,553 - 1,577

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,557
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev-html. Source rank: #3. Votes: 416. Organization: anthropic. License: Proprietary.

97.3% percentile inside its fair comparison set

1,557 raw benchmark value · CI 1,526 - 1,589

Code Arena · WebDev React

AR · Coding · Human

It shows whether the model can generate, repair, and reason about code when judged head-to-head by evaluators, rather than on curated marketing examples.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,559
Percentile: 94.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: webdev-react. Source rank: #4. Votes: 2582. Organization: anthropic. License: Proprietary.

94.9% percentile inside its fair comparison set

1,559 raw benchmark value · CI 1,546 - 1,572

Text Arena · Coding

AR · Coding · Human

It shows whether the model can generate, repair, and reason about code when judged head-to-head by evaluators, rather than on curated marketing examples.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,541
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: coding. Source rank: #6. Votes: 3501. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,541 raw benchmark value · CI 1,530 - 1,551

Text Arena · Coding · No Style Control

AR · Coding · Human

It shows whether the model can generate, repair, and reason about code when judged head-to-head by evaluators, rather than on curated marketing examples.

Rank #9

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,497
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: coding. Source rank: #13. Votes: 3501. Organization: anthropic. License: Proprietary.

97.5% percentile inside its fair comparison set

1,497 raw benchmark value · CI 1,486 - 1,507
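Every card prints a CI next to its raw value, and neighboring ranks are often only a few points apart. A rough, conservative way to ask whether two ratings on the same leaderboard are distinguishable is to check whether their intervals overlap. The sketch below uses the Text Arena · Coding interval from this page against two hypothetical neighbors; non-overlap is a heuristic, not a formal significance test, and the function name is hypothetical.

```python
def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


coding_ci = (1530.0, 1551.0)       # Text Arena · Coding CI from this page
neighbor_close = (1545.0, 1560.0)  # hypothetical neighbor on the same board
neighbor_far = (1490.0, 1520.0)    # hypothetical neighbor on the same board

print(intervals_overlap(coding_ci, neighbor_close))  # True
print(intervals_overlap(coding_ci, neighbor_far))    # False
```

Overlapping intervals mean the rank order between the two models is not settled by the votes so far; the low-vote cards on this page (for example German, Japanese, and Korean) have the widest intervals.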

Reasoning / math / science · 2 benchmarks · 98.2%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,496
Percentile: 98.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: math. Source rank: #8. Votes: 648. Organization: anthropic. License: Proprietary.

98.4% percentile inside its fair comparison set

1,496 raw benchmark value · CI 1,474 - 1,519

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,489
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: math. Source rank: #10. Votes: 648. Organization: anthropic. License: Proprietary.

98.1% percentile inside its fair comparison set

1,489 raw benchmark value · CI 1,466 - 1,511

Professional reasoning · 18 benchmarks · 97.6%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,521
Percentile: 97.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: expert. Source rank: #9. Votes: 1222. Organization: anthropic. License: Proprietary.

97.8% percentile inside its fair comparison set

1,521 raw benchmark value · CI 1,505 - 1,538

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #5

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,497
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

98.7% percentile inside its fair comparison set

1,497 raw benchmark value · CI 1,484 - 1,509

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,457
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

98.1% percentile inside its fair comparison set

1,457 raw benchmark value · CI 1,446 - 1,469

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,500
Percentile: 98%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_legal_and_government. Source rank: #8. Votes: 980. Organization: anthropic. License: Proprietary.

98% percentile inside its fair comparison set

1,500 raw benchmark value · CI 1,481 - 1,520

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #5

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,509
Percentile: 98.8%
Last updated: recent
Eligibility: benchmark_derived_model

98.8% percentile inside its fair comparison set

1,509 raw benchmark value · CI 1,496 - 1,522

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,509
Percentile: 99%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_mathematical. Source rank: #5. Votes: 652. Organization: anthropic. License: Proprietary.

99% percentile inside its fair comparison set

1,509 raw benchmark value · CI 1,486 - 1,532

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #3

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,516
Percentile: 99.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_medicine_and_healthcare. Source rank: #4. Votes: 995. Organization: anthropic. License: Proprietary.

99.3% percentile inside its fair comparison set

1,516 raw benchmark value · CI 1,497 - 1,535

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,526
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_software_and_it_services. Source rank: #6. Votes: 5000. Organization: anthropic. License: Proprietary.

99.1% percentile inside its fair comparison set

1,526 raw benchmark value · CI 1,517 - 1,535

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,478
Percentile: 98.5%
Last updated: recent
Eligibility: benchmark_derived_model

98.5% percentile inside its fair comparison set

1,478 raw benchmark value · CI 1,468 - 1,489

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #11

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,495
Percentile: 96.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: expert. Source rank: #14. Votes: 1222. Organization: anthropic. License: Proprietary.

96.4% percentile inside its fair comparison set

1,495 raw benchmark value · CI 1,478 - 1,512

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,465
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

98.1% percentile inside its fair comparison set

1,465 raw benchmark value · CI 1,453 - 1,477

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #17

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,436
Percentile: 95%
Last updated: recent
Eligibility: benchmark_derived_model

95% percentile inside its fair comparison set

1,436 raw benchmark value · CI 1,425 - 1,448

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #14

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,477
Percentile: 95.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_legal_and_government. Source rank: #16. Votes: 980. Organization: anthropic. License: Proprietary.

95.6% percentile inside its fair comparison set

1,477 raw benchmark value · CI 1,458 - 1,497

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #15

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,478
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

95.7% percentile inside its fair comparison set

1,478 raw benchmark value · CI 1,465 - 1,491

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,498
Percentile: 98.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_mathematical. Source rank: #7. Votes: 652. Organization: anthropic. License: Proprietary.

98.4% percentile inside its fair comparison set

1,498 raw benchmark value · CI 1,475 - 1,521

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #12

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,475
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_medicine_and_healthcare. Source rank: #14. Votes: 995. Organization: anthropic. License: Proprietary.

96.3% percentile inside its fair comparison set

1,475 raw benchmark value · CI 1,455 - 1,494

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #12

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,489
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: industry_software_and_it_services. Source rank: #15. Votes: 5000. Organization: anthropic. License: Proprietary.

96.6% percentile inside its fair comparison set

1,489 raw benchmark value · CI 1,480 - 1,498

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #10

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,464
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

97.2% percentile inside its fair comparison set

1,464 raw benchmark value · CI 1,453 - 1,474

Vision understanding · 12 benchmarks · 92.7%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,289
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #8. Votes: 3701. Organization: anthropic. License: Proprietary.

95.4% percentile inside its fair comparison set

1,289 raw benchmark value · CI 1,277 - 1,300

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #9

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,276
Percentile: 85.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing_vision. Source rank: #13. Votes: 188. Organization: anthropic. License: Proprietary.

85.5% percentile inside its fair comparison set

1,276 raw benchmark value · CI 1,233 - 1,319

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,315
Percentile: 92.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: diagram. Source rank: #8. Votes: 1044. Organization: anthropic. License: Proprietary.

92.9% percentile inside its fair comparison set

1,315 raw benchmark value · CI 1,296 - 1,335

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,283
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #8. Votes: 1625. Organization: anthropic. License: Proprietary.

95.4% percentile inside its fair comparison set

1,283 raw benchmark value · CI 1,266 - 1,299

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,326
Percentile: 95.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: homework. Source rank: #6. Votes: 517. Organization: anthropic. License: Proprietary.

95.6% percentile inside its fair comparison set

1,326 raw benchmark value · CI 1,299 - 1,352

Vision Arena · OCR

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,305
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: ocr. Source rank: #6. Votes: 2692. Organization: anthropic. License: Proprietary.

95.7% percentile inside its fair comparison set

1,305 raw benchmark value · CI 1,293 - 1,318

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,297
Percentile: 93.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #10. Votes: 3701. Organization: anthropic. License: Proprietary.

93.6% percentile inside its fair comparison set

1,297 raw benchmark value · CI 1,285 - 1,308

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #9

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,287
Percentile: 85.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing_vision. Source rank: #12. Votes: 188. Organization: anthropic. License: Proprietary.

85.5% percentile inside its fair comparison set

1,287 raw benchmark value · CI 1,244 - 1,331

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,313
Percentile: 91.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: diagram. Source rank: #10. Votes: 1044. Organization: anthropic. License: Proprietary.

91.4% percentile inside its fair comparison set

1,313 raw benchmark value · CI 1,294 - 1,332

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,293
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #9. Votes: 1625. Organization: anthropic. License: Proprietary.

94.5% percentile inside its fair comparison set

1,293 raw benchmark value · CI 1,277 - 1,309

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,330
Percentile: 94.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: homework. Source rank: #6. Votes: 517. Organization: anthropic. License: Proprietary.

94.1% percentile inside its fair comparison set

1,330 raw benchmark value · CI 1,303 - 1,356

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #6

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,310
Percentile: 92.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: ocr. Source rank: #8. Votes: 2692. Organization: anthropic. License: Proprietary.

92.9% percentile inside its fair comparison set

1,310 raw benchmark value · CI 1,297 - 1,322

Document understanding · 2 benchmarks · 81.3%

Document Arena

AR · Document understanding · Human

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,485
Percentile: 87.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #6. Votes: 3431. Organization: anthropic. License: Proprietary.

87.5% percentile inside its fair comparison set

1,485 raw benchmark value · CI 1,474 - 1,495

Document Arena · No Style Control

AR · Document understanding · Human

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,473
Percentile: 75%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #10. Votes: 3431. Organization: anthropic. License: Proprietary.

75% percentile inside its fair comparison set

1,473 raw benchmark value · CI 1,462 - 1,483

Multilingual · 14 benchmarks · 93.9%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #13

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,513
Percentile: 95.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: chinese. Source rank: #17. Votes: 587. Organization: anthropic. License: Proprietary.

95.9% percentile inside its fair comparison set

1,513 raw benchmark value · CI 1,489 - 1,538

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #1

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,545
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: french. Source rank: #1. Votes: 436. Organization: anthropic. License: Proprietary.

100% percentile inside its fair comparison set

1,545 raw benchmark value · CI 1,515 - 1,575

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #20

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,461
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: german. Source rank: #26. Votes: 216. Organization: anthropic. License: Proprietary.

92% percentile inside its fair comparison set

1,461 raw benchmark value · CI 1,423 - 1,500

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #25

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,425
Percentile: 88.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: japanese. Source rank: #35. Votes: 155. Organization: anthropic. License: Proprietary.

88.2% percentile inside its fair comparison set

1,425 raw benchmark value · CI 1,376 - 1,475

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #15

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,436
Percentile: 93.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: korean. Source rank: #18. Votes: 186. Organization: anthropic. License: Proprietary.

93.3% percentile inside its fair comparison set

1,436 raw benchmark value · CI 1,392 - 1,480

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #4

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,504
Percentile: 99%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: russian. Source rank: #5. Votes: 1421. Organization: anthropic. License: Proprietary.

99% percentile inside its fair comparison set

1,504 raw benchmark value · CI 1,488 - 1,521

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #7

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,479
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: spanish. Source rank: #8. Votes: 347. Organization: anthropic. License: Proprietary.

97.2% percentile inside its fair comparison set

1,479 raw benchmark value · CI 1,447 - 1,511

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #25

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,494
Percentile: 91.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: chinese. Source rank: #30. Votes: 587. Organization: anthropic. License: Proprietary.

91.9% percentile inside its fair comparison set

1,494 raw benchmark value · CI 1,469 - 1,518

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #1

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,522
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: french. Source rank: #1. Votes: 436. Organization: anthropic. License: Proprietary.

100% percentile inside its fair comparison set

1,522 raw benchmark value · CI 1,492 - 1,552

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #28

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,439
Percentile: 88.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: german. Source rank: #35. Votes: 216. Organization: anthropic. License: Proprietary.

88.6% percentile inside its fair comparison set

1,439 raw benchmark value · CI 1,400 - 1,478

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #26

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,406
Percentile: 87.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: japanese. Source rank: #35. Votes: 155. Organization: anthropic. License: Proprietary.

87.7% percentile inside its fair comparison set

1,406 raw benchmark value · CI 1,357 - 1,456

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #22

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,416
Percentile: 89.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: korean. Source rank: #27. Votes: 186. Organization: anthropic. License: Proprietary.

89.9% percentile inside its fair comparison set

1,416 raw benchmark value · CI 1,371 - 1,460

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #9

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,483
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: russian. Source rank: #12. Votes: 1421. Organization: anthropic. License: Proprietary.

97.2% percentile inside its fair comparison set

1,483 raw benchmark value · CI 1,466 - 1,499

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #14

verified runtime · exact alias · background only

Source: Arena
Raw value: 1,463
Percentile: 93.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: spanish. Source rank: #16. Votes: 347. Organization: anthropic. License: Proprietary.

93.9% percentile inside its fair comparison set

1,463 raw benchmark value · CI 1,431 - 1,495

Source links and registry checks

Arena · official source · Jun 20, 2026
