Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #7
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,483
- Percentile: 98.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #9. Votes: 12963. Organization: anthropic. License: Proprietary.
98.2% percentile inside its fair comparison set · 1,483 raw benchmark value · CI 1,477 - 1,490
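Each card carries a provenance line in the same fixed shape, so the fields can be pulled out mechanically. The sketch below is one way to do that, assuming the line format stays exactly as shown on these cards; the regular expression, the function name `parse_provenance`, and the dictionary keys are illustrative choices, not part of any Arena API or schema.

```python
import re

# Hypothetical pattern for the provenance line printed on each card.
# The group names are illustrative; they are not an official schema.
PROVENANCE_RE = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>[^.]+)\."
)


def parse_provenance(line: str) -> dict:
    """Split one provenance line into fields; raise if the shape differs."""
    match = PROVENANCE_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    fields = match.groupdict()
    # Numeric fields are converted so they can be compared or sorted.
    fields["source_rank"] = int(fields["source_rank"])
    fields["votes"] = int(fields["votes"])
    return fields


# Provenance line copied from the overall Text Arena card.
example = (
    "Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. "
    "Category: overall. Source rank: #9. Votes: 12963. "
    "Organization: anthropic. License: Proprietary."
)
parsed = parse_provenance(example)
print(parsed)
```

Raising on a mismatch, rather than returning partial fields, is a deliberate choice here: a card whose provenance line changes shape is surfaced instead of being silently misread.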
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #6
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,473
- Percentile: 98.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #8. Votes: 2314. Organization: anthropic. License: Proprietary.
98.5% percentile inside its fair comparison set · 1,473 raw benchmark value · CI 1,461 - 1,486
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #5
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,489
- Percentile: 98.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #7. Votes: 6174. Organization: anthropic. License: Proprietary.
98.8% percentile inside its fair comparison set · 1,489 raw benchmark value · CI 1,481 - 1,498
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #7
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,496
- Percentile: 98.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #9. Votes: 9685. Organization: anthropic. License: Proprietary.
98.2% percentile inside its fair comparison set · 1,496 raw benchmark value · CI 1,488 - 1,504
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #4
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,514
- Percentile: 99.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #6. Votes: 8404. Organization: anthropic. License: Proprietary.
99.1% percentile inside its fair comparison set · 1,514 raw benchmark value · CI 1,506 - 1,521
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #5
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,511
- Percentile: 98.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #7. Votes: 4233. Organization: anthropic. License: Proprietary.
98.8% percentile inside its fair comparison set · 1,511 raw benchmark value · CI 1,501 - 1,521
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #4
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,498
- Percentile: 99.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #5. Votes: 4258. Organization: anthropic. License: Proprietary.
99.1% percentile inside its fair comparison set · 1,498 raw benchmark value · CI 1,488 - 1,508
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #5
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,506
- Percentile: 98.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #7. Votes: 5625. Organization: anthropic. License: Proprietary.
98.7% percentile inside its fair comparison set · 1,506 raw benchmark value · CI 1,497 - 1,515
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #4
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,508
- Percentile: 99.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #6. Votes: 2258. Organization: anthropic. License: Proprietary.
99.1% percentile inside its fair comparison set · 1,508 raw benchmark value · CI 1,495 - 1,522
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #15
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,462
- Percentile: 95.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: overall. Source rank: #19. Votes: 12963. Organization: anthropic. License: Proprietary.
95.7% percentile inside its fair comparison set · 1,462 raw benchmark value · CI 1,456 - 1,468
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #11
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,456
- Percentile: 96.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: creative_writing. Source rank: #13. Votes: 2314. Organization: anthropic. License: Proprietary.
96.9% percentile inside its fair comparison set · 1,456 raw benchmark value · CI 1,443 - 1,469
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #16
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,467
- Percentile: 95.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: english. Source rank: #19. Votes: 6174. Organization: anthropic. License: Proprietary.
95.4% percentile inside its fair comparison set · 1,467 raw benchmark value · CI 1,458 - 1,475
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #15
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,464
- Percentile: 95.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: exclude_ties. Source rank: #19. Votes: 9685. Organization: anthropic. License: Proprietary.
95.7% percentile inside its fair comparison set · 1,464 raw benchmark value · CI 1,456 - 1,473
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #12
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,481
- Percentile: 96.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts. Source rank: #15. Votes: 8404. Organization: anthropic. License: Proprietary.
96.6% percentile inside its fair comparison set · 1,481 raw benchmark value · CI 1,474 - 1,489
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #15
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,479
- Percentile: 95.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: hard_prompts_english. Source rank: #18. Votes: 4233. Organization: anthropic. License: Proprietary.
95.7% percentile inside its fair comparison set · 1,479 raw benchmark value · CI 1,470 - 1,489
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #4
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,482
- Percentile: 99.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: instruction_following. Source rank: #6. Votes: 4258. Organization: anthropic. License: Proprietary.
99.1% percentile inside its fair comparison set · 1,482 raw benchmark value · CI 1,473 - 1,492
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #5
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,485
- Percentile: 98.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: longer_query. Source rank: #8. Votes: 5625. Organization: anthropic. License: Proprietary.
98.7% percentile inside its fair comparison set · 1,485 raw benchmark value · CI 1,476 - 1,493
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #5
Verified runtime · Exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,486
- Percentile: 98.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `claude-opus-4-8-thinking`. Category: multi_turn. Source rank: #7. Votes: 2258. Organization: anthropic. License: Proprietary.
98.8% percentile inside its fair comparison set · 1,486 raw benchmark value · CI 1,473 - 1,499
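The cards report a confidence interval next to each raw value, and the intervals are visibly wider for categories with fewer votes. The sketch below tabulates interval width against vote count for the nine style-controlled cards, using figures copied from this page. The dictionary layout and the helper `ci_width` are illustrative, and nothing here reproduces how Arena actually computes its intervals.

```python
# (votes, ci_low, ci_high) copied from the style-controlled cards on this page.
cards = {
    "overall": (12963, 1477, 1490),
    "creative_writing": (2314, 1461, 1486),
    "english": (6174, 1481, 1498),
    "exclude_ties": (9685, 1488, 1504),
    "hard_prompts": (8404, 1506, 1521),
    "hard_prompts_english": (4233, 1501, 1521),
    "instruction_following": (4258, 1488, 1508),
    "longer_query": (5625, 1497, 1515),
    "multi_turn": (2258, 1495, 1522),
}


def ci_width(low: int, high: int) -> int:
    """Width of a reported interval, in rating points."""
    return high - low


# List categories from most to fewest votes to make the trend easy to read.
by_votes = sorted(cards.items(), key=lambda item: item[1][0], reverse=True)
for name, (votes, low, high) in by_votes:
    print(f"{name:24s} votes={votes:6d} ci_width={ci_width(low, high)}")

widest = max(cards, key=lambda k: ci_width(cards[k][1], cards[k][2]))
narrowest = min(cards, key=lambda k: ci_width(cards[k][1], cards[k][2]))
```

On these figures the narrowest interval belongs to the category with the most votes and the widest to the category with the fewest, though the ordering in between is not strictly monotonic.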