Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #26
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,458
- Percentile: 92.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: overall. Source rank: #36. Votes: 26928. Organization: deepseek. License: MIT.
92.3% percentile inside its fair comparison set · Raw benchmark value: 1,458 · CI 1,453 - 1,463
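Each card carries a provenance sentence of the form "Parsed from Arena leaderboard dataset row ... Category: ... Source rank: ... Votes: ...". As a minimal sketch, assuming that sentence layout stays stable across cards, it can be split back into named fields as follows. The function name `parse_provenance` and the lowercased key names are illustrative choices, not part of the source page.

```python
import re


def parse_provenance(line: str) -> dict:
    """Split a 'Parsed from ...' provenance sentence into named fields.

    Hypothetical helper: assumes the 'Key: value.' layout seen on the cards.
    """
    # The dataset row name is the only backtick-quoted token in the sentence.
    row = re.search(r"dataset row `([^`]+)`", line)
    fields = {"row": row.group(1) if row else None}

    # The remaining fields follow the pattern 'Key: value.'
    for key, value in re.findall(r"([A-Z][A-Za-z ]+): ([^.]+)\.", line):
        fields[key.strip().lower().replace(" ", "_")] = value.strip()

    # Convert the numeric fields; '#36' becomes 36.
    if "source_rank" in fields:
        fields["source_rank"] = int(fields["source_rank"].lstrip("#"))
    if "votes" in fields:
        fields["votes"] = int(fields["votes"].replace(",", ""))
    return fields


example = (
    "Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. "
    "Category: overall. Source rank: #36. Votes: 26928. "
    "Organization: deepseek. License: MIT."
)
parsed = parse_provenance(example)
```

For the sentence above, `parsed["category"]` is `"overall"`, `parsed["source_rank"]` is `36`, and `parsed["votes"]` is `26928`. The card's own rank (#26) differs from the source rank (#36) and is not recoverable from this sentence.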
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #25
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,442
- Percentile: 92.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: creative_writing. Source rank: #34. Votes: 4315. Organization: deepseek. License: MIT.
92.6% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,432 - 1,452
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #28
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,466
- Percentile: 91.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: english. Source rank: #38. Votes: 12380. Organization: deepseek. License: MIT.
91.7% percentile inside its fair comparison set · Raw benchmark value: 1,466 · CI 1,460 - 1,473
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #27
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,460
- Percentile: 92%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: exclude_ties. Source rank: #37. Votes: 20597. Organization: deepseek. License: MIT.
92% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI 1,454 - 1,467
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #29
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 91.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: hard_prompts. Source rank: #38. Votes: 17840. Organization: deepseek. License: MIT.
91.4% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,472 - 1,484
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #29
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,484
- Percentile: 91.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: hard_prompts_english. Source rank: #37. Votes: 8686. Organization: deepseek. License: MIT.
91.4% percentile inside its fair comparison set · Raw benchmark value: 1,484 · CI 1,476 - 1,491
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #27
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,450
- Percentile: 92%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: instruction_following. Source rank: #35. Votes: 9073. Organization: deepseek. License: MIT.
92% percentile inside its fair comparison set · Raw benchmark value: 1,450 · CI 1,442 - 1,457
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #25
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,472
- Percentile: 92.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: longer_query. Source rank: #33. Votes: 11456. Organization: deepseek. License: MIT.
92.1% percentile inside its fair comparison set · Raw benchmark value: 1,472 · CI 1,465 - 1,479
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #38
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,457
- Percentile: 88.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: multi_turn. Source rank: #50. Votes: 4729. Organization: deepseek. License: MIT.
88.5% percentile inside its fair comparison set · Raw benchmark value: 1,457 · CI 1,448 - 1,466
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #26
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 92.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: overall. Source rank: #32. Votes: 26928. Organization: deepseek. License: MIT.
92.3% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,442 - 1,452
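Since every card reports a confidence interval next to its raw value, one rough way to compare a category with its No Style Control counterpart is to check whether the two intervals overlap. The sketch below uses CI bounds copied from the cards on this page. The helper names are illustrative, and the page does not state the confidence level or whether interval overlap is the comparison Arena itself uses, so treat a non-overlap only as a hint that the gap exceeds the reported uncertainty.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (lower bound, upper bound) as printed on a card


def half_width(ci: Interval) -> float:
    """Half the span of the interval, i.e. the '+/-' around its midpoint."""
    return (ci[1] - ci[0]) / 2


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CI bounds taken from the cards on this page.
overall = (1453, 1463)                # Text Arena
overall_no_style = (1442, 1452)       # Text Arena · No Style Control
multi_turn = (1448, 1466)             # Text Arena · Multi Turn
multi_turn_no_style = (1436, 1455)    # Text Arena · Multi Turn · No Style Control

overall_overlap = overlaps(overall, overall_no_style)
multi_turn_overlap = overlaps(multi_turn, multi_turn_no_style)
```

With these numbers the overall intervals do not overlap (1,453 is above 1,452), while the multi-turn intervals do. The multi-turn intervals are also wider, which fits that category's smaller vote count (4729 against 26928).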
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #18
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,442
- Percentile: 94.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: creative_writing. Source rank: #23. Votes: 4315. Organization: deepseek. License: MIT.
94.7% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,432 - 1,452
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #29
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,454
- Percentile: 91.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: english. Source rank: #35. Votes: 12380. Organization: deepseek. License: MIT.
91.4% percentile inside its fair comparison set · Raw benchmark value: 1,454 · CI 1,447 - 1,460
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #27
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,443
- Percentile: 92%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: exclude_ties. Source rank: #34. Votes: 20597. Organization: deepseek. License: MIT.
92% percentile inside its fair comparison set · Raw benchmark value: 1,443 · CI 1,437 - 1,450
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #25
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,456
- Percentile: 92.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: hard_prompts. Source rank: #32. Votes: 17840. Organization: deepseek. License: MIT.
92.6% percentile inside its fair comparison set · Raw benchmark value: 1,456 · CI 1,450 - 1,462
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #24
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,460
- Percentile: 92.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: hard_prompts_english. Source rank: #31. Votes: 8686. Organization: deepseek. License: MIT.
92.9% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI 1,453 - 1,467
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #28
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,436
- Percentile: 91.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: instruction_following. Source rank: #35. Votes: 9073. Organization: deepseek. License: MIT.
91.7% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI 1,429 - 1,443
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #24
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,453
- Percentile: 92.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: longer_query. Source rank: #31. Votes: 11456. Organization: deepseek. License: MIT.
92.4% percentile inside its fair comparison set · Raw benchmark value: 1,453 · CI 1,446 - 1,459
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #37
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,446
- Percentile: 88.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-pro-thinking`. Category: multi_turn. Source rank: #47. Votes: 4729. Organization: deepseek. License: MIT.
88.9% percentile inside its fair comparison set · Raw benchmark value: 1,446 · CI 1,436 - 1,455