Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #49
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,436
- Percentile: 85.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: overall. Source rank: #63. Votes: 28215. Organization: deepseek. License: MIT.
85.2% percentile inside its fair comparison set · Raw benchmark value 1,436 · CI 1,432 - 1,441
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #47
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,409
- Percentile: 85.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: creative_writing. Source rank: #61. Votes: 4492. Organization: deepseek. License: MIT.
85.8% percentile inside its fair comparison set · Raw benchmark value 1,409 · CI 1,399 - 1,418
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #52
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,445
- Percentile: 84.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: english. Source rank: #64. Votes: 13001. Organization: deepseek. License: MIT.
84.3% percentile inside its fair comparison set · Raw benchmark value 1,445 · CI 1,439 - 1,452
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #49
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,433
- Percentile: 85.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: exclude_ties. Source rank: #63. Votes: 21534. Organization: deepseek. License: MIT.
85.2% percentile inside its fair comparison set · Raw benchmark value 1,433 · CI 1,427 - 1,439
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #53
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,456
- Percentile: 84%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: hard_prompts. Source rank: #67. Votes: 18593. Organization: deepseek. License: MIT.
84% percentile inside its fair comparison set · Raw benchmark value 1,456 · CI 1,451 - 1,462
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,459
- Percentile: 83.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: hard_prompts_english. Source rank: #68. Votes: 9017. Organization: deepseek. License: MIT.
83.3% percentile inside its fair comparison set · Raw benchmark value 1,459 · CI 1,452 - 1,466
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #45
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,431
- Percentile: 86.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: instruction_following. Source rank: #56. Votes: 9578. Organization: deepseek. License: MIT.
86.5% percentile inside its fair comparison set · Raw benchmark value 1,431 · CI 1,424 - 1,439
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #46
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,448
- Percentile: 85.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: longer_query. Source rank: #57. Votes: 12107. Organization: deepseek. License: MIT.
85.2% percentile inside its fair comparison set · Raw benchmark value 1,448 · CI 1,441 - 1,454
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #49
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,445
- Percentile: 85.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: multi_turn. Source rank: #63. Votes: 5015. Organization: deepseek. License: MIT.
85.1% percentile inside its fair comparison set · Raw benchmark value 1,445 · CI 1,436 - 1,455
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,422
- Percentile: 82.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: overall. Source rank: #72. Votes: 28215. Organization: deepseek. License: MIT.
82.2% percentile inside its fair comparison set · Raw benchmark value 1,422 · CI 1,417 - 1,427
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #57
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,398
- Percentile: 82.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: creative_writing. Source rank: #70. Votes: 4492. Organization: deepseek. License: MIT.
82.7% percentile inside its fair comparison set · Raw benchmark value 1,398 · CI 1,388 - 1,407
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #69
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,430
- Percentile: 79.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: english. Source rank: #81. Votes: 13001. Organization: deepseek. License: MIT.
79.1% percentile inside its fair comparison set · Raw benchmark value 1,430 · CI 1,424 - 1,436
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #61
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,411
- Percentile: 81.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: exclude_ties. Source rank: #74. Votes: 21534. Organization: deepseek. License: MIT.
81.5% percentile inside its fair comparison set · Raw benchmark value 1,411 · CI 1,404 - 1,417
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #60
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,430
- Percentile: 81.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: hard_prompts. Source rank: #74. Votes: 18593. Organization: deepseek. License: MIT.
81.8% percentile inside its fair comparison set · Raw benchmark value 1,430 · CI 1,424 - 1,436
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #69
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,433
- Percentile: 79%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: hard_prompts_english. Source rank: #84. Votes: 9017. Organization: deepseek. License: MIT.
79% percentile inside its fair comparison set · Raw benchmark value 1,433 · CI 1,426 - 1,440
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #45
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,414
- Percentile: 86.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: instruction_following. Source rank: #57. Votes: 9578. Organization: deepseek. License: MIT.
86.5% percentile inside its fair comparison set · Raw benchmark value 1,414 · CI 1,407 - 1,421
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #53
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,423
- Percentile: 82.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: longer_query. Source rank: #66. Votes: 12107. Organization: deepseek. License: MIT.
82.9% percentile inside its fair comparison set · Raw benchmark value 1,423 · CI 1,417 - 1,430
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #52
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,430
- Percentile: 84.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. Category: multi_turn. Source rank: #66. Votes: 5015. Organization: deepseek. License: MIT.
84.2% percentile inside its fair comparison set · Raw benchmark value 1,430 · CI 1,421 - 1,439
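Each card above carries a provenance line of the form "Parsed from Arena leaderboard dataset row ...". As a minimal sketch, assuming that line format stays exactly as it appears on this page, the fields could be pulled out with a regular expression. The names `PROVENANCE` and `parse_provenance` are hypothetical and not part of any Arena tooling.

```python
import re

# Hypothetical pattern: it mirrors the provenance lines shown on this page only.
PROVENANCE = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>[^.]+)\."
)


def parse_provenance(line: str) -> dict:
    """Split one provenance line into named fields; numeric fields become ints."""
    match = PROVENANCE.search(line)
    if match is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    fields = match.groupdict()
    fields["source_rank"] = int(fields["source_rank"])
    fields["votes"] = int(fields["votes"])
    return fields


# Example input copied from the first card on this page.
example = (
    "Parsed from Arena leaderboard dataset row `deepseek-v4-flash-thinking`. "
    "Category: overall. Source rank: #63. Votes: 28215. "
    "Organization: deepseek. License: MIT."
)
parsed = parse_provenance(example)
print(parsed["row"], parsed["category"], parsed["source_rank"], parsed["votes"])
```

A line that does not match raises `ValueError` rather than returning partial data, so a change in the page's wording would surface immediately instead of producing silently wrong fields.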