Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #103
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,396
- Percentile: 68.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: overall. Source rank: #125. Votes: 7944. Organization: alibaba. License: Apache 2.0.
68.6% percentile inside its fair comparison set · Raw benchmark value: 1,396 · CI 1,389 - 1,403
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #124
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,339
- Percentile: 61.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: creative_writing. Source rank: #149. Votes: 1033. Organization: alibaba. License: Apache 2.0.
61.9% percentile inside its fair comparison set · Raw benchmark value: 1,339 · CI 1,321 - 1,357
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #109
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,404
- Percentile: 66.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: english. Source rank: #132. Votes: 3887. Organization: alibaba. License: Apache 2.0.
66.8% percentile inside its fair comparison set · Raw benchmark value: 1,404 · CI 1,394 - 1,413
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #106
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,373
- Percentile: 67.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: exclude_ties. Source rank: #128. Votes: 5589. Organization: alibaba. License: Apache 2.0.
67.7% percentile inside its fair comparison set · Raw benchmark value: 1,373 · CI 1,364 - 1,383
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #98
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,419
- Percentile: 70.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: hard_prompts. Source rank: #119. Votes: 4040. Organization: alibaba. License: Apache 2.0.
70.2% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,410 - 1,428
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #109
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,420
- Percentile: 66.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: hard_prompts_english. Source rank: #132. Votes: 2097. Organization: alibaba. License: Apache 2.0.
66.7% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI 1,407 - 1,432
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #105
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,384
- Percentile: 68%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: instruction_following. Source rank: #129. Votes: 2132. Organization: alibaba. License: Apache 2.0.
68% percentile inside its fair comparison set · Raw benchmark value: 1,384 · CI 1,372 - 1,396
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #97
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,407
- Percentile: 68.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: longer_query. Source rank: #121. Votes: 1732. Organization: alibaba. License: Apache 2.0.
68.4% percentile inside its fair comparison set · Raw benchmark value: 1,407 · CI 1,394 - 1,421
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #116
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,387
- Percentile: 64.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: multi_turn. Source rank: #140. Votes: 1288. Organization: alibaba. License: Apache 2.0.
64.4% percentile inside its fair comparison set · Raw benchmark value: 1,387 · CI 1,371 - 1,404
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #94
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,401
- Percentile: 71.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: overall. Source rank: #113. Votes: 7944. Organization: alibaba. License: Apache 2.0.
71.4% percentile inside its fair comparison set · Raw benchmark value: 1,401 · CI 1,394 - 1,408
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #109
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,344
- Percentile: 66.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: creative_writing. Source rank: #133. Votes: 1033. Organization: alibaba. License: Apache 2.0.
66.6% percentile inside its fair comparison set · Raw benchmark value: 1,344 · CI 1,326 - 1,362
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #86
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,418
- Percentile: 73.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: english. Source rank: #102. Votes: 3887. Organization: alibaba. License: Apache 2.0.
73.8% percentile inside its fair comparison set · Raw benchmark value: 1,418 · CI 1,408 - 1,427
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #94
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,380
- Percentile: 71.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: exclude_ties. Source rank: #113. Votes: 5589. Organization: alibaba. License: Apache 2.0.
71.4% percentile inside its fair comparison set · Raw benchmark value: 1,380 · CI 1,370 - 1,389
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #91
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,408
- Percentile: 72.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: hard_prompts. Source rank: #109. Votes: 4040. Organization: alibaba. License: Apache 2.0.
72.3% percentile inside its fair comparison set · Raw benchmark value: 1,408 · CI 1,399 - 1,417
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #90
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,415
- Percentile: 72.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: hard_prompts_english. Source rank: #108. Votes: 2097. Organization: alibaba. License: Apache 2.0.
72.5% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,403 - 1,428
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #99
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,375
- Percentile: 69.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: instruction_following. Source rank: #121. Votes: 2132. Organization: alibaba. License: Apache 2.0.
69.8% percentile inside its fair comparison set · Raw benchmark value: 1,375 · CI 1,362 - 1,387
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #94
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,394
- Percentile: 69.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: longer_query. Source rank: #116. Votes: 1732. Organization: alibaba. License: Apache 2.0.
69.4% percentile inside its fair comparison set · Raw benchmark value: 1,394 · CI 1,380 - 1,407
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #101
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,389
- Percentile: 69%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. Category: multi_turn. Source rank: #122. Votes: 1288. Organization: alibaba. License: Apache 2.0.
69% percentile inside its fair comparison set · Raw benchmark value: 1,389 · CI 1,373 - 1,406
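
Every card above carries a provenance line with the same shape (`Parsed from Arena leaderboard dataset row ...`, followed by Category, Source rank, Votes, Organization, and License). The following is a minimal sketch, assuming that shape holds for every card, of turning one such line into a structured record. The function name `parse_provenance` and the dictionary keys are hypothetical and chosen for illustration; they are not part of any Arena tooling.

```python
import re

# Hypothetical pattern for the provenance line shown on each card.
# The group names are illustrative, not an official schema.
PROVENANCE_RE = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>.+)\.$"
)


def parse_provenance(line: str) -> dict:
    """Split one provenance line into named fields; raise if it does not match."""
    match = PROVENANCE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    record = match.groupdict()
    # Rank and vote counts are numeric in the source, so convert them.
    record["source_rank"] = int(record["source_rank"])
    record["votes"] = int(record["votes"])
    return record


example = (
    "Parsed from Arena leaderboard dataset row `qwen3-vl-235b-a22b-thinking`. "
    "Category: overall. Source rank: #125. Votes: 7944. "
    "Organization: alibaba. License: Apache 2.0."
)
record = parse_provenance(example)
print(record["category"], record["source_rank"], record["votes"])
```

Raising on a non-matching line, rather than returning a partial record, makes it obvious when a card deviates from the assumed pattern.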