Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #92
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,403
- Percentile: 72%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: overall. Source rank: #114. Votes: 38208. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 72% · Raw benchmark value: 1,403 · CI 1,399 - 1,408
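The provenance line above packs several fields into one sentence. The following is a minimal sketch of how it could be split into a record, assuming every card follows exactly this sentence layout. `parse_provenance` and the field names are hypothetical and not part of any Arena tooling.

```python
import re

# Provenance line copied from the card above.
LINE = (
    "Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. "
    "Category: overall. Source rank: #114. Votes: 38208. "
    "Organization: alibaba. License: Apache 2.0."
)

# One named group per field; the layout is an assumption based on this page only.
PATTERN = re.compile(
    r"dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>.+)\.$"
)


def parse_provenance(line: str) -> dict:
    """Return the provenance fields as a dict, with numeric fields as ints."""
    match = PATTERN.search(line.strip())
    if match is None:
        raise ValueError("line does not follow the assumed provenance layout")
    record = match.groupdict()
    record["source_rank"] = int(record["source_rank"])
    record["votes"] = int(record["votes"])
    return record


record = parse_provenance(LINE)
print(record["row"], record["category"], record["source_rank"], record["votes"])
```

Raising on a non-matching line, rather than returning partial data, makes a changed card layout visible immediately.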
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #95
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,368
- Percentile: 70.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: creative_writing. Source rank: #118. Votes: 4621. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 70.9% · Raw benchmark value: 1,368 · CI 1,359 - 1,377
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #105
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,408
- Percentile: 68%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: english. Source rank: #127. Votes: 18615. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 68% · Raw benchmark value: 1,408 · CI 1,402 - 1,413
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #93
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,386
- Percentile: 71.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: exclude_ties. Source rank: #115. Votes: 27206. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 71.7% · Raw benchmark value: 1,386 · CI 1,380 - 1,392
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #94
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,421
- Percentile: 71.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: hard_prompts. Source rank: #114. Votes: 16574. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 71.4% · Raw benchmark value: 1,421 · CI 1,416 - 1,427
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #108
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,421
- Percentile: 67%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: hard_prompts_english. Source rank: #131. Votes: 8643. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 67% · Raw benchmark value: 1,421 · CI 1,414 - 1,428
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #106
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,381
- Percentile: 67.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: instruction_following. Source rank: #130. Votes: 9276. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 67.7% · Raw benchmark value: 1,381 · CI 1,374 - 1,388
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #90
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,411
- Percentile: 70.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: longer_query. Source rank: #111. Votes: 7142. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 70.7% · Raw benchmark value: 1,411 · CI 1,404 - 1,419
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #86
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,411
- Percentile: 73.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: multi_turn. Source rank: #105. Votes: 6753. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 73.7% · Raw benchmark value: 1,411 · CI 1,403 - 1,418
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #100
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,394
- Percentile: 69.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: overall. Source rank: #121. Votes: 38208. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 69.5% · Raw benchmark value: 1,394 · CI 1,389 - 1,398
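One way to read this card against the plain Text Arena card at the top is to compare the raw values and check whether the reported intervals overlap. The sketch below is only an illustration using the numbers shown on the two cards. The `Score` type and `intervals_overlap` helper are hypothetical names, and non-overlapping intervals are a rough signal rather than a formal significance test.

```python
from typing import NamedTuple


class Score(NamedTuple):
    """Raw benchmark value with the bounds of its reported CI."""
    value: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: Score, b: Score) -> bool:
    """True when the two reported intervals share at least one point."""
    return max(a.ci_low, b.ci_low) <= min(a.ci_high, b.ci_high)


# Values copied from the Text Arena and Text Arena · No Style Control cards.
overall = Score(1403, 1399, 1408)
overall_no_style = Score(1394, 1389, 1398)

gap = overall.value - overall_no_style.value
print(gap, intervals_overlap(overall, overall_no_style))  # prints "9 False"
```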
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #95
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,358
- Percentile: 70.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: creative_writing. Source rank: #117. Votes: 4621. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 70.9% · Raw benchmark value: 1,358 · CI 1,349 - 1,367
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #102
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,396
- Percentile: 68.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: english. Source rank: #123. Votes: 18615. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 68.9% · Raw benchmark value: 1,396 · CI 1,391 - 1,402
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #100
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,371
- Percentile: 69.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: exclude_ties. Source rank: #120. Votes: 27206. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 69.5% · Raw benchmark value: 1,371 · CI 1,365 - 1,377
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #104
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,394
- Percentile: 68.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: hard_prompts. Source rank: #126. Votes: 16574. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 68.3% · Raw benchmark value: 1,394 · CI 1,389 - 1,400
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #105
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,392
- Percentile: 67.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: hard_prompts_english. Source rank: #127. Votes: 8643. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 67.9% · Raw benchmark value: 1,392 · CI 1,385 - 1,399
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #108
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,361
- Percentile: 67.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: instruction_following. Source rank: #131. Votes: 9276. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 67.1% · Raw benchmark value: 1,361 · CI 1,355 - 1,368
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #100
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,390
- Percentile: 67.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: longer_query. Source rank: #122. Votes: 7142. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 67.4% · Raw benchmark value: 1,390 · CI 1,383 - 1,398
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #92
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,399
- Percentile: 71.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-235b-a22b-no-thinking`. Category: multi_turn. Source rank: #112. Votes: 6753. Organization: alibaba. License: Apache 2.0.
Percentile inside its fair comparison set: 71.8% · Raw benchmark value: 1,399 · CI 1,391 - 1,406