Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #116
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,383
- Percentile: 64.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: overall. Source rank: #141. Votes: 23728. Organization: alibaba. License: Apache 2.0.
64.6% percentile inside its fair comparison set · Raw benchmark value: 1,383 · CI 1,378 - 1,388
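The provenance line on each card ("Parsed from Arena leaderboard dataset row ...") follows a fixed label order, so it can be split into fields mechanically. The sketch below is one possible way to do that; the function name `parse_provenance` and the dictionary keys are hypothetical and simply mirror the labels printed on the card, not any official Arena schema.

```python
import re

# Hypothetical pattern: mirrors the label order printed on the cards above.
PROVENANCE = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>.+)\.$"
)


def parse_provenance(line: str) -> dict:
    """Split one provenance line into named fields (assumed layout)."""
    match = PROVENANCE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    fields = match.groupdict()
    # Numeric labels are converted so they can be sorted or compared.
    fields["source_rank"] = int(fields["source_rank"])
    fields["votes"] = int(fields["votes"])
    return fields


example = (
    "Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. "
    "Category: overall. Source rank: #141. Votes: 23728. "
    "Organization: alibaba. License: Apache 2.0."
)
record = parse_provenance(example)
print(record)
```

The source rank (#141 here) is kept separate from the card's own rank (#116), since the two are reported as different numbers on the page.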
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #139
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,321
- Percentile: 57.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: creative_writing. Source rank: #170. Votes: 2984. Organization: alibaba. License: Apache 2.0.
57.3% percentile inside its fair comparison set · Raw benchmark value: 1,321 · CI 1,310 - 1,332
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #119
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,393
- Percentile: 63.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: english. Source rank: #144. Votes: 11050. Organization: alibaba. License: Apache 2.0.
63.7% percentile inside its fair comparison set · Raw benchmark value: 1,393 · CI 1,387 - 1,399
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #115
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,359
- Percentile: 64.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: exclude_ties. Source rank: #140. Votes: 16690. Organization: alibaba. License: Apache 2.0.
64.9% percentile inside its fair comparison set · Raw benchmark value: 1,359 · CI 1,353 - 1,366
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #112
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,407
- Percentile: 65.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: hard_prompts. Source rank: #136. Votes: 11068. Organization: alibaba. License: Apache 2.0.
65.8% percentile inside its fair comparison set · Raw benchmark value: 1,407 · CI 1,401 - 1,414
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #114
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,416
- Percentile: 65.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: hard_prompts_english. Source rank: #138. Votes: 5623. Organization: alibaba. License: Apache 2.0.
65.1% percentile inside its fair comparison set · Raw benchmark value: 1,416 · CI 1,408 - 1,424
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #120
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,367
- Percentile: 63.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: instruction_following. Source rank: #146. Votes: 5963. Organization: alibaba. License: Apache 2.0.
63.4% percentile inside its fair comparison set · Raw benchmark value: 1,367 · CI 1,359 - 1,375
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #119
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,382
- Percentile: 61.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: longer_query. Source rank: #144. Votes: 4880. Organization: alibaba. License: Apache 2.0.
61.2% percentile inside its fair comparison set · Raw benchmark value: 1,382 · CI 1,373 - 1,390
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #118
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,383
- Percentile: 63.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: multi_turn. Source rank: #142. Votes: 4072. Organization: alibaba. License: Apache 2.0.
63.8% percentile inside its fair comparison set · Raw benchmark value: 1,383 · CI 1,374 - 1,392
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #103
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,384
- Percentile: 68.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: overall. Source rank: #125. Votes: 23728. Organization: alibaba. License: Apache 2.0.
68.6% percentile inside its fair comparison set · Raw benchmark value: 1,384 · CI 1,379 - 1,389
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #129
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,319
- Percentile: 60.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: creative_writing. Source rank: #156. Votes: 2984. Organization: alibaba. License: Apache 2.0.
60.4% percentile inside its fair comparison set · Raw benchmark value: 1,319 · CI 1,308 - 1,330
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #107
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,391
- Percentile: 67.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: english. Source rank: #128. Votes: 11050. Organization: alibaba. License: Apache 2.0.
67.4% percentile inside its fair comparison set · Raw benchmark value: 1,391 · CI 1,385 - 1,398
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #103
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,358
- Percentile: 68.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: exclude_ties. Source rank: #125. Votes: 16690. Organization: alibaba. License: Apache 2.0.
68.6% percentile inside its fair comparison set · Raw benchmark value: 1,358 · CI 1,351 - 1,365
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #102
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,398
- Percentile: 68.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: hard_prompts. Source rank: #124. Votes: 11068. Organization: alibaba. License: Apache 2.0.
68.9% percentile inside its fair comparison set · Raw benchmark value: 1,398 · CI 1,392 - 1,404
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #99
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,406
- Percentile: 69.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: hard_prompts_english. Source rank: #119. Votes: 5623. Organization: alibaba. License: Apache 2.0.
69.8% percentile inside its fair comparison set · Raw benchmark value: 1,406 · CI 1,398 - 1,414
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #107
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,362
- Percentile: 67.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: instruction_following. Source rank: #130. Votes: 5963. Organization: alibaba. License: Apache 2.0.
67.4% percentile inside its fair comparison set · Raw benchmark value: 1,362 · CI 1,354 - 1,370
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #105
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,378
- Percentile: 65.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: longer_query. Source rank: #129. Votes: 4880. Organization: alibaba. License: Apache 2.0.
65.8% percentile inside its fair comparison set · Raw benchmark value: 1,378 · CI 1,369 - 1,387
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #108
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,381
- Percentile: 66.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b-instruct-2507`. Category: multi_turn. Source rank: #130. Votes: 4072. Organization: alibaba. License: Apache 2.0.
66.9% percentile inside its fair comparison set · Raw benchmark value: 1,381 · CI 1,372 - 1,391
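Each card reports a raw value together with a CI range, which makes it possible to check roughly whether two categories differ by more than their stated uncertainty. The sketch below uses values copied from the cards above; the `Score` type and `intervals_overlap` helper are hypothetical names, and interval overlap is only a coarse heuristic, not a formal significance test (the page does not state the confidence level behind the CI).

```python
from typing import NamedTuple


class Score(NamedTuple):
    """Raw benchmark value with the CI bounds printed on a card."""
    value: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: Score, b: Score) -> bool:
    """True when the two CI ranges share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the cards above.
overall = Score(1383, 1378, 1388)            # Text Arena
overall_no_style = Score(1384, 1379, 1389)   # Text Arena · No Style Control
creative_writing = Score(1321, 1310, 1332)   # Text Arena · Creative Writing
hard_prompts = Score(1407, 1401, 1414)       # Text Arena · Hard Prompts

same_band = intervals_overlap(overall, overall_no_style)
distinct_band = intervals_overlap(creative_writing, hard_prompts)
print(same_band, distinct_band)
```

Under this rough check, the overall score with and without style control falls in overlapping ranges, while the Creative Writing and Hard Prompts ranges do not overlap.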