Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #170
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,327
- Percentile: 48%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: overall. Source rank: #203. Votes: 26,486. Organization: alibaba. License: Apache 2.0.
48% percentile inside its fair comparison set · Raw benchmark value: 1,327 · CI 1,323 - 1,332
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #181
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,284
- Percentile: 44.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: creative_writing. Source rank: #215. Votes: 3,364. Organization: alibaba. License: Apache 2.0.
44.3% percentile inside its fair comparison set · Raw benchmark value: 1,284 · CI 1,274 - 1,295
Text Arena · English
AR · Chat / text · Human
Rank #169
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,346
- Percentile: 48.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: english. Source rank: #201. Votes: 13,391. Organization: alibaba. License: Apache 2.0.
48.3% percentile inside its fair comparison set · Raw benchmark value: 1,346 · CI 1,340 - 1,352
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #170
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,276
- Percentile: 48%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: exclude_ties. Source rank: #203. Votes: 18,351. Organization: alibaba. License: Apache 2.0.
48% percentile inside its fair comparison set · Raw benchmark value: 1,276 · CI 1,269 - 1,283
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #162
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,345
- Percentile: 50.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: hard_prompts. Source rank: #193. Votes: 10,694. Organization: alibaba. License: Apache 2.0.
50.5% percentile inside its fair comparison set · Raw benchmark value: 1,345 · CI 1,339 - 1,352
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #159
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,363
- Percentile: 51.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: hard_prompts_english. Source rank: #190. Votes: 5,811. Organization: alibaba. License: Apache 2.0.
51.2% percentile inside its fair comparison set · Raw benchmark value: 1,363 · CI 1,354 - 1,371
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #174
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,312
- Percentile: 46.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: instruction_following. Source rank: #207. Votes: 6,129. Organization: alibaba. License: Apache 2.0.
46.8% percentile inside its fair comparison set · Raw benchmark value: 1,312 · CI 1,304 - 1,320
Text Arena · Longer Query
AR · Chat / text · Human
Rank #158
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,340
- Percentile: 48.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: longer_query. Source rank: #189. Votes: 4,458. Organization: alibaba. License: Apache 2.0.
48.4% percentile inside its fair comparison set · Raw benchmark value: 1,340 · CI 1,331 - 1,349
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #172
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,321
- Percentile: 47.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: multi_turn. Source rank: #205. Votes: 4,522. Organization: alibaba. License: Apache 2.0.
47.1% percentile inside its fair comparison set · Raw benchmark value: 1,321 · CI 1,311 - 1,330
Text Arena · No Style Control
AR · Chat / text · Human
Rank #162
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,317
- Percentile: 50.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: overall. Source rank: #191. Votes: 26,486. Organization: alibaba. License: Apache 2.0.
50.5% percentile inside its fair comparison set · Raw benchmark value: 1,317 · CI 1,312 - 1,322
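Several categories appear twice on this page: once as a default card and once under No Style Control, each with its own confidence interval. One quick heuristic for judging whether such a pair differs is to check whether the two intervals overlap. The Python sketch below is an illustrative assumption, not the leaderboard's own method, and `Interval` and `intervals_overlap` are hypothetical names. It uses only the CI bounds listed on these cards; the page does not state the confidence level behind them.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed confidence interval [low, high] for an Arena score (hypothetical helper)."""
    low: int
    high: int


def intervals_overlap(a: Interval, b: Interval) -> bool:
    # Two closed intervals overlap iff each one starts no later than the other ends.
    return a.low <= b.high and b.low <= a.high


# CI bounds copied from the qwen3-30b-a3b cards on this page.
overall_default = Interval(1323, 1332)     # Text Arena
overall_no_style = Interval(1312, 1322)    # Text Arena · No Style Control
creative_default = Interval(1274, 1295)    # Text Arena · Creative Writing
creative_no_style = Interval(1262, 1283)   # Creative Writing · No Style Control

print(intervals_overlap(overall_default, overall_no_style))    # False
print(intervals_overlap(creative_default, creative_no_style))  # True
```

With these numbers the overall pair does not overlap (1,322 < 1,323), while the Creative Writing pair does. Non-overlap suggests a real gap; overlap alone does not rule one out, so this check is conservative rather than a formal significance test.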
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #169
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,272
- Percentile: 48%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: creative_writing. Source rank: #201. Votes: 3,364. Organization: alibaba. License: Apache 2.0.
48% percentile inside its fair comparison set · Raw benchmark value: 1,272 · CI 1,262 - 1,283
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #163
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,335
- Percentile: 50.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: english. Source rank: #190. Votes: 13,391. Organization: alibaba. License: Apache 2.0.
50.2% percentile inside its fair comparison set · Raw benchmark value: 1,335 · CI 1,329 - 1,341
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #163
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,261
- Percentile: 50.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: exclude_ties. Source rank: #192. Votes: 18,351. Organization: alibaba. License: Apache 2.0.
50.2% percentile inside its fair comparison set · Raw benchmark value: 1,261 · CI 1,254 - 1,267
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #159
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,315
- Percentile: 51.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: hard_prompts. Source rank: #188. Votes: 10,694. Organization: alibaba. License: Apache 2.0.
51.4% percentile inside its fair comparison set · Raw benchmark value: 1,315 · CI 1,309 - 1,322
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #160
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,331
- Percentile: 50.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: hard_prompts_english. Source rank: #189. Votes: 5,811. Organization: alibaba. License: Apache 2.0.
50.9% percentile inside its fair comparison set · Raw benchmark value: 1,331 · CI 1,323 - 1,339
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #166
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,286
- Percentile: 49.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: instruction_following. Source rank: #196. Votes: 6,129. Organization: alibaba. License: Apache 2.0.
49.2% percentile inside its fair comparison set · Raw benchmark value: 1,286 · CI 1,278 - 1,293
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #158
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,313
- Percentile: 48.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: longer_query. Source rank: #187. Votes: 4,458. Organization: alibaba. License: Apache 2.0.
48.4% percentile inside its fair comparison set · Raw benchmark value: 1,313 · CI 1,304 - 1,322
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #163
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,308
- Percentile: 49.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-30b-a3b`. Category: multi_turn. Source rank: #193. Votes: 4,522. Organization: alibaba. License: Apache 2.0.
49.8% percentile inside its fair comparison set · Raw benchmark value: 1,308 · CI 1,299 - 1,317