Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #23 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,465
- Percentile: 93.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: overall. Source rank: #31. Votes: 21564. Organization: alibaba. License: Proprietary.
93.2% percentile inside its fair comparison set · Raw benchmark value: 1,465 · CI 1,460 - 1,470
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #17 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,449
- Percentile: 95%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: creative_writing. Source rank: #21. Votes: 3181. Organization: alibaba. License: Proprietary.
95% percentile inside its fair comparison set · Raw benchmark value: 1,449 · CI 1,438 - 1,460
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #22 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,472
- Percentile: 93.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: english. Source rank: #30. Votes: 10186. Organization: alibaba. License: Proprietary.
93.5% percentile inside its fair comparison set · Raw benchmark value: 1,472 · CI 1,466 - 1,479
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #22 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,472
- Percentile: 93.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: exclude_ties. Source rank: #30. Votes: 15791. Organization: alibaba. License: Proprietary.
93.5% percentile inside its fair comparison set · Raw benchmark value: 1,472 · CI 1,465 - 1,479
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #19 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,491
- Percentile: 94.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: hard_prompts. Source rank: #24. Votes: 13727. Organization: alibaba. License: Proprietary.
94.5% percentile inside its fair comparison set · Raw benchmark value: 1,491 · CI 1,485 - 1,497
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #19 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,492
- Percentile: 94.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: hard_prompts_english. Source rank: #22. Votes: 6755. Organization: alibaba. License: Proprietary.
94.4% percentile inside its fair comparison set · Raw benchmark value: 1,492 · CI 1,485 - 1,500
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #13 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,470
- Percentile: 96.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: instruction_following. Source rank: #17. Votes: 6905. Organization: alibaba. License: Proprietary.
96.3% percentile inside its fair comparison set · Raw benchmark value: 1,470 · CI 1,463 - 1,478
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #17 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,482
- Percentile: 94.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: longer_query. Source rank: #22. Votes: 8311. Organization: alibaba. License: Proprietary.
94.7% percentile inside its fair comparison set · Raw benchmark value: 1,482 · CI 1,474 - 1,489
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #20 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 94.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: multi_turn. Source rank: #27. Votes: 3636. Organization: alibaba. License: Proprietary.
94.1% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,468 - 1,488
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #8 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,470
- Percentile: 97.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: overall. Source rank: #11. Votes: 21564. Organization: alibaba. License: Proprietary.
97.8% percentile inside its fair comparison set · Raw benchmark value: 1,470 · CI 1,465 - 1,475
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #7 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,467
- Percentile: 98.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: creative_writing. Source rank: #9. Votes: 3181. Organization: alibaba. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,467 · CI 1,456 - 1,478
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #11 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,474
- Percentile: 96.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: english. Source rank: #13. Votes: 10186. Organization: alibaba. License: Proprietary.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,474 · CI 1,468 - 1,481
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #7 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 98.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: exclude_ties. Source rank: #10. Votes: 15791. Organization: alibaba. License: Proprietary.
98.2% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,471 - 1,484
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #10 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,483
- Percentile: 97.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: hard_prompts. Source rank: #12. Votes: 13727. Organization: alibaba. License: Proprietary.
97.2% percentile inside its fair comparison set · Raw benchmark value: 1,483 · CI 1,477 - 1,489
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #14 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,480
- Percentile: 96%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: hard_prompts_english. Source rank: #17. Votes: 6755. Organization: alibaba. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,480 · CI 1,472 - 1,488
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #10 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,469
- Percentile: 97.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: instruction_following. Source rank: #14. Votes: 6905. Organization: alibaba. License: Proprietary.
97.2% percentile inside its fair comparison set · Raw benchmark value: 1,469 · CI 1,461 - 1,477
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #13 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,476
- Percentile: 96.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: longer_query. Source rank: #17. Votes: 8311. Organization: alibaba. License: Proprietary.
96.1% percentile inside its fair comparison set · Raw benchmark value: 1,476 · CI 1,469 - 1,484
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #9 · Source label: qwen3.5-max-preview
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 97.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. Category: multi_turn. Source rank: #12. Votes: 3636. Organization: alibaba. License: Proprietary.
97.5% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,467 - 1,488
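Every card carries a provenance line of the same shape ("Parsed from Arena leaderboard dataset row ..."). As a minimal sketch, assuming that shape holds for all rows, the line could be turned back into structured fields as below. The names `ROW_PATTERN` and `parse_provenance` are hypothetical helpers introduced for illustration; they are not part of any Arena tooling.

```python
import re

# Hypothetical pattern matching the provenance line as it appears on these cards.
ROW_PATTERN = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>[^.]+)\."
)


def parse_provenance(line: str) -> dict:
    """Split one provenance line into named fields; rank and votes become ints."""
    match = ROW_PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a provenance line")
    fields = match.groupdict()
    fields["source_rank"] = int(fields["source_rank"])
    fields["votes"] = int(fields["votes"])
    return fields


if __name__ == "__main__":
    example = (
        "Parsed from Arena leaderboard dataset row `qwen3.5-max-preview`. "
        "Category: overall. Source rank: #31. Votes: 21564. "
        "Organization: alibaba. License: Proprietary."
    )
    print(parse_provenance(example))
```

Parsing the rows this way makes it easier to group cards by category, for example to put a category's style-controlled score next to its No Style Control counterpart.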