Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #246 · Source label: Qwen3 32B (Non-reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 9
- Percentile: 38%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
38% percentile inside its fair comparison set · Raw benchmark value: 9
AA-Omniscience accuracy
AA · Chat / text · Objective
Rank #152 · Source label: Qwen3 32B (Reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 17.3%
- Percentile: 49.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
49.7% percentile inside its fair comparison set · Raw benchmark value: 17.3%
AA-Omniscience non-hallucination
AA · Chat / text · Objective
Rank #127 · Source label: Qwen3 32B (Reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 18%
- Percentile: 57.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
57.7% percentile inside its fair comparison set · Raw benchmark value: 18%
IFBench
AA · Chat / text · Objective
Rank #250 · Source label: Qwen3 32B (Non-reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 31.5%
- Percentile: 21.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `ifbench`.
21.3% percentile inside its fair comparison set · Raw benchmark value: 31.5%
Blended price
AA · Chat / text · Speed / cost
Rank #90 · Source label: Qwen3 32B (Reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: $0.3 /1M tokens
- Percentile: 67.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
67.8% percentile inside its fair comparison set · Raw benchmark value: $0.3 /1M tokens
Input price
AA · Chat / text · Speed / cost
Rank #93 · Source label: Qwen3 32B (Reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: $0.2 /1M input tokens
- Percentile: 66.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
66.7% percentile inside its fair comparison set · Raw benchmark value: $0.2 /1M input tokens
Output price
AA · Chat / text · Speed / cost
Rank #90 · Source label: Qwen3 32B (Non-reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: $0.6 /1M output tokens
- Percentile: 67.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
67.8% percentile inside its fair comparison set · Raw benchmark value: $0.6 /1M output tokens
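The three price rows above appear to be mutually consistent if the blended figure is a 3:1 input-to-output weighted average, which is what the field name `price1mBlended0To3To1` seems to suggest. That weighting is an assumption read off the field name, not something the leaderboard rows state, and the function name `blended_price` below is hypothetical. Note also that the Input price and Output price rows carry different source labels (Reasoning and Non-reasoning), so this is only a rough cross-check:

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The default 3:1 input:output weighting is an assumption inferred from
    the field name `price1mBlended0To3To1`; it is not confirmed by the source.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Raw values taken from the Input price and Output price rows.
blended = blended_price(0.2, 0.6)
print(f"${blended:.2f} /1M tokens")  # prints $0.30 /1M tokens
```

Under that assumption, (3 × $0.2 + 1 × $0.6) / 4 = $0.3, which matches the Blended price row.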
Output Speed
AA · Chat / text · Speed / cost
Rank #107 · Source label: Qwen3 32B (Non-reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 93.6 tokens/s
- Percentile: 49.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.
49.5% percentile inside its fair comparison set · Raw benchmark value: 93.6 tokens/s
Time to first token
AA · Chat / text · Speed / cost
Rank #128 · Source label: Qwen3 32B (Non-reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 2.61s
- Percentile: 39.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.
39.5% percentile inside its fair comparison set · Raw benchmark value: 2.61s
Time to first answer token
AA · Chat / text · Speed / cost
Rank #137 · Source label: Qwen3 32B (Reasoning)
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Artificial Analysis
- Raw value: 23.07s
- Percentile: 35.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.
35.2% percentile inside its fair comparison set · Raw benchmark value: 23.07s
Text Arena
AR · Chat / text · Human
Rank #149
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,347
- Percentile: 54.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: overall. Source rank: #178. Votes: 3926. Organization: alibaba. License: Apache 2.0.
54.5% percentile inside its fair comparison set · Raw benchmark value: 1,347 · CI 1,338 - 1,357
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #154
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,305
- Percentile: 52.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: creative_writing. Source rank: #185. Votes: 616. Organization: alibaba. License: Apache 2.0.
52.6% percentile inside its fair comparison set · Raw benchmark value: 1,305 · CI 1,282 - 1,327
Text Arena · English
AR · Chat / text · Human
Rank #150
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,366
- Percentile: 54.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: english. Source rank: #178. Votes: 2230. Organization: alibaba. License: Apache 2.0.
54.2% percentile inside its fair comparison set · Raw benchmark value: 1,366 · CI 1,353 - 1,378
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #148
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,307
- Percentile: 54.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: exclude_ties. Source rank: #176. Votes: 2544. Organization: alibaba. License: Apache 2.0.
54.8% percentile inside its fair comparison set · Raw benchmark value: 1,307 · CI 1,293 - 1,322
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #143
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,368
- Percentile: 56.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: hard_prompts. Source rank: #172. Votes: 1207. Organization: alibaba. License: Apache 2.0.
56.3% percentile inside its fair comparison set · Raw benchmark value: 1,368 · CI 1,351 - 1,384
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #144
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,382
- Percentile: 55.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: hard_prompts_english. Source rank: #172. Votes: 729. Organization: alibaba. License: Apache 2.0.
55.9% percentile inside its fair comparison set · Raw benchmark value: 1,382 · CI 1,361 - 1,403
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #148
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,332
- Percentile: 54.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: instruction_following. Source rank: #178. Votes: 858. Organization: alibaba. License: Apache 2.0.
54.8% percentile inside its fair comparison set · Raw benchmark value: 1,332 · CI 1,313 - 1,351
Text Arena · Longer Query
AR · Chat / text · Human
Rank #142
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,355
- Percentile: 53.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: longer_query. Source rank: #171. Votes: 504. Organization: alibaba. License: Apache 2.0.
53.6% percentile inside its fair comparison set · Raw benchmark value: 1,355 · CI 1,330 - 1,380
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #151
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,338
- Percentile: 53.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: multi_turn. Source rank: #182. Votes: 590. Organization: alibaba. License: Apache 2.0.
53.6% percentile inside its fair comparison set · Raw benchmark value: 1,338 · CI 1,315 - 1,361
Text Arena · No Style Control
AR · Chat / text · Human
Rank #142
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,340
- Percentile: 56.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: overall. Source rank: #168. Votes: 3926. Organization: alibaba. License: Apache 2.0.
56.6% percentile inside its fair comparison set · Raw benchmark value: 1,340 · CI 1,331 - 1,349
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #147
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,298
- Percentile: 54.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: creative_writing. Source rank: #175. Votes: 616. Organization: alibaba. License: Apache 2.0.
54.8% percentile inside its fair comparison set · Raw benchmark value: 1,298 · CI 1,276 - 1,320
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #143
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,359
- Percentile: 56.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: english. Source rank: #169. Votes: 2230. Organization: alibaba. License: Apache 2.0.
56.3% percentile inside its fair comparison set · Raw benchmark value: 1,359 · CI 1,346 - 1,371
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #142
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,297
- Percentile: 56.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: exclude_ties. Source rank: #168. Votes: 2544. Organization: alibaba. License: Apache 2.0.
56.6% percentile inside its fair comparison set · Raw benchmark value: 1,297 · CI 1,282 - 1,311
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #148
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,335
- Percentile: 54.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: hard_prompts. Source rank: #175. Votes: 1207. Organization: alibaba. License: Apache 2.0.
54.8% percentile inside its fair comparison set · Raw benchmark value: 1,335 · CI 1,319 - 1,351
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #149
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,350
- Percentile: 54.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: hard_prompts_english. Source rank: #176. Votes: 729. Organization: alibaba. License: Apache 2.0.
54.3% percentile inside its fair comparison set · Raw benchmark value: 1,350 · CI 1,329 - 1,370
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #152
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,305
- Percentile: 53.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: instruction_following. Source rank: #181. Votes: 858. Organization: alibaba. License: Apache 2.0.
53.5% percentile inside its fair comparison set · Raw benchmark value: 1,305 · CI 1,287 - 1,324
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #144
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,327
- Percentile: 53%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: longer_query. Source rank: #172. Votes: 504. Organization: alibaba. License: Apache 2.0.
53% percentile inside its fair comparison set · Raw benchmark value: 1,327 · CI 1,303 - 1,351
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #149
verified runtime · exact alias
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,331
- Percentile: 54.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `qwen3-32b`. Category: multi_turn. Source rank: #177. Votes: 590. Organization: alibaba. License: Apache 2.0.
54.2% percentile inside its fair comparison set · Raw benchmark value: 1,331 · CI 1,308 - 1,354