Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #87 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 25
- Percentile: 78.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
AA-Omniscience accuracy
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #70 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 24.2%
- Percentile: 77.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
AA-Omniscience non-hallucination
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #257 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 6.5%
- Percentile: 14.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
IFBench
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #98 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 49%
- Percentile: 69.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `ifbench`.
Blended price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #147 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: $0.8 /1M tokens
- Percentile: 47.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
Input price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #159 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: $0.5 /1M input tokens
- Percentile: 42.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
Output price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #144 · Source label: DeepSeek V3.2 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: $1.6 /1M output tokens
- Percentile: 48.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
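The three price cards above appear to be related by a weighted average. The sketch below assumes that the blended figure weights input and output prices 3:1, an assumption inferred only from the field name `price1mBlended0To3To1`; the function name `blended_price` and its parameters are hypothetical. The displayed prices are already rounded, so the check is only good to one decimal place.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The default 3:1 input:output weighting is an assumption inferred from
    the leaderboard field name, not something the source states outright.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Rounded prices as displayed on the cards (USD per 1M tokens).
input_price = 0.5
output_price = 1.6

blended = blended_price(input_price, output_price)
print(f"blended = ${blended:.3f} /1M tokens")          # (3 * 0.5 + 1.6) / 4
print(f"rounded = ${round(blended, 1)} /1M tokens")    # compare with the card's $0.8
```

Under that assumption the result rounds to the $0.8 shown on the Blended price card. A different weighting would give a different figure, so treat the match as a consistency check rather than confirmation.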
Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #63 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,425 (CI 1,421 - 1,429)
- Percentile: 80.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #80. Votes: 47303. Organization: deepseek. License: MIT.
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,401 (CI 1,393 - 1,409)
- Percentile: 82% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #73. Votes: 6687. Organization: deepseek. License: MIT.
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #65 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,437 (CI 1,432 - 1,442)
- Percentile: 80.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #81. Votes: 21521. Organization: deepseek. License: MIT.
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #64 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,416 (CI 1,411 - 1,421)
- Percentile: 80.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #81. Votes: 33503. Organization: deepseek. License: MIT.
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #60 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,447 (CI 1,443 - 1,452)
- Percentile: 81.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #77. Votes: 26212. Organization: deepseek. License: MIT.
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #61 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,456 (CI 1,449 - 1,462)
- Percentile: 81.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #77. Votes: 12512. Organization: deepseek. License: MIT.
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #54 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,421 (CI 1,415 - 1,427)
- Percentile: 83.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #69. Votes: 12931. Organization: deepseek. License: MIT.
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #57 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,442 (CI 1,436 - 1,448)
- Percentile: 81.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #71. Votes: 12953. Organization: deepseek. License: MIT.
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #64 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,428 (CI 1,421 - 1,435)
- Percentile: 80.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #82. Votes: 8265. Organization: deepseek. License: MIT.
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,424 (CI 1,421 - 1,428)
- Percentile: 83.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #67. Votes: 47303. Organization: deepseek. License: MIT.
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,400 (CI 1,392 - 1,407)
- Percentile: 83.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #68. Votes: 6687. Organization: deepseek. License: MIT.
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #57 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,435 (CI 1,431 - 1,440)
- Percentile: 82.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #69. Votes: 21521. Organization: deepseek. License: MIT.
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #56 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,414 (CI 1,409 - 1,419)
- Percentile: 83.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #68. Votes: 33503. Organization: deepseek. License: MIT.
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #53 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,434 (CI 1,429 - 1,438)
- Percentile: 84% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #67. Votes: 26212. Organization: deepseek. License: MIT.
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #54 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,442 (CI 1,436 - 1,448)
- Percentile: 83.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #64. Votes: 12512. Organization: deepseek. License: MIT.
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #48 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,413 (CI 1,407 - 1,419)
- Percentile: 85.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #61. Votes: 12931. Organization: deepseek. License: MIT.
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #45 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,429 (CI 1,423 - 1,435)
- Percentile: 85.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #57. Votes: 12953. Organization: deepseek. License: MIT.
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #56 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,426 (CI 1,419 - 1,433)
- Percentile: 83% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #70. Votes: 8265. Organization: deepseek. License: MIT.
Instruction following
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #94 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 23.1%
- Percentile: 13.9% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.
Language
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #78 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 64.2%
- Percentile: 28.7% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.
Paraphrase
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #87 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 20.6%
- Percentile: 20.4% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.
Simplify
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #90 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 24%
- Percentile: 17.6% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.
Story generation
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #97 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 20.7%
- Percentile: 11.1% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.
Summarize
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #92 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 27%
- Percentile: 15.7% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.
Connections
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #74 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 81.8%
- Percentile: 32.4% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.
Plot unscrambling
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #76 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 44.9%
- Percentile: 30.6% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.
Typos
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #80 · Source label: deepseek-v3.2
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 66%
- Percentile: 26.2% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.
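The LiveBench category cards (Instruction following, Language) look consistent with an unweighted mean of their task cards. The sketch below checks this using only the values shown above. The equal-weight averaging is an assumption, not something the source states, and the names `if_tasks`, `language_tasks`, and `category_score` are hypothetical.

```python
from statistics import mean

# Task scores (percent) as displayed on the LiveBench task cards.
if_tasks = {
    "paraphrase": 20.6,
    "simplify": 24.0,
    "story_generation": 20.7,
    "summarize": 27.0,
}
language_tasks = {
    "connections": 81.8,
    "plot_unscrambling": 44.9,
    "typos": 66.0,
}


def category_score(task_scores: dict[str, float]) -> float:
    """Unweighted mean of task scores, rounded to one decimal place.

    Equal weighting is an assumption; it is checked here only against
    the rounded values displayed on the cards.
    """
    return round(mean(task_scores.values()), 1)


print("IF:", category_score(if_tasks), "from", len(if_tasks), "tasks")
print("Language:", category_score(language_tasks), "from", len(language_tasks), "tasks")
```

The task counts match the "Tasks scored: 4" and "Tasks scored: 3" notes, and the means can be compared with the 23.1% and 64.2% category values. Because the inputs are already rounded, agreement to one decimal place does not rule out some other aggregation.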