Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #112 · Source label: DeepSeek V3.2 Exp (Non-reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 21
- Percentile: 71.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
71.9% percentile inside its fair comparison set · Raw benchmark value: 21
AA-Omniscience accuracy
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #84 · Source label: DeepSeek V3.2 Exp (Non-reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 22.7%
- Percentile: 72.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
72.1% percentile inside its fair comparison set · Raw benchmark value: 22.7%
AA-Omniscience non-hallucination
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #218 · Source label: DeepSeek V3.2 Exp (Non-reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 9.7%
- Percentile: 27.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
27.2% percentile inside its fair comparison set · Raw benchmark value: 9.7%
IFBench
AA · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #140 · Source label: DeepSeek V3.2 Exp (Non-reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 43.1%
- Percentile: 55.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `ifbench`.
55.9% percentile inside its fair comparison set · Raw benchmark value: 43.1%
Blended price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #95 · Source label: DeepSeek V3.2 Exp (Reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.3 /1M tokens
- Percentile: 65.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
65.9% percentile inside its fair comparison set · Raw benchmark value: $0.3 /1M tokens
Input price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #118 · Source label: DeepSeek V3.2 Exp (Reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.3 /1M input tokens
- Percentile: 57.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
57.6% percentile inside its fair comparison set · Raw benchmark value: $0.3 /1M input tokens
Output price
AA · Chat / text · Speed / cost
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #79 · Source label: DeepSeek V3.2 Exp (Reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.4 /1M output tokens
- Percentile: 71.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
71.7% percentile inside its fair comparison set · Raw benchmark value: $0.4 /1M output tokens
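The blended price can be cross-checked against the input and output prices listed above. The sketch below assumes, based only on the field name `price1mBlended0To3To1`, that the blend weights input and output tokens 3:1. The function name `blended_price` is illustrative, and the displayed prices are rounded to one decimal place, so the result is approximate.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The 3:1 input:output weighting is an assumption read off the
    field name `price1mBlended0To3To1`, not a documented formula.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Displayed (rounded) prices from the Input price and Output price cards.
blended = blended_price(input_price=0.3, output_price=0.4)
print(f"${blended:.3f} per 1M tokens")
```

Under that assumption the blend comes to roughly $0.325 per 1M tokens, which agrees with the displayed $0.3 once rounded to one decimal place.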
Openness Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #71 · Source label: DeepSeek V3.2 Exp (Reasoning)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 44
- Percentile: 71%
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.
71% percentile inside its fair comparison set · Raw benchmark value: 44
Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #67 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,423
- Percentile: 79.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: overall. Source rank: #85. Votes: 11922. Organization: deepseek. License: MIT.
79.7% percentile inside its fair comparison set · Raw benchmark value: 1,423 · CI 1,417 - 1,429
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #45 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,410
- Percentile: 86.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: creative_writing. Source rank: #58. Votes: 1628. Organization: deepseek. License: MIT.
86.4% percentile inside its fair comparison set · Raw benchmark value: 1,410 · CI 1,396 - 1,425
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #63 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,438
- Percentile: 80.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: english. Source rank: #79. Votes: 5676. Organization: deepseek. License: MIT.
80.9% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,429 - 1,446
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #63 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,416
- Percentile: 80.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: exclude_ties. Source rank: #80. Votes: 8383. Organization: deepseek. License: MIT.
80.9% percentile inside its fair comparison set · Raw benchmark value: 1,416 · CI 1,407 - 1,425
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,448
- Percentile: 82.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: hard_prompts. Source rank: #76. Votes: 6464. Organization: deepseek. License: MIT.
82.2% percentile inside its fair comparison set · Raw benchmark value: 1,448 · CI 1,440 - 1,456
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #65 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,454
- Percentile: 80.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: hard_prompts_english. Source rank: #81. Votes: 3206. Organization: deepseek. License: MIT.
80.2% percentile inside its fair comparison set · Raw benchmark value: 1,454 · CI 1,443 - 1,464
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #62 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,417
- Percentile: 81.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: instruction_following. Source rank: #78. Votes: 3310. Organization: deepseek. License: MIT.
81.2% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI 1,406 - 1,427
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,442
- Percentile: 82.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: longer_query. Source rank: #69. Votes: 2933. Organization: deepseek. License: MIT.
82.2% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,431 - 1,453
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,431
- Percentile: 82%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: multi_turn. Source rank: #76. Votes: 1984. Organization: deepseek. License: MIT.
82% percentile inside its fair comparison set · Raw benchmark value: 1,431 · CI 1,418 - 1,444
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #57 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,423
- Percentile: 82.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: overall. Source rank: #69. Votes: 11922. Organization: deepseek. License: MIT.
82.8% percentile inside its fair comparison set · Raw benchmark value: 1,423 · CI 1,416 - 1,429
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #46 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,404
- Percentile: 86.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: creative_writing. Source rank: #59. Votes: 1628. Organization: deepseek. License: MIT.
86.1% percentile inside its fair comparison set · Raw benchmark value: 1,404 · CI 1,390 - 1,419
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #52 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,440
- Percentile: 84.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: english. Source rank: #64. Votes: 5676. Organization: deepseek. License: MIT.
84.3% percentile inside its fair comparison set · Raw benchmark value: 1,440 · CI 1,431 - 1,448
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,415
- Percentile: 83.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: exclude_ties. Source rank: #67. Votes: 8383. Organization: deepseek. License: MIT.
83.4% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,406 - 1,423
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,430
- Percentile: 82.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: hard_prompts. Source rank: #73. Votes: 6464. Organization: deepseek. License: MIT.
82.2% percentile inside its fair comparison set · Raw benchmark value: 1,430 · CI 1,422 - 1,438
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #60 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,438
- Percentile: 81.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: hard_prompts_english. Source rank: #71. Votes: 3206. Organization: deepseek. License: MIT.
81.8% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,428 - 1,449
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #67 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,403
- Percentile: 79.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: instruction_following. Source rank: #81. Votes: 3310. Organization: deepseek. License: MIT.
79.7% percentile inside its fair comparison set · Raw benchmark value: 1,403 · CI 1,392 - 1,413
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #63 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,419
- Percentile: 79.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: longer_query. Source rank: #76. Votes: 2933. Organization: deepseek. License: MIT.
79.6% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,408 - 1,429
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,424
- Percentile: 82%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `deepseek-v3.2-exp`. Category: multi_turn. Source rank: #73. Votes: 1984. Organization: deepseek. License: MIT.
82% percentile inside its fair comparison set · Raw benchmark value: 1,424 · CI 1,411 - 1,437
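When comparing two Arena rows, the CI ranges give a rough sense of whether a score gap is meaningful. The sketch below checks whether two intervals overlap, using values copied from the cards above. The `ArenaRow` type and `intervals_overlap` helper are illustrative names, the source does not state the confidence level behind the CI, and interval overlap is only a coarse heuristic, not a formal significance test.

```python
from typing import NamedTuple


class ArenaRow(NamedTuple):
    label: str
    score: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: ArenaRow, b: ArenaRow) -> bool:
    """Return True when the two closed CI ranges share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the Text Arena cards above.
overall = ArenaRow("Text Arena", 1423, 1417, 1429)
overall_nsc = ArenaRow("Text Arena · No Style Control", 1423, 1416, 1429)
hard = ArenaRow("Hard Prompts", 1448, 1440, 1456)
hard_nsc = ArenaRow("Hard Prompts · No Style Control", 1430, 1422, 1438)

for a, b in [(overall, overall_nsc), (hard, hard_nsc)]:
    verdict = "overlap" if intervals_overlap(a, b) else "do not overlap"
    print(f"{a.label} vs {b.label}: gap {a.score - b.score}, intervals {verdict}")
```

With these rows, the overall scores are identical and their intervals overlap, while the Hard Prompts pair differs by 18 points and the intervals do not overlap.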
Instruction following
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #100 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 19.3%
- Percentile: 8.3%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.
8.3% percentile inside its fair comparison set · Raw benchmark value: 19.3%
Language
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #75 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 65.6%
- Percentile: 31.5%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.
31.5% percentile inside its fair comparison set · Raw benchmark value: 65.6%
Paraphrase
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #101 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 16.1%
- Percentile: 7.4%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.
7.4% percentile inside its fair comparison set · Raw benchmark value: 16.1%
Simplify
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #96 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 20.9%
- Percentile: 12%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.
12% percentile inside its fair comparison set · Raw benchmark value: 20.9%
Story generation
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #103 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 16.7%
- Percentile: 5.6%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.
5.6% percentile inside its fair comparison set · Raw benchmark value: 16.7%
Summarize
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #99 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 23.7%
- Percentile: 9.3%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.
9.3% percentile inside its fair comparison set · Raw benchmark value: 23.7%
Connections
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #77 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 79.3%
- Percentile: 29.6%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.
29.6% percentile inside its fair comparison set · Raw benchmark value: 79.3%
Plot unscrambling
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #74 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 45.5%
- Percentile: 32.4%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.
32.4% percentile inside its fair comparison set · Raw benchmark value: 45.5%
Typos
LB · Chat / text · Objective
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #67 · Source label: deepseek-v3.2-exp
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 72%
- Percentile: 41.1%
- Last updated: archived
- Eligibility: preview_model
Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.
41.1% percentile inside its fair comparison set · Raw benchmark value: 72%
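The LiveBench percentiles above are consistent with a simple rank-based formula, percentile = 100 × (1 − (rank − 1) / N), where N is the size of the comparison set. This is an inference from the displayed numbers, not a documented definition: N is not stated anywhere in the source, and the set sizes 108 and 112 used below are assumptions chosen because they reproduce the displayed percentiles. The function name `implied_percentile` is illustrative.

```python
def implied_percentile(rank: int, set_size: int) -> float:
    """Share of the comparison set ranked at or below this row, in percent.

    Assumed formula: 100 * (1 - (rank - 1) / set_size), rounded to one decimal.
    """
    return round(100 * (1 - (rank - 1) / set_size), 1)


# (card title, rank, assumed set size N, percentile shown on the card)
rows = [
    ("Instruction following", 100, 108, 8.3),
    ("Language", 75, 108, 31.5),
    ("Paraphrase", 101, 108, 7.4),
    ("Simplify", 96, 108, 12.0),
    ("Story generation", 103, 108, 5.6),
    ("Summarize", 99, 108, 9.3),
    ("Connections", 77, 108, 29.6),
    ("Plot unscrambling", 74, 108, 32.4),
    ("Typos", 67, 112, 41.1),  # only matches with a larger assumed N
]

for title, rank, set_size, shown in rows:
    computed = implied_percentile(rank, set_size)
    print(f"{title}: rank #{rank}, computed {computed}%, shown {shown}%")
```

If this reading is right, a low percentile such as 5.6% simply reflects a rank near the bottom of a set of about 108 rows, and the Typos card appears to be ranked within a slightly different set.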