Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #185 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 13
- Percentile: 53.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
53.4% percentile inside its fair comparison set · Raw benchmark value: 13
AA-Omniscience accuracy
AA · Chat / text · Objective
Rank #28 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 30.7%
- Percentile: 90.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
90.9% percentile inside its fair comparison set · Raw benchmark value: 30.7%
AA-Omniscience non-hallucination
AA · Chat / text · Objective
Rank #209 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 10.5%
- Percentile: 30.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
30.2% percentile inside its fair comparison set · Raw benchmark value: 10.5%
IFBench
AA · Chat / text · Objective
Rank #181 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 39%
- Percentile: 43.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `ifbench`.
43.2% percentile inside its fair comparison set · Raw benchmark value: 39%
Blended price
AA · Chat / text · Speed / cost
Rank #213 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $2.4 /1M tokens
- Percentile: 23.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
23.2% percentile inside its fair comparison set · Raw benchmark value: $2.4 /1M tokens
Input price
AA · Chat / text · Speed / cost
Rank #225 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $1.7 /1M input tokens
- Percentile: 18.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
18.8% percentile inside its fair comparison set · Raw benchmark value: $1.7 /1M input tokens
Output price
AA · Chat / text · Speed / cost
Rank #207 · Source label: DeepSeek R1 (Jan '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $4.7 /1M output tokens
- Percentile: 25.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
25.4% percentile inside its fair comparison set · Raw benchmark value: $4.7 /1M output tokens
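The field name `price1mBlended0To3To1` on the Blended price card suggests a blend weighted 3:1 between input and output tokens. A minimal sketch of that arithmetic, assuming the 3:1 input-to-output weighting and using the rounded prices shown on the Input price and Output price cards (the function name `blended_price` is hypothetical, not part of any leaderboard API):

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input:output blend)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight

# Rounded values copied from the Input price and Output price cards.
estimate = blended_price(1.7, 4.7)
print(f"${estimate:.2f} /1M tokens")
```

With these rounded inputs the estimate comes out near $2.45, close to the $2.4 shown on the Blended price card; the small gap is presumably due to rounding of the displayed values.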
Openness Index
AA · Chat / text · Combined
Rank #44 · Source label: DeepSeek R1 0528 (May '25)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 50
- Percentile: 83.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.
83.9% percentile inside its fair comparison set · Raw benchmark value: 50
Text Arena
AR · Chat / text · Human
Rank #99 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,398
- Percentile: 69.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #121. Votes: 18524. Organization: deepseek. License: MIT.
69.8% percentile inside its fair comparison set · Raw benchmark value: 1,398 · CI 1,393 - 1,403
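The Arena provenance line packs several fields into one sentence. A minimal sketch of pulling them into a dictionary, assuming the `Key: value.` layout seen on these cards holds for every row (the helper name `parse_arena_provenance` is hypothetical):

```python
import re

def parse_arena_provenance(line: str) -> dict:
    """Extract the row name and `Key: value.` fields from a provenance line."""
    fields = {}
    row = re.search(r"dataset row `([^`]+)`", line)
    if row:
        fields["row"] = row.group(1)
    # Each field looks like "Key: value." and the value contains no period.
    for key, value in re.findall(r"([A-Z][A-Za-z ]*): ([^.]+)\.", line):
        fields[key.strip().lower().replace(" ", "_")] = value.strip()
    # Convert the numeric fields where the layout makes that unambiguous.
    if "source_rank" in fields:
        fields["source_rank"] = int(fields["source_rank"].lstrip("#"))
    if "votes" in fields:
        fields["votes"] = int(fields["votes"])
    return fields

line = ("Parsed from Arena leaderboard dataset row `deepseek-r1`. "
        "Category: overall. Source rank: #121. Votes: 18524. "
        "Organization: deepseek. License: MIT.")
info = parse_arena_provenance(line)
print(info)
```

Keeping the parsed values as a plain dictionary makes it easy to compare the same row across categories, for example by `votes` or `source_rank`.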
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #89 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,374
- Percentile: 72.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #112. Votes: 3289. Organization: deepseek. License: MIT.
72.8% percentile inside its fair comparison set · Raw benchmark value: 1,374 · CI 1,364 - 1,384
Text Arena · English
AR · Chat / text · Human
Rank #97 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,413
- Percentile: 70.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #118. Votes: 10721. Organization: deepseek. License: MIT.
70.5% percentile inside its fair comparison set · Raw benchmark value: 1,413 · CI 1,407 - 1,419
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #96 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,383
- Percentile: 70.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #118. Votes: 12504. Organization: deepseek. License: MIT.
70.8% percentile inside its fair comparison set · Raw benchmark value: 1,383 · CI 1,376 - 1,391
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #99 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,419
- Percentile: 69.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #120. Votes: 4116. Organization: deepseek. License: MIT.
69.8% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,410 - 1,428
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #91 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,434
- Percentile: 72.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #111. Votes: 2656. Organization: deepseek. License: MIT.
72.2% percentile inside its fair comparison set · Raw benchmark value: 1,434 · CI 1,422 - 1,445
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #91 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,397
- Percentile: 72.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #112. Votes: 6426. Organization: deepseek. License: MIT.
72.3% percentile inside its fair comparison set · Raw benchmark value: 1,397 · CI 1,390 - 1,405
Text Arena · Longer Query
AR · Chat / text · Human
Rank #106 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,399
- Percentile: 65.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #131. Votes: 2303. Organization: deepseek. License: MIT.
65.5% percentile inside its fair comparison set · Raw benchmark value: 1,399 · CI 1,387 - 1,411
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #88 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,410
- Percentile: 73.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #107. Votes: 2418. Organization: deepseek. License: MIT.
73.1% percentile inside its fair comparison set · Raw benchmark value: 1,410 · CI 1,398 - 1,422
Text Arena · No Style Control
AR · Chat / text · Human
Rank #114 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,373
- Percentile: 65.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #137. Votes: 18524. Organization: deepseek. License: MIT.
65.2% percentile inside its fair comparison set · Raw benchmark value: 1,373 · CI 1,368 - 1,378
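Each Arena card reports a score together with a confidence interval (CI). One hedged way to read two cards against each other is to check whether their intervals overlap. A minimal sketch, using the overall Text Arena score and its No Style Control counterpart from the cards above (the `Score` type and `intervals_overlap` helper are hypothetical, and non-overlap is only a rough heuristic, not a formal significance test):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Score:
    value: int    # raw benchmark value from the card
    ci_low: int   # lower CI bound from the card
    ci_high: int  # upper CI bound from the card

def intervals_overlap(a: Score, b: Score) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high

overall = Score(1398, 1393, 1403)                   # Text Arena
overall_no_style_control = Score(1373, 1368, 1378)  # Text Arena · No Style Control

gap = overall.value - overall_no_style_control.value
overlap = intervals_overlap(overall, overall_no_style_control)
print(gap, overlap)
```

Here the two intervals do not overlap, so the 25-point gap between the two overall scores is larger than the uncertainty reported on either card.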
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #97 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,355
- Percentile: 70.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #120. Votes: 3289. Organization: deepseek. License: MIT.
70.3% percentile inside its fair comparison set · Raw benchmark value: 1,355 · CI 1,344 - 1,365
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #111 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,385
- Percentile: 66.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #133. Votes: 10721. Organization: deepseek. License: MIT.
66.2% percentile inside its fair comparison set · Raw benchmark value: 1,385 · CI 1,379 - 1,391
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #111 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,346
- Percentile: 66.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #133. Votes: 12504. Organization: deepseek. License: MIT.
66.2% percentile inside its fair comparison set · Raw benchmark value: 1,346 · CI 1,339 - 1,353
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #127 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,361
- Percentile: 61.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #153. Votes: 4116. Organization: deepseek. License: MIT.
61.2% percentile inside its fair comparison set · Raw benchmark value: 1,361 · CI 1,353 - 1,370
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #125 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,376
- Percentile: 61.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #149. Votes: 2656. Organization: deepseek. License: MIT.
61.7% percentile inside its fair comparison set · Raw benchmark value: 1,376 · CI 1,365 - 1,387
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #110 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,358
- Percentile: 66.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #134. Votes: 6426. Organization: deepseek. License: MIT.
66.5% percentile inside its fair comparison set · Raw benchmark value: 1,358 · CI 1,350 - 1,365
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #121 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,355
- Percentile: 60.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #148. Votes: 2303. Organization: deepseek. License: MIT.
60.5% percentile inside its fair comparison set · Raw benchmark value: 1,355 · CI 1,343 - 1,367
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #99 · Source label: deepseek-r1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,390
- Percentile: 69.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #120. Votes: 2418. Organization: deepseek. License: MIT.
69.7% percentile inside its fair comparison set · Raw benchmark value: 1,390 · CI 1,379 - 1,402