Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #23 · Source label: DeepSeek V4 Flash (Reasoning, High Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 37
- Percentile: 94.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
94.4% percentile inside its fair comparison set · Raw benchmark value: 37
AA-Omniscience accuracy
AA · Chat / text · Objective
Rank #25 · Source label: DeepSeek V4 Flash (Reasoning, High Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 35.5%
- Percentile: 91.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
91.9% percentile inside its fair comparison set · Raw benchmark value: 35.5%
AA-Omniscience non-hallucination
AA · Chat / text · Objective
Rank #282 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 4.2%
- Percentile: 5.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
5.7% percentile inside its fair comparison set · Raw benchmark value: 4.2%
IFBench
AA · Chat / text · Objective
Rank #16 · Source label: DeepSeek V4 Flash (Reasoning, High Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 73.5%
- Percentile: 95.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `ifbench`.
95.2% percentile inside its fair comparison set · Raw benchmark value: 73.5%
Blended price
AA · Chat / text · Speed / cost
Rank #65 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.2 /1M tokens
- Percentile: 77.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
77.2% percentile inside its fair comparison set · Raw benchmark value: $0.2 /1M tokens
Input price
AA · Chat / text · Speed / cost
Rank #76 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.1 /1M input tokens
- Percentile: 73.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
73.9% percentile inside its fair comparison set · Raw benchmark value: $0.1 /1M input tokens
Output price
AA · Chat / text · Speed / cost
Rank #55 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: $0.3 /1M output tokens
- Percentile: 80.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
80.4% percentile inside its fair comparison set · Raw benchmark value: $0.3 /1M output tokens
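The field name `price1mBlended0To3To1` on the Blended price card suggests a 3:1 input-to-output token weighting. A minimal sketch under that assumption, using the rounded input and output prices shown on the cards above (the function name and default weights are illustrative and not taken from Artificial Analysis):

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The 3:1 input:output default is an assumption inferred from the
    leaderboard field name, not a confirmed formula.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Rounded card values in USD per 1M tokens: input $0.1, output $0.3.
blended = blended_price(0.1, 0.3)
print(f"${blended:.2f} /1M tokens")
```

Because the card prices are rounded to one decimal place, the result is only expected to land near the displayed $0.2, not to match it exactly.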
Output Speed
AA · Chat / text · Speed / cost
Rank #86 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 110.2 tokens/s
- Percentile: 59.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.
59.5% percentile inside its fair comparison set · Raw benchmark value: 110.2 tokens/s
Time to first token
AA · Chat / text · Speed / cost
Rank #62 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1.45s
- Percentile: 71%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.
71% percentile inside its fair comparison set · Raw benchmark value: 1.45s
Time to first answer token
AA · Chat / text · Speed / cost
Rank #189 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 52.36s
- Percentile: 10.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.
10.5% percentile inside its fair comparison set · Raw benchmark value: 52.36s
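The gap between time to first token (1.45s) and time to first answer token (52.36s) is presumably spent emitting reasoning tokens before the visible answer starts. A rough sketch of how one might size that gap, assuming the median output speed of 110.2 tokens/s also applies during reasoning (an assumption, since the three medians are reported separately and the helper name is hypothetical):

```python
def estimated_reasoning_tokens(ttft_s: float, ttfat_s: float,
                               tokens_per_s: float) -> float:
    """Rough count of tokens generated before the first answer token.

    Assumes generation runs at a constant tokens_per_s between the first
    token and the first answer token; this is a simplification.
    """
    thinking_seconds = ttfat_s - ttft_s
    return thinking_seconds * tokens_per_s


# Card values: TTFT 1.45s, time to first answer token 52.36s, 110.2 tokens/s.
estimate = estimated_reasoning_tokens(1.45, 52.36, 110.2)
print(f"about {estimate:.0f} tokens before the first answer token")
```

Under these assumptions the estimate comes out at roughly 5,600 tokens, which is an order-of-magnitude figure and not a measured value.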
Openness Index
AA · Chat / text · Combined
Rank #34 · Source label: DeepSeek V4 Flash (Reasoning, Max Effort)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 50
- Percentile: 83.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.
83.9% percentile inside its fair comparison set · Raw benchmark value: 50
Text Arena
AR · Chat / text · Human
Rank #52
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,434
- Percentile: 84.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: overall. Source rank: #67. Votes: 28291. Organization: deepseek. License: MIT.
84.3% percentile inside its fair comparison set · Raw benchmark value: 1,434 · CI 1,429 - 1,439
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #46
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,409
- Percentile: 86.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: creative_writing. Source rank: #60. Votes: 4490. Organization: deepseek. License: MIT.
86.1% percentile inside its fair comparison set · Raw benchmark value: 1,409 · CI 1,400 - 1,419
Text Arena · English
AR · Chat / text · Human
Rank #57
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,442
- Percentile: 82.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: english. Source rank: #71. Votes: 13110. Organization: deepseek. License: MIT.
82.8% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,435 - 1,448
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #52
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,428
- Percentile: 84.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: exclude_ties. Source rank: #67. Votes: 21507. Organization: deepseek. License: MIT.
84.3% percentile inside its fair comparison set · Raw benchmark value: 1,428 · CI 1,422 - 1,434
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #51
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,457
- Percentile: 84.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: hard_prompts. Source rank: #65. Votes: 18735. Organization: deepseek. License: MIT.
84.6% percentile inside its fair comparison set · Raw benchmark value: 1,457 · CI 1,452 - 1,463
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #53
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,460
- Percentile: 84%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: hard_prompts_english. Source rank: #66. Votes: 9228. Organization: deepseek. License: MIT.
84% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI 1,453 - 1,467
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #50
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,428
- Percentile: 84.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: instruction_following. Source rank: #62. Votes: 9473. Organization: deepseek. License: MIT.
84.9% percentile inside its fair comparison set · Raw benchmark value: 1,428 · CI 1,421 - 1,435
Text Arena · Longer Query
AR · Chat / text · Human
Rank #49
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,446
- Percentile: 84.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: longer_query. Source rank: #62. Votes: 12189. Organization: deepseek. License: MIT.
84.2% percentile inside its fair comparison set · Raw benchmark value: 1,446 · CI 1,440 - 1,453
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #44
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,452
- Percentile: 86.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: multi_turn. Source rank: #56. Votes: 5118. Organization: deepseek. License: MIT.
86.7% percentile inside its fair comparison set · Raw benchmark value: 1,452 · CI 1,442 - 1,461
Text Arena · No Style Control
AR · Chat / text · Human
Rank #47
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,430
- Percentile: 85.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: overall. Source rank: #59. Votes: 28291. Organization: deepseek. License: MIT.
85.8% percentile inside its fair comparison set · Raw benchmark value: 1,430 · CI 1,425 - 1,434
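The default Text Arena card (1,434, CI 1,429 - 1,439) and this No Style Control card (1,430, CI 1,425 - 1,434) report scores a few points apart. One informal way to read such pairs is to check whether their confidence intervals overlap. A small sketch, assuming the CI bounds are inclusive and on the same scale (the `ArenaScore` type and helper are hypothetical, not part of any Arena tooling):

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    """Score and confidence interval as shown on a card."""
    value: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """True when the two inclusive intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


overall = ArenaScore(1434, 1429, 1439)            # Text Arena
overall_no_style = ArenaScore(1430, 1425, 1434)   # Text Arena · No Style Control
print(intervals_overlap(overall, overall_no_style))
```

Overlap is only a rough heuristic: overlapping intervals do not prove the two scores are equal, and disjoint ones are not a formal significance test.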
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #45
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,405
- Percentile: 86.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: creative_writing. Source rank: #58. Votes: 4490. Organization: deepseek. License: MIT.
86.4% percentile inside its fair comparison set · Raw benchmark value: 1,405 · CI 1,395 - 1,414
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #54
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,437
- Percentile: 83.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: english. Source rank: #66. Votes: 13110. Organization: deepseek. License: MIT.
83.7% percentile inside its fair comparison set · Raw benchmark value: 1,437 · CI 1,431 - 1,443
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #49
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,421
- Percentile: 85.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: exclude_ties. Source rank: #61. Votes: 21507. Organization: deepseek. License: MIT.
85.2% percentile inside its fair comparison set · Raw benchmark value: 1,421 · CI 1,415 - 1,427
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #45
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,441
- Percentile: 86.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: hard_prompts. Source rank: #55. Votes: 18735. Organization: deepseek. License: MIT.
86.5% percentile inside its fair comparison set · Raw benchmark value: 1,441 · CI 1,436 - 1,447
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #50
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,444
- Percentile: 84.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: hard_prompts_english. Source rank: #59. Votes: 9228. Organization: deepseek. License: MIT.
84.9% percentile inside its fair comparison set · Raw benchmark value: 1,444 · CI 1,437 - 1,451
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #42
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,419
- Percentile: 87.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: instruction_following. Source rank: #50. Votes: 9473. Organization: deepseek. License: MIT.
87.4% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,412 - 1,426
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #43
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,432
- Percentile: 86.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: longer_query. Source rank: #51. Votes: 12189. Organization: deepseek. License: MIT.
86.2% percentile inside its fair comparison set · Raw benchmark value: 1,432 · CI 1,426 - 1,439
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #35
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 89.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `deepseek-v4-flash`. Category: multi_turn. Source rank: #45. Votes: 5118. Organization: deepseek. License: MIT.
89.5% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,438 - 1,456
Instruction following
LB · Chat / text · Objective
Rank #25
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 63.1%
- Percentile: 77.8%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.
77.8% percentile inside its fair comparison set · Raw benchmark value: 63.1%
Language
LB · Chat / text · Objective
Rank #65
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 70.1%
- Percentile: 40.7%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.
40.7% percentile inside its fair comparison set · Raw benchmark value: 70.1%
Paraphrase
LB · Chat / text · Objective
Rank #21
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 62.4%
- Percentile: 81.5%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.
81.5% percentile inside its fair comparison set · Raw benchmark value: 62.4%
Simplify
LB · Chat / text · Objective
Rank #43
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 54.1%
- Percentile: 61.1%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.
61.1% percentile inside its fair comparison set · Raw benchmark value: 54.1%
Story generation
LB · Chat / text · Objective
Rank #24
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 67.3%
- Percentile: 78.7%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.
78.7% percentile inside its fair comparison set · Raw benchmark value: 67.3%
Summarize
LB · Chat / text · Objective
Rank #14
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 68.9%
- Percentile: 88%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.
88% percentile inside its fair comparison set · Raw benchmark value: 68.9%
Connections
LB · Chat / text · Objective
Rank #55
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 89.5%
- Percentile: 50%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.
50% percentile inside its fair comparison set · Raw benchmark value: 89.5%
Plot unscrambling
LB · Chat / text · Objective
Rank #69
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 46.9%
- Percentile: 37%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.
37% percentile inside its fair comparison set · Raw benchmark value: 46.9%
Typos
LB · Chat / text · Objective
Rank #59
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 74%
- Percentile: 45.8%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.
45.8% percentile inside its fair comparison set · Raw benchmark value: 74%