Model profile · DeepSeek

DeepSeek Reasoner

Open weights · budget · registry tag 2026 open reasoning

Thin verified coverage

Reads as thin verified coverage across the resolved source data.

Visible coverage: 14.4%
Verified coverage: 14.4%
Spread: 82.1%
Last verified: Jun 20, 2026

51% bench fit

text · code · document · 6 aliases · 29 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
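As a rough illustration of the row shape described above, here is a minimal sketch of one row as a record. The class and field names are hypothetical (the page does not publish a schema), and treating "Combined" as the source type is a reading of the card labels, not a confirmed mapping. The values are taken from the Intelligence Index card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical record for one source-linked score row."""
    benchmark: str      # card title
    source: str         # where the row was parsed from
    source_type: str    # assumed to map to labels such as Combined / Objective / Human
    raw_metric: str     # kept as text because units differ per benchmark
    percentile: float   # percentile inside the row's fair comparison set


# Values copied from the Intelligence Index card on this page.
row = BenchmarkRow(
    benchmark="Intelligence Index",
    source="Artificial Analysis",
    source_type="Combined",
    raw_metric="13",
    percentile=53.4,
)

print(f"{row.benchmark}: raw {row.raw_metric}, {row.percentile}% percentile ({row.source})")
```

Keeping the raw metric as text avoids mixing units such as Elo-style scores, percentages, and prices per 1M tokens in one numeric column.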

Thin verified coverage: this model currently reads as thin verified coverage across the resolved source data.

Chat / text · 26 benchmarks · 61.3%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #185 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 13
Percentile: 53.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

53.4% percentile inside its fair comparison set

Raw benchmark value: 13

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #28 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 30.7%
Percentile: 90.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

90.9% percentile inside its fair comparison set

Raw benchmark value: 30.7%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #209 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 10.5%
Percentile: 30.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

30.2% percentile inside its fair comparison set

Raw benchmark value: 10.5%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #181 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 39%
Percentile: 43.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

43.2% percentile inside its fair comparison set

Raw benchmark value: 39%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #213 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $2.4 /1M tokens
Percentile: 23.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

23.2% percentile inside its fair comparison set

Raw benchmark value: $2.4 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #225 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $1.7 /1M input tokens
Percentile: 18.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

18.8% percentile inside its fair comparison set

Raw benchmark value: $1.7 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #207 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $4.7 /1M output tokens
Percentile: 25.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

25.4% percentile inside its fair comparison set

Raw benchmark value: $4.7 /1M output tokens
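The three price cards above can be related with a short calculation. The page does not state how the blended price is derived; the sketch below assumes a 3:1 input-to-output token weighting, which is only inferred from the source field name `price1mBlended0To3To1`. The function name and weights are illustrative.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Displayed (rounded) prices from the Input price and Output price cards.
blended = blended_price(input_price=1.7, output_price=4.7)
print(f"Blended estimate: ${blended:.2f} per 1M tokens")
```

Under that assumption the estimate is about $2.45 per 1M tokens, close to but not identical with the displayed $2.4. A small gap is plausible because all three displayed prices are rounded to one decimal place.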

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #44 · Source label: DeepSeek R1 0528 (May '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 50
Percentile: 83.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

83.9% percentile inside its fair comparison set

Raw benchmark value: 50

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,398
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #121. Votes: 18524. Organization: deepseek. License: MIT.

69.8% percentile inside its fair comparison set

Raw benchmark value: 1,398 · CI 1,393 - 1,403

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #89 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 72.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #112. Votes: 3289. Organization: deepseek. License: MIT.

72.8% percentile inside its fair comparison set

Raw benchmark value: 1,374 · CI 1,364 - 1,384

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 70.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #118. Votes: 10721. Organization: deepseek. License: MIT.

70.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 · CI 1,407 - 1,419

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #96 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 70.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #118. Votes: 12504. Organization: deepseek. License: MIT.

70.8% percentile inside its fair comparison set

Raw benchmark value: 1,383 · CI 1,376 - 1,391

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #120. Votes: 4116. Organization: deepseek. License: MIT.

69.8% percentile inside its fair comparison set

Raw benchmark value: 1,419 · CI 1,410 - 1,428

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #91 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,434
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #111. Votes: 2656. Organization: deepseek. License: MIT.

72.2% percentile inside its fair comparison set

Raw benchmark value: 1,434 · CI 1,422 - 1,445

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #91 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 72.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #112. Votes: 6426. Organization: deepseek. License: MIT.

72.3% percentile inside its fair comparison set

Raw benchmark value: 1,397 · CI 1,390 - 1,405

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,399
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #131. Votes: 2303. Organization: deepseek. License: MIT.

65.5% percentile inside its fair comparison set

Raw benchmark value: 1,399 · CI 1,387 - 1,411

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #107. Votes: 2418. Organization: deepseek. License: MIT.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,410 · CI 1,398 - 1,422

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #114 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,373
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #137. Votes: 18524. Organization: deepseek. License: MIT.

65.2% percentile inside its fair comparison set

Raw benchmark value: 1,373 · CI 1,368 - 1,378
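The confidence intervals on the Arena cards can be compared directly. The sketch below checks whether the interval on the Text Arena card (1,393 - 1,403) overlaps the one on the Text Arena · No Style Control card (1,368 - 1,378). The helper name is illustrative, and treating non-overlap as a meaningful gap is a rough heuristic, not a formal significance test.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one point."""
    a_low, a_high = a
    b_low, b_high = b
    return a_low <= b_high and b_low <= a_high


# Intervals copied from the two overall Text Arena cards on this page.
text_arena_ci = (1393, 1403)
no_style_control_ci = (1368, 1378)

overlap = intervals_overlap(text_arena_ci, no_style_control_ci)
print("Intervals overlap:", overlap)
```

Here the intervals do not overlap, which suggests the 25-point difference between the two scores is larger than the stated uncertainty on either row.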

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,355
Percentile: 70.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #120. Votes: 3289. Organization: deepseek. License: MIT.

70.3% percentile inside its fair comparison set

Raw benchmark value: 1,355 · CI 1,344 - 1,365

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #133. Votes: 10721. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

Raw benchmark value: 1,385 · CI 1,379 - 1,391

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,346
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #133. Votes: 12504. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

Raw benchmark value: 1,346 · CI 1,339 - 1,353

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #127 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,361
Percentile: 61.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #153. Votes: 4116. Organization: deepseek. License: MIT.

61.2% percentile inside its fair comparison set

Raw benchmark value: 1,361 · CI 1,353 - 1,370

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #125 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 61.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #149. Votes: 2656. Organization: deepseek. License: MIT.

61.7% percentile inside its fair comparison set

Raw benchmark value: 1,376 · CI 1,365 - 1,387

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #110 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,358
Percentile: 66.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #134. Votes: 6426. Organization: deepseek. License: MIT.

66.5% percentile inside its fair comparison set

Raw benchmark value: 1,358 · CI 1,350 - 1,365

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #121 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,355
Percentile: 60.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #148. Votes: 2303. Organization: deepseek. License: MIT.

60.5% percentile inside its fair comparison set

Raw benchmark value: 1,355 · CI 1,343 - 1,367

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 69.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #120. Votes: 2418. Organization: deepseek. License: MIT.

69.7% percentile inside its fair comparison set

Raw benchmark value: 1,390 · CI 1,379 - 1,402

Coding · 5 benchmarks · 54.5%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #178 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.1%
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

41.4% percentile inside its fair comparison set

Raw benchmark value: 6.1%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #126 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 35.7%
Percentile: 66%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

66% percentile inside its fair comparison set

Raw benchmark value: 35.7%

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #59 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 70.2%
Percentile: 35.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Fireworks AI.

35.6% percentile inside its fair comparison set

Raw benchmark value: 70.2% · CI 68% - 72.5%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #98 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,445
Percentile: 69.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: coding. Source rank: #119. Votes: 2317. Organization: deepseek. License: MIT.

69.7% percentile inside its fair comparison set

Raw benchmark value: 1,445 · CI 1,433 - 1,457

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #129 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,372
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: coding. Source rank: #154. Votes: 2317. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

Raw benchmark value: 1,372 · CI 1,360 - 1,384
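The 54.5% figure in the Coding header is consistent with an unweighted mean of the five row percentiles listed above. The page does not document how the category figure is computed, so the sketch below is an observation about the displayed numbers, not a statement of the site's method.

```python
from statistics import mean

# Percentiles copied from the five Coding cards, in page order:
# Terminal-Bench Hard, SciCode, LiveCodeBench,
# Text Arena · Coding, Text Arena · Coding · No Style Control.
coding_percentiles = [41.4, 66.0, 35.6, 69.7, 60.0]

# Assumption: the category figure is a plain average rounded to one decimal.
category_score = round(mean(coding_percentiles), 1)
print(f"Coding category estimate: {category_score}%")
```

The same calculation over the 26 Chat / text percentiles gives roughly 61.3%, which also matches that category's header.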

Reasoning / math / science · 6 benchmarks · 67.6%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 9.3%
Percentile: 70.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

70.8% percentile inside its fair comparison set

Raw benchmark value: 9.3%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #136 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 70.8%
Percentile: 63.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

63.9% percentile inside its fair comparison set

Raw benchmark value: 70.8%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #64 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0.6%
Percentile: 80.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

80.1% percentile inside its fair comparison set

Raw benchmark value: 0.6%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #46 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.2%
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

49.4% percentile inside its fair comparison set

Raw benchmark value: 83.2% · CI 82.3% - 84.1%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #80 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 74.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: math. Source rank: #100. Votes: 1606. Organization: deepseek. License: MIT.

74.8% percentile inside its fair comparison set

Raw benchmark value: 1,411 · CI 1,397 - 1,425

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,392
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: math. Source rank: #127. Votes: 1606. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

Raw benchmark value: 1,392 · CI 1,378 - 1,406

Professional reasoning · 20 benchmarks · 61.3%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #83 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 67.3%
Percentile: 8.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Fireworks AI.

8.9% percentile inside its fair comparison set

Raw benchmark value: 67.3% · CI 66.3% - 68.4%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #44 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 72.3%
Percentile: 52.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Fireworks AI.

52.7% percentile inside its fair comparison set

Raw benchmark value: 72.3% · CI 70.6% - 74%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: expert. Source rank: #135. Votes: 848. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

Raw benchmark value: 1,400 · CI 1,380 - 1,420

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #109 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 66%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_business_and_management_and_financial_operations. Source rank: #132. Votes: 1798. Organization: deepseek. License: MIT.

66% percentile inside its fair comparison set

Raw benchmark value: 1,388 · CI 1,374 - 1,402

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #92 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,372
Percentile: 71.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_entertainment_and_sports_and_media. Source rank: #113. Votes: 3571. Organization: deepseek. License: MIT.

71.8% percentile inside its fair comparison set

Raw benchmark value: 1,372 · CI 1,363 - 1,382

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #109 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,399
Percentile: 63.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_legal_and_government. Source rank: #132. Votes: 1097. Organization: deepseek. License: MIT.

63.8% percentile inside its fair comparison set

Raw benchmark value: 1,399 · CI 1,381 - 1,416

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #102 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 68.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_life_and_physical_and_social_science. Source rank: #124. Votes: 3309. Organization: deepseek. License: MIT.

68.7% percentile inside its fair comparison set

Raw benchmark value: 1,411 · CI 1,401 - 1,421

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #89 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,414
Percentile: 71.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_mathematical. Source rank: #107. Votes: 1504. Organization: deepseek. License: MIT.

71.4% percentile inside its fair comparison set

Raw benchmark value: 1,414 · CI 1,399 - 1,429

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 64.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_medicine_and_healthcare. Source rank: #130. Votes: 830. Organization: deepseek. License: MIT.

64.1% percentile inside its fair comparison set

Raw benchmark value: 1,411 · CI 1,391 - 1,432

Text Arena · Industry Software And It Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #102 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 68.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_software_and_it_services. Source rank: #122. Votes: 3954. Organization: deepseek. License: MIT.

68.9% percentile inside its fair comparison set

Raw benchmark value: 1,431 · CI 1,421 - 1,440

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #90 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,382
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_writing_and_literature_and_language. Source rank: #112. Votes: 5355. Organization: deepseek. License: MIT.

72.5% percentile inside its fair comparison set

Raw benchmark value: 1,382 · CI 1,374 - 1,390

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #134 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 51.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: expert. Source rank: #160. Votes: 848. Organization: deepseek. License: MIT.

51.6% percentile inside its fair comparison set

1,337 · Raw benchmark value · CI 1,318 - 1,357

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #126 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,348
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_business_and_management_and_financial_operations. Source rank: #150. Votes: 1798. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,348 · Raw benchmark value · CI 1,334 - 1,362

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_entertainment_and_sports_and_media. Source rank: #129. Votes: 3571. Organization: deepseek. License: MIT.

67.5% percentile inside its fair comparison set

1,347 · Raw benchmark value · CI 1,337 - 1,357

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #118 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,371
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_legal_and_government. Source rank: #142. Votes: 1097. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,371 · Raw benchmark value · CI 1,353 - 1,388

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #122 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,375
Percentile: 62.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_life_and_physical_and_social_science. Source rank: #144. Votes: 3309. Organization: deepseek. License: MIT.

62.5% percentile inside its fair comparison set

1,375 · Raw benchmark value · CI 1,365 - 1,386

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,389
Percentile: 65.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_mathematical. Source rank: #127. Votes: 1504. Organization: deepseek. License: MIT.

65.6% percentile inside its fair comparison set

1,389 · Raw benchmark value · CI 1,374 - 1,404

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #119 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_medicine_and_healthcare. Source rank: #143. Votes: 830. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

1,370 · Raw benchmark value · CI 1,349 - 1,390

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #129 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 60.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_software_and_it_services. Source rank: #154. Votes: 3954. Organization: deepseek. License: MIT.

60.6% percentile inside its fair comparison set

1,376 · Raw benchmark value · CI 1,367 - 1,385

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #104 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 68.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_writing_and_literature_and_language. Source rank: #127. Votes: 5355. Organization: deepseek. License: MIT.

68.2% percentile inside its fair comparison set

1,364 · Raw benchmark value · CI 1,356 - 1,373

Search / tool use · 1 benchmark · 10.4%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #278 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 11.4%
Percentile: 10.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

10.4% percentile inside its fair comparison set

11.4% · Raw benchmark value

Long context · 2 benchmarks · 49.9%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #84 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 52.3%
Percentile: 73.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

73.7% percentile inside its fair comparison set

52.3% · Raw benchmark value

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #66 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 54.1%
Percentile: 26.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Fireworks AI.

26.1% percentile inside its fair comparison set

54.1% · Raw benchmark value · CI 52.2% - 56%

Multilingual · 14 benchmarks · 60%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #98 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 67.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: chinese. Source rank: #120. Votes: 1151. Organization: deepseek. License: MIT.

67.1% percentile inside its fair comparison set

1,429 · Raw benchmark value · CI 1,411 - 1,447

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #112 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 48.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: french. Source rank: #135. Votes: 188. Organization: deepseek. License: MIT.

48.6% percentile inside its fair comparison set

1,391 · Raw benchmark value · CI 1,349 - 1,433

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #75 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,398
Percentile: 68.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: german. Source rank: #94. Votes: 436. Organization: deepseek. License: MIT.

68.8% percentile inside its fair comparison set

1,398 · Raw benchmark value · CI 1,369 - 1,426

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #80 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,338
Percentile: 61.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: japanese. Source rank: #102. Votes: 468. Organization: deepseek. License: MIT.

61.1% percentile inside its fair comparison set

1,338 · Raw benchmark value · CI 1,312 - 1,364

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #72 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,357
Percentile: 65.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: korean. Source rank: #91. Votes: 215. Organization: deepseek. License: MIT.

65.9% percentile inside its fair comparison set

1,357 · Raw benchmark value · CI 1,317 - 1,397

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,381
Percentile: 63.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: russian. Source rank: #129. Votes: 1904. Organization: deepseek. License: MIT.

63.7% percentile inside its fair comparison set

1,381 · Raw benchmark value · CI 1,368 - 1,394

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #85 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: spanish. Source rank: #106. Votes: 114. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,395 · Raw benchmark value · CI 1,343 - 1,446
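
The confidence intervals in these rows widen sharply where vote counts are low: the Spanish row rests on 114 votes, the Russian row on 1,904. A minimal sketch of reading a score line and measuring the interval half-width follows. The names `ScoreLine`, `parse_score_line`, and `ci_half_width` are hypothetical, and the pattern is written only against the score-line shapes visible on this page (with or without `·` separators); price lines such as `$2.4 /1M tokens` are deliberately rejected.

```python
import re
from typing import NamedTuple, Optional


class ScoreLine(NamedTuple):
    value: float
    ci_low: Optional[float]
    ci_high: Optional[float]


_NUM = r"([\d,]+(?:\.\d+)?)"
_PATTERN = re.compile(
    _NUM + r"%?\s*(?:·\s*)?Raw benchmark value\s*(?:·\s*)?"
    r"(?:CI\s*" + _NUM + r"%?\s*-\s*" + _NUM + r"%?)?"
)


def _to_float(text: str) -> float:
    # Drop thousands separators such as the comma in "1,395".
    return float(text.replace(",", ""))


def parse_score_line(line: str) -> ScoreLine:
    """Parse a 'value / Raw benchmark value / optional CI' line."""
    match = _PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised score line: {line!r}")
    value, low, high = match.groups()
    return ScoreLine(
        _to_float(value),
        _to_float(low) if low else None,
        _to_float(high) if high else None,
    )


def ci_half_width(score: ScoreLine) -> Optional[float]:
    """Half the width of the confidence interval, or None if absent."""
    if score.ci_low is None or score.ci_high is None:
        return None
    return (score.ci_high - score.ci_low) / 2


spanish = parse_score_line("1,395Raw benchmark valueCI 1,343 - 1,446")
russian = parse_score_line("1,381Raw benchmark valueCI 1,368 - 1,394")
print(ci_half_width(spanish))  # 51.5
print(ci_half_width(russian))  # 13.0
```

On these two rows the Spanish interval is roughly four times as wide as the Russian one, so rank differences between low-vote language rows deserve less weight.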

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 64.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: chinese. Source rank: #130. Votes: 1151. Organization: deepseek. License: MIT.

64.1% percentile inside its fair comparison set

1,400 · Raw benchmark value · CI 1,382 - 1,418

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #114 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 47.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: french. Source rank: #135. Votes: 188. Organization: deepseek. License: MIT.

47.7% percentile inside its fair comparison set

1,368 · Raw benchmark value · CI 1,326 - 1,410

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #85 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 64.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: german. Source rank: #102. Votes: 436. Organization: deepseek. License: MIT.

64.6% percentile inside its fair comparison set

1,385 · Raw benchmark value · CI 1,357 - 1,413

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #83 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,325
Percentile: 59.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: japanese. Source rank: #103. Votes: 468. Organization: deepseek. License: MIT.

59.6% percentile inside its fair comparison set

1,325 · Raw benchmark value · CI 1,299 - 1,351

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #93 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,330
Percentile: 55.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: korean. Source rank: #113. Votes: 215. Organization: deepseek. License: MIT.

55.8% percentile inside its fair comparison set

1,330 · Raw benchmark value · CI 1,290 - 1,369

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #117 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,354
Percentile: 59.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: russian. Source rank: #142. Votes: 1904. Organization: deepseek. License: MIT.

59.9% percentile inside its fair comparison set

1,354 · Raw benchmark value · CI 1,341 - 1,366

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #101 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,375
Percentile: 53.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: spanish. Source rank: #121. Votes: 114. Organization: deepseek. License: MIT.

53.3% percentile inside its fair comparison set

1,375 · Raw benchmark value · CI 1,322 - 1,428

Source links and registry checks

official · DeepSeek models and pricing · Jun 20, 2026

official · Arena · Jun 20, 2026 (27 source links)

official · Artificial Analysis · Jun 20, 2026

Chat / text · 26 benchmarks · 61.3%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #209 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 10.5%
Percentile: 30.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

30.2% percentile inside its fair comparison set

10.5% · Raw benchmark value

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #181 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 39%
Percentile: 43.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

43.2% percentile inside its fair comparison set

39% · Raw benchmark value

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #213 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $2.4 /1M tokens
Percentile: 23.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

23.2% percentile inside its fair comparison set

$2.4 /1M tokens · Raw benchmark value

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #225 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $1.7 /1M input tokens
Percentile: 18.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

18.8% percentile inside its fair comparison set

$1.7 /1M input tokens · Raw benchmark value

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #207 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $4.7 /1M output tokens
Percentile: 25.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

25.4% percentile inside its fair comparison set

$4.7 /1M output tokens · Raw benchmark value
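
The field name `price1mBlended0To3To1` suggests the blended price weights input and output tokens 3:1. Assuming that reading, the input and output prices listed here combine as (3 × 1.7 + 4.7) / 4 = 2.45, which agrees with the displayed $2.4 to within the rounding of the displayed inputs. The sketch below is illustrative only; `blended_price` is a hypothetical helper, and the 3:1 ratio is an assumption taken from the field name rather than from anything this page states.

```python
def blended_price(input_price, output_price, input_weight=3.0, output_weight=1.0):
    """Weighted average price per 1M tokens.

    Assumption: a 3:1 input-to-output token mix, inferred from the
    leaderboard field name `price1mBlended0To3To1`.
    """
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices in USD per 1M tokens, as displayed in the rows on this page.
price = blended_price(input_price=1.7, output_price=4.7)
print(f"${price:.2f} /1M tokens")
```

Changing `input_weight` and `output_weight` gives the effective price for a workload with a different token mix, which matters for a reasoning model whose outputs tend to be long.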

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #44 · Source label: DeepSeek R1 0528 (May '25)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 50
Percentile: 83.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

83.9% percentile inside its fair comparison set

50 · Raw benchmark value

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,398
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #121. Votes: 18524. Organization: deepseek. License: MIT.

69.8% percentile inside its fair comparison set

1,398 · Raw benchmark value · CI 1,393 - 1,403

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #89 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 72.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #112. Votes: 3289. Organization: deepseek. License: MIT.

72.8% percentile inside its fair comparison set

1,374 · Raw benchmark value · CI 1,364 - 1,384

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 70.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #118. Votes: 10721. Organization: deepseek. License: MIT.

70.5% percentile inside its fair comparison set

1,413 · Raw benchmark value · CI 1,407 - 1,419

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #96 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 70.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #118. Votes: 12504. Organization: deepseek. License: MIT.

70.8% percentile inside its fair comparison set

1,383 · Raw benchmark value · CI 1,376 - 1,391

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #120. Votes: 4116. Organization: deepseek. License: MIT.

69.8% percentile inside its fair comparison set

1,419 · Raw benchmark value · CI 1,410 - 1,428

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #91 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,434
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #111. Votes: 2656. Organization: deepseek. License: MIT.

72.2% percentile inside its fair comparison set

1,434 · Raw benchmark value · CI 1,422 - 1,445

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #91 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 72.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #112. Votes: 6426. Organization: deepseek. License: MIT.

72.3% percentile inside its fair comparison set

1,397 · Raw benchmark value · CI 1,390 - 1,405

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,399
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #131. Votes: 2303. Organization: deepseek. License: MIT.

65.5% percentile inside its fair comparison set

1,399 · Raw benchmark value · CI 1,387 - 1,411

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #107. Votes: 2418. Organization: deepseek. License: MIT.

73.1% percentile inside its fair comparison set

1,410 · Raw benchmark value · CI 1,398 - 1,422

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #114 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,373
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: overall. Source rank: #137. Votes: 18524. Organization: deepseek. License: MIT.

65.2% percentile inside its fair comparison set

1,373 · Raw benchmark value · CI 1,368 - 1,378

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,355
Percentile: 70.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: creative_writing. Source rank: #120. Votes: 3289. Organization: deepseek. License: MIT.

70.3% percentile inside its fair comparison set

1,355 · Raw benchmark value · CI 1,344 - 1,365

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: english. Source rank: #133. Votes: 10721. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

1,385 · Raw benchmark value · CI 1,379 - 1,391

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,346
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: exclude_ties. Source rank: #133. Votes: 12504. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

1,346 · Raw benchmark value · CI 1,339 - 1,353

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #127 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,361
Percentile: 61.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts. Source rank: #153. Votes: 4116. Organization: deepseek. License: MIT.

61.2% percentile inside its fair comparison set

1,361 · Raw benchmark value · CI 1,353 - 1,370

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #125 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 61.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: hard_prompts_english. Source rank: #149. Votes: 2656. Organization: deepseek. License: MIT.

61.7% percentile inside its fair comparison set

1,376 · Raw benchmark value · CI 1,365 - 1,387

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #110 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,358
Percentile: 66.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: instruction_following. Source rank: #134. Votes: 6426. Organization: deepseek. License: MIT.

66.5% percentile inside its fair comparison set

1,358 · Raw benchmark value · CI 1,350 - 1,365

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #121 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,355
Percentile: 60.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: longer_query. Source rank: #148. Votes: 2303. Organization: deepseek. License: MIT.

60.5% percentile inside its fair comparison set

1,355 · Raw benchmark value · CI 1,343 - 1,367

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 69.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: multi_turn. Source rank: #120. Votes: 2418. Organization: deepseek. License: MIT.

69.7% percentile inside its fair comparison set

1,390 · Raw benchmark value · CI 1,379 - 1,402

Coding · 5 benchmarks · 54.5%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #178 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.1%
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

41.4% percentile inside its fair comparison set

6.1% · Raw benchmark value

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #126 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 35.7%
Percentile: 66%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

66% percentile inside its fair comparison set

35.7% · Raw benchmark value

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #59 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 70.2%
Percentile: 35.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Fireworks AI.

35.6% percentile inside its fair comparison set

70.2% · Raw benchmark value · CI 68% - 72.5%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #98 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,445
Percentile: 69.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: coding. Source rank: #119. Votes: 2317. Organization: deepseek. License: MIT.

69.7% percentile inside its fair comparison set

1,445 · Raw benchmark value · CI 1,433 - 1,457

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #129 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,372
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: coding. Source rank: #154. Votes: 2317. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

1,372 · Raw benchmark value · CI 1,360 - 1,384
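The 54.5% shown in the Coding header is consistent with an unweighted mean of the five row percentiles in this category. The page does not state its aggregation rule, so the function below is an assumption that happens to reproduce the displayed figure, not a description of the registry's actual method:

```python
# Percentiles of the five Coding rows listed in this section.
coding_percentiles = {
    "Terminal-Bench Hard": 41.4,
    "SciCode": 66.0,
    "LiveCodeBench": 35.6,
    "Text Arena · Coding": 69.7,
    "Text Arena · Coding · No Style Control": 60.0,
}

def category_score(percentiles: dict) -> float:
    """Unweighted mean of row percentiles, rounded to one decimal (assumed rule)."""
    values = list(percentiles.values())
    return round(sum(values) / len(values), 1)

coding_score = category_score(coding_percentiles)
print(coding_score)
```

Under this reading, the two low rows (LiveCodeBench at 35.6% and Terminal-Bench Hard at 41.4%) pull the category score down more than any single Arena row lifts it.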

Reasoning / math / science · 6 benchmarks · 67.6%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 9.3%
Percentile: 70.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

70.8% percentile inside its fair comparison set

9.3% · Raw benchmark value

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #136 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 70.8%
Percentile: 63.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

63.9% percentile inside its fair comparison set

70.8% · Raw benchmark value

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #64 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0.6%
Percentile: 80.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

80.1% percentile inside its fair comparison set

0.6% · Raw benchmark value

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #46 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.2%
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

49.4% percentile inside its fair comparison set

83.2% · Raw benchmark value · CI 82.3% - 84.1%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #80 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 74.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: math. Source rank: #100. Votes: 1606. Organization: deepseek. License: MIT.

74.8% percentile inside its fair comparison set

1,411 · Raw benchmark value · CI 1,397 - 1,425

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,392
Percentile: 66.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: math. Source rank: #127. Votes: 1606. Organization: deepseek. License: MIT.

66.2% percentile inside its fair comparison set

1,392 · Raw benchmark value · CI 1,378 - 1,406
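Several Arena boards appear twice, once in the default view and once with "No Style Control". One quick way to read a pair is to check whether the two confidence intervals overlap. The sketch below uses the Math and Coding intervals from this page; `intervals_overlap` is an illustrative helper, and overlap alone is only a coarse signal, not a significance test:

```python
def intervals_overlap(a: tuple, b: tuple) -> bool:
    """True when closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# (lower, upper) bounds copied from the cards on this page.
math_default = (1397, 1425)
math_no_style = (1378, 1406)
coding_default = (1433, 1457)
coding_no_style = (1360, 1384)

math_overlap = intervals_overlap(math_default, math_no_style)
coding_overlap = intervals_overlap(coding_default, coding_no_style)
print("math intervals overlap:", math_overlap)
print("coding intervals overlap:", coding_overlap)
```

On these numbers the Math pair overlaps while the Coding pair does not, which suggests the style setting moves the Coding score by more than the reported interval width.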

Professional reasoning · 20 benchmarks · 61.3%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #83 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 67.3%
Percentile: 8.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Fireworks AI.

8.9% percentile inside its fair comparison set

67.3% · Raw benchmark value · CI 66.3% - 68.4%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #44 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 72.3%
Percentile: 52.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Fireworks AI.

52.7% percentile inside its fair comparison set

72.3% · Raw benchmark value · CI 70.6% - 74%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #111 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: expert. Source rank: #135. Votes: 848. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

1,400 · Raw benchmark value · CI 1,380 - 1,420

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #109 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 66%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_business_and_management_and_financial_operations. Source rank: #132. Votes: 1798. Organization: deepseek. License: MIT.

66% percentile inside its fair comparison set

1,388 · Raw benchmark value · CI 1,374 - 1,402

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #92 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,372
Percentile: 71.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_entertainment_and_sports_and_media. Source rank: #113. Votes: 3571. Organization: deepseek. License: MIT.

71.8% percentile inside its fair comparison set

1,372 · Raw benchmark value · CI 1,363 - 1,382

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #109 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,399
Percentile: 63.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_legal_and_government. Source rank: #132. Votes: 1097. Organization: deepseek. License: MIT.

63.8% percentile inside its fair comparison set

1,399 · Raw benchmark value · CI 1,381 - 1,416

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #102 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 68.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_life_and_physical_and_social_science. Source rank: #124. Votes: 3309. Organization: deepseek. License: MIT.

68.7% percentile inside its fair comparison set

1,411 · Raw benchmark value · CI 1,401 - 1,421

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #89 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,414
Percentile: 71.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_mathematical. Source rank: #107. Votes: 1504. Organization: deepseek. License: MIT.

71.4% percentile inside its fair comparison set

1,414 · Raw benchmark value · CI 1,399 - 1,429

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 64.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_medicine_and_healthcare. Source rank: #130. Votes: 830. Organization: deepseek. License: MIT.

64.1% percentile inside its fair comparison set

1,411 · Raw benchmark value · CI 1,391 - 1,432

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #102 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 68.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_software_and_it_services. Source rank: #122. Votes: 3954. Organization: deepseek. License: MIT.

68.9% percentile inside its fair comparison set

1,431 · Raw benchmark value · CI 1,421 - 1,440

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #90 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,382
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_writing_and_literature_and_language. Source rank: #112. Votes: 5355. Organization: deepseek. License: MIT.

72.5% percentile inside its fair comparison set

1,382 · Raw benchmark value · CI 1,374 - 1,390

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #134 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 51.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: expert. Source rank: #160. Votes: 848. Organization: deepseek. License: MIT.

51.6% percentile inside its fair comparison set

1,337 · Raw benchmark value · CI 1,318 - 1,357

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #126 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,348
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_business_and_management_and_financial_operations. Source rank: #150. Votes: 1798. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,348 · Raw benchmark value · CI 1,334 - 1,362

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_entertainment_and_sports_and_media. Source rank: #129. Votes: 3571. Organization: deepseek. License: MIT.

67.5% percentile inside its fair comparison set

1,347 · Raw benchmark value · CI 1,337 - 1,357

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #118 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,371
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_legal_and_government. Source rank: #142. Votes: 1097. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,371 · Raw benchmark value · CI 1,353 - 1,388

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #122 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,375
Percentile: 62.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_life_and_physical_and_social_science. Source rank: #144. Votes: 3309. Organization: deepseek. License: MIT.

62.5% percentile inside its fair comparison set

1,375 · Raw benchmark value · CI 1,365 - 1,386

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,389
Percentile: 65.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_mathematical. Source rank: #127. Votes: 1504. Organization: deepseek. License: MIT.

65.6% percentile inside its fair comparison set

1,389 · Raw benchmark value · CI 1,374 - 1,404

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #119 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 60%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_medicine_and_healthcare. Source rank: #143. Votes: 830. Organization: deepseek. License: MIT.

60% percentile inside its fair comparison set

1,370 · Raw benchmark value · CI 1,349 - 1,390

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #129 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 60.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_software_and_it_services. Source rank: #154. Votes: 3954. Organization: deepseek. License: MIT.

60.6% percentile inside its fair comparison set

1,376 · Raw benchmark value · CI 1,367 - 1,385

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #104 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 68.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: industry_writing_and_literature_and_language. Source rank: #127. Votes: 5355. Organization: deepseek. License: MIT.

68.2% percentile inside its fair comparison set

1,364 · Raw benchmark value · CI 1,356 - 1,373

Search / tool use · 1 benchmark · 10.4%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #278 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 11.4%
Percentile: 10.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

10.4% percentile inside its fair comparison set

11.4% · Raw benchmark value

Long context · 2 benchmarks · 49.9%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #84 · Source label: DeepSeek R1 (Jan '25)

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 52.3%
Percentile: 73.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

73.7% percentile inside its fair comparison set

52.3% · Raw benchmark value

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #66 · Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 54.1%
Percentile: 26.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Fireworks AI.

26.1% percentile inside its fair comparison set

54.1% · Raw benchmark value · CI 52.2% - 56%

Multilingual · 14 benchmarks · 60%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #98 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 67.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: chinese. Source rank: #120. Votes: 1151. Organization: deepseek. License: MIT.

67.1% percentile inside its fair comparison set

1,429 · Raw benchmark value · CI 1,411 - 1,447

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #112 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 48.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: french. Source rank: #135. Votes: 188. Organization: deepseek. License: MIT.

48.6% percentile inside its fair comparison set

1,391 · Raw benchmark value · CI 1,349 - 1,433

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #75 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,398
Percentile: 68.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: german. Source rank: #94. Votes: 436. Organization: deepseek. License: MIT.

68.8% percentile inside its fair comparison set

1,398 · Raw benchmark value · CI 1,369 - 1,426

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #80 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,338
Percentile: 61.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: japanese. Source rank: #102. Votes: 468. Organization: deepseek. License: MIT.

61.1% percentile inside its fair comparison set

1,338 · Raw benchmark value · CI 1,312 - 1,364

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #72 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,357
Percentile: 65.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: korean. Source rank: #91. Votes: 215. Organization: deepseek. License: MIT.

65.9% percentile inside its fair comparison set

1,357 · Raw benchmark value · CI 1,317 - 1,397

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #106 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,381
Percentile: 63.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: russian. Source rank: #129. Votes: 1904. Organization: deepseek. License: MIT.

63.7% percentile inside its fair comparison set

1,381 · Raw benchmark value · CI 1,368 - 1,394

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #85 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 60.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: spanish. Source rank: #106. Votes: 114. Organization: deepseek. License: MIT.

60.7% percentile inside its fair comparison set

1,395 · Raw benchmark value · CI 1,343 - 1,446

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #107 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 64.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: chinese. Source rank: #130. Votes: 1151. Organization: deepseek. License: MIT.

64.1% percentile inside its fair comparison set

1,400 · Raw benchmark value · CI 1,382 - 1,418

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #114 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 47.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: french. Source rank: #135. Votes: 188. Organization: deepseek. License: MIT.

47.7% percentile inside its fair comparison set

1,368 · Raw benchmark value · CI 1,326 - 1,410

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #85 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 64.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: german. Source rank: #102. Votes: 436. Organization: deepseek. License: MIT.

64.6% percentile inside its fair comparison set

1,385 · Raw benchmark value · CI 1,357 - 1,413

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #83 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,325
Percentile: 59.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: japanese. Source rank: #103. Votes: 468. Organization: deepseek. License: MIT.

59.6% percentile inside its fair comparison set

1,325 · Raw benchmark value · CI 1,299 - 1,351

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #93 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown · source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,330
Percentile: 55.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: korean. Source rank: #113. Votes: 215. Organization: deepseek. License: MIT.

55.8% percentile inside its fair comparison set

1,330 · Raw benchmark value · CI 1,290 - 1,369

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #117 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,354
Percentile: 59.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: russian. Source rank: #142. Votes: 1904. Organization: deepseek. License: MIT.

59.9% percentile inside its fair comparison set

1,354 · Raw benchmark value · CI 1,341 - 1,366

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #101 · Source label: deepseek-r1

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,375
Percentile: 53.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-r1`. Category: spanish. Source rank: #121. Votes: 114. Organization: deepseek. License: MIT.

53.3% percentile inside its fair comparison set

1,375 · Raw benchmark value · CI 1,322 - 1,428

Source links and registry checks

official

DeepSeek models and pricing

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Arena

Jun 20, 2026

source →

official

Artificial Analysis

Jun 20, 2026

source →
