Model profile · DeepSeek

DeepSeek Chat

Open weights · budget · registry tag: 2026 open generalist

Thin verified coverage

Reads as thin verified coverage across the resolved source data.

Visible coverage: 33.6%
Verified coverage: 33.6%
Spread: 75.1%
Last verified: Jun 20, 2026

54% bench fit

text · code · document · 7 aliases · 34 official source links


Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.
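
As a rough illustration of that check, the two stamps can be compared directly. This is a minimal sketch: the dates are copied from the stamp above, while `MAX_LAG_DAYS` is a hypothetical threshold, since the page does not say how large a gap counts as stale.

```python
from datetime import date

# Dates copied from the data-version stamp on this page.
data_version = date(2026, 6, 20)
page_refreshed = date(2026, 7, 5)

# Hypothetical threshold (assumption): the page does not define "stale".
MAX_LAG_DAYS = 30

# Gap between the registry snapshot and the page refresh.
lag_days = (page_refreshed - data_version).days
is_stale = lag_days > MAX_LAG_DAYS

print(f"lag: {lag_days} days, stale: {is_stale}")
```

With these stamps the gap is 15 days, which falls under the assumed threshold.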

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
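
One way to picture such a row is as a small record. The sketch below is illustrative only: `BenchmarkRow` and its field names are hypothetical and are not taken from the registry's schema. The sample values come from the Intelligence Index row further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical shape for one source-linked score row (not the real schema)."""
    benchmark: str      # benchmark name as shown on the page
    source: str         # where the score was read from
    source_type: str    # e.g. Combined, Objective, Human, Speed / cost
    raw_value: str      # raw metric, kept as text because units differ per row
    percentile: float   # percentile inside the fair comparison set


# Values copied from the Intelligence Index row on this page.
row = BenchmarkRow(
    benchmark="Intelligence Index",
    source="Artificial Analysis",
    source_type="Combined",
    raw_value="25",
    percentile=78.2,
)

print(row)
```

Keeping `raw_value` as text is a design choice for the sketch: the rows mix plain index values, percentages, ratings, and prices.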


Chat / text · 34 benchmarks · 60.6%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #87 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 25
Percentile: 78.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

78.2% percentile inside its fair comparison set

Raw benchmark value: 25

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #70 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 24.2%
Percentile: 77.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

77.2% percentile inside its fair comparison set

Raw benchmark value: 24.2%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #257 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.5%
Percentile: 14.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

14.1% percentile inside its fair comparison set

Raw benchmark value: 6.5%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #98 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 49%
Percentile: 69.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

69.2% percentile inside its fair comparison set

Raw benchmark value: 49%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #147 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.8 /1M tokens
Percentile: 47.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

47.1% percentile inside its fair comparison set

Raw benchmark value: $0.8 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #159 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.5 /1M input tokens
Percentile: 42.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

42.8% percentile inside its fair comparison set

Raw benchmark value: $0.5 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #144 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $1.6 /1M output tokens
Percentile: 48.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

48.2% percentile inside its fair comparison set

Raw benchmark value: $1.6 /1M output tokens
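
The three price rows appear to be related. The field name `price1mBlended0To3To1` suggests a 3:1 weighting of input to output tokens; that reading is an assumption, since the page does not state the formula. Under that assumption, a quick check with the listed input and output prices looks like this:

```python
# Prices copied from the Input price and Output price rows ($ per 1M tokens).
input_price = 0.5
output_price = 1.6

# Assumed weighting (inferred from the field name, not confirmed by the page):
# 3 parts input tokens to 1 part output tokens.
INPUT_WEIGHT = 3
OUTPUT_WEIGHT = 1

blended = (INPUT_WEIGHT * input_price + OUTPUT_WEIGHT * output_price) / (
    INPUT_WEIGHT + OUTPUT_WEIGHT
)

print(f"blended: ${blended:.3f} /1M tokens")
```

The result is about $0.775, which agrees with the listed $0.8 once rounded to one decimal place.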

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,425
Percentile: 80.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #80. Votes: 47303. Organization: deepseek. License: MIT.
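
Provenance notes in this shape are regular enough to split into fields. The following is a minimal sketch, assuming the `Key: value.` pattern holds for every Arena row on this page; `parse_provenance` is a hypothetical helper, not part of any Arena tooling.

```python
import re

# Note text copied from the Text Arena row above.
note = (
    "Parsed from Arena leaderboard dataset row `deepseek-v3.2`. "
    "Category: overall. Source rank: #80. Votes: 47303. "
    "Organization: deepseek. License: MIT."
)


def parse_provenance(text: str) -> dict:
    """Split an Arena provenance note into a dict of fields (hypothetical helper)."""
    fields = {}
    # The dataset row label sits between backticks.
    row = re.search(r"row `([^`]+)`", text)
    if row:
        fields["row"] = row.group(1)
    # Remaining facts follow a "Key: value." pattern.
    for key, value in re.findall(r"(\w[\w ]*?): ([^.]+)\.", text):
        fields[key.lower().replace(" ", "_")] = value.strip()
    # Convert the numeric fields when present.
    if "source_rank" in fields:
        fields["source_rank"] = int(fields["source_rank"].lstrip("#"))
    if "votes" in fields:
        fields["votes"] = int(fields["votes"])
    return fields


parsed = parse_provenance(note)
print(parsed)
```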

80.9% percentile inside its fair comparison set

Raw benchmark value: 1,425 (CI 1,421 - 1,429)

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #59 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,401
Percentile: 82%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #73. Votes: 6687. Organization: deepseek. License: MIT.

82% percentile inside its fair comparison set

Raw benchmark value: 1,401 (CI 1,393 - 1,409)

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 80.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #81. Votes: 21521. Organization: deepseek. License: MIT.

80.3% percentile inside its fair comparison set

Raw benchmark value: 1,437 (CI 1,432 - 1,442)

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,416
Percentile: 80.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #81. Votes: 33503. Organization: deepseek. License: MIT.

80.6% percentile inside its fair comparison set

Raw benchmark value: 1,416 (CI 1,411 - 1,421)

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,447
Percentile: 81.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #77. Votes: 26212. Organization: deepseek. License: MIT.

81.8% percentile inside its fair comparison set

Raw benchmark value: 1,447 (CI 1,443 - 1,452)

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,456
Percentile: 81.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #77. Votes: 12512. Organization: deepseek. License: MIT.

81.5% percentile inside its fair comparison set

Raw benchmark value: 1,456 (CI 1,449 - 1,462)

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 83.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #69. Votes: 12931. Organization: deepseek. License: MIT.

83.7% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,415 - 1,427)

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 81.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #71. Votes: 12953. Organization: deepseek. License: MIT.

81.6% percentile inside its fair comparison set

Raw benchmark value: 1,442 (CI 1,436 - 1,448)

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,428
Percentile: 80.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #82. Votes: 8265. Organization: deepseek. License: MIT.

80.5% percentile inside its fair comparison set

Raw benchmark value: 1,428 (CI 1,421 - 1,435)

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #67. Votes: 47303. Organization: deepseek. License: MIT.

83.4% percentile inside its fair comparison set

Raw benchmark value: 1,424 (CI 1,421 - 1,428)
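
The CI ranges give a rough way to judge whether two Arena scores are distinguishable. The sketch below only tests whether two intervals overlap. That is a coarse screen rather than a significance test, and the page does not state the confidence level behind the ranges. `ArenaScore` and `intervals_overlap` are illustrative names.

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    """Score with its listed CI bounds (illustrative container)."""
    score: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """Return True when the two CI ranges share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the rows on this page.
overall = ArenaScore(1425, 1421, 1429)            # Text Arena
overall_no_style = ArenaScore(1424, 1421, 1428)   # Text Arena · No Style Control
creative = ArenaScore(1401, 1393, 1409)           # Text Arena · Creative Writing

same_band = intervals_overlap(overall, overall_no_style)
creative_band = intervals_overlap(overall, creative)

print(same_band, creative_band)
```

Here the overall score with and without style control falls in overlapping ranges, while the Creative Writing range sits entirely below the overall range.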

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 83.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #68. Votes: 6687. Organization: deepseek. License: MIT.

83.3% percentile inside its fair comparison set

Raw benchmark value: 1,400 (CI 1,392 - 1,407)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,435
Percentile: 82.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #69. Votes: 21521. Organization: deepseek. License: MIT.

82.8% percentile inside its fair comparison set

Raw benchmark value: 1,435 (CI 1,431 - 1,440)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,414
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #68. Votes: 33503. Organization: deepseek. License: MIT.

83.1% percentile inside its fair comparison set

Raw benchmark value: 1,414 (CI 1,409 - 1,419)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,434
Percentile: 84%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #67. Votes: 26212. Organization: deepseek. License: MIT.

84% percentile inside its fair comparison set

Raw benchmark value: 1,434 (CI 1,429 - 1,438)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 83.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #64. Votes: 12512. Organization: deepseek. License: MIT.

83.6% percentile inside its fair comparison set

Raw benchmark value: 1,442 (CI 1,436 - 1,448)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 85.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #61. Votes: 12931. Organization: deepseek. License: MIT.

85.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 (CI 1,407 - 1,419)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #45 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 85.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #57. Votes: 12953. Organization: deepseek. License: MIT.

85.5% percentile inside its fair comparison set

Raw benchmark value: 1,429 (CI 1,423 - 1,435)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,426
Percentile: 83%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #70. Votes: 8265. Organization: deepseek. License: MIT.

83% percentile inside its fair comparison set

Raw benchmark value: 1,426 (CI 1,419 - 1,433)

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 23.1%
Percentile: 13.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

13.9% percentile inside its fair comparison set

Raw benchmark value: 23.1%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #78 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 64.2%
Percentile: 28.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

28.7% percentile inside its fair comparison set

Raw benchmark value: 64.2%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #87 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20.6%
Percentile: 20.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

20.4% percentile inside its fair comparison set

Raw benchmark value: 20.6%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #90 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 24%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

17.6% percentile inside its fair comparison set

Raw benchmark value: 24%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20.7%
Percentile: 11.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

11.1% percentile inside its fair comparison set

Raw benchmark value: 20.7%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 27%
Percentile: 15.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

15.7% percentile inside its fair comparison set

Raw benchmark value: 27%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #74 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 81.8%
Percentile: 32.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

32.4% percentile inside its fair comparison set

Raw benchmark value: 81.8%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #76 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 44.9%
Percentile: 30.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

30.6% percentile inside its fair comparison set

Raw benchmark value: 44.9%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #80 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 66%
Percentile: 26.2%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

26.2% percentile inside its fair comparison set

Raw benchmark value: 66%
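
The LiveBench category rows above line up with their task rows. The listed IF and Language values are consistent with a plain mean of the task scores, although the page does not state how the category figure is derived. A minimal check under that assumption:

```python
from statistics import mean

# Task scores copied from the LiveBench task rows above (percent).
tasks = {
    "IF": {
        "paraphrase": 20.6,
        "simplify": 24.0,
        "story_generation": 20.7,
        "summarize": 27.0,
    },
    "Language": {
        "connections": 81.8,
        "plot_unscrambling": 44.9,
        "typos": 66.0,
    },
}

# Category values as listed in the Instruction following and Language rows.
listed = {"IF": 23.1, "Language": 64.2}

# Assumption: the category score is the unweighted mean of its task scores.
category_means = {
    category: round(mean(scores.values()), 1)
    for category, scores in tasks.items()
}

for category, value in category_means.items():
    print(category, value, "listed:", listed[category])
```

The task counts also match the "Tasks scored" notes: four for IF and three for Language.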

Coding · 15 benchmarks · 57%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #51 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 32.6%
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

83.4% percentile inside its fair comparison set

Raw benchmark value: 32.6%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #78 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 38.7%
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

79.3% percentile inside its fair comparison set

Raw benchmark value: 38.7%

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,332
Percentile: 28.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #65. Votes: 10470. Organization: deepseek. License: MIT.

28.8% percentile inside its fair comparison set

Raw benchmark value: 1,332 (CI 1,325 - 1,339)

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,332
Percentile: 28.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev. Source rank: #65. Votes: 10470. Organization: deepseek. License: MIT.

28.8% percentile inside its fair comparison set

Raw benchmark value: 1,332 (CI 1,325 - 1,339)

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,308
Percentile: 21.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev-html. Source rank: #71. Votes: 5252. Organization: deepseek. License: MIT.

21.9% percentile inside its fair comparison set

Raw benchmark value: 1,308 (CI 1,297 - 1,319)

Code Arena · Webdev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #44 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 27.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev-react. Source rank: #56. Votes: 5199. Organization: deepseek. License: MIT.

27.1% percentile inside its fair comparison set

Raw benchmark value: 1,347 (CI 1,338 - 1,356)

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,470
Percentile: 80%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: coding. Source rank: #81. Votes: 10631. Organization: deepseek. License: MIT.

80% percentile inside its fair comparison set

Raw benchmark value: 1,470 (CI 1,463 - 1,476)

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,449
Percentile: 83.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: coding. Source rank: #63. Votes: 10631. Organization: deepseek. License: MIT.

83.8% percentile inside its fair comparison set

Raw benchmark value: 1,449 (CI 1,442 - 1,455)

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 46.7%
Percentile: 50.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

50.9% percentile inside its fair comparison set

Raw benchmark value: 46.7%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #43 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 75.7%
Percentile: 61.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

61.1% percentile inside its fair comparison set

Raw benchmark value: 75.7%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40%
Percentile: 58.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

58.3% percentile inside its fair comparison set

Raw benchmark value: 40%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 30%
Percentile: 44.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

44.9% percentile inside its fair comparison set

Raw benchmark value: 30%

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #33 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 70%
Percentile: 77.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

77.8% percentile inside its fair comparison set

Raw benchmark value: 70%

Coding generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 77.5%
Percentile: 67.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

67.6% percentile inside its fair comparison set

Raw benchmark value: 77.5%

Coding completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 73.9%
Percentile: 61.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

61.1% percentile inside its fair comparison set

Raw benchmark value: 73.9%
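
The Coding section figure of 57% is consistent with an unweighted mean of the 15 row percentiles above. The page does not document the aggregation, so treat the following as a plausibility check under that assumption rather than the registry's actual method.

```python
from statistics import mean

# Percentiles copied from the 15 Coding rows above, in page order.
coding_percentiles = [
    83.4,  # Terminal-Bench Hard
    79.3,  # SciCode
    28.8,  # Code Arena
    28.8,  # WebDev Arena
    21.9,  # Code Arena · Webdev Html
    27.1,  # Code Arena · Webdev React
    80.0,  # Text Arena · Coding
    83.8,  # Text Arena · Coding · No Style Control
    50.9,  # Agentic coding
    61.1,  # Coding
    58.3,  # JavaScript
    44.9,  # TypeScript
    77.8,  # Python
    67.6,  # Coding generation
    61.1,  # Coding completion
]

# Assumption: the section score is the plain average of row percentiles.
section_score = mean(coding_percentiles)

print(len(coding_percentiles), f"{section_score:.2f}")
```

The mean comes to about 56.99, which rounds to the listed 57%. The same calculation over the 34 Chat / text percentiles gives about 60.6, matching that section's figure.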

Reasoning / math / science · 15 benchmarks · 42.2%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #93 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 10.5%
Percentile: 75.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

75.1% percentile inside its fair comparison set

Raw benchmark value: 10.5%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #104 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 75.1%
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

72.5% percentile inside its fair comparison set

Raw benchmark value: 75.1%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #52 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0.9%
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

83.4% percentile inside its fair comparison set

Raw benchmark value: 0.9%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,430
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: math. Source rank: #68. Votes: 3004. Organization: deepseek. License: MIT.

83.1% percentile inside its fair comparison set

Raw benchmark value: 1,430 · CI 1,419 - 1,441

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #43 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 86.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: math. Source rank: #53. Votes: 3004. Organization: deepseek. License: MIT.

86.6% percentile inside its fair comparison set

Raw benchmark value: 1,437 · CI 1,426 - 1,448
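
The two Text Arena · Math rows report a score with a confidence interval, with and without style control. A minimal Python sketch for checking whether the two intervals overlap follows. The `ArenaScore` type and `intervals_overlap` helper are hypothetical names for illustration, and interval overlap is only a rough heuristic here, not a formal significance test.

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    """One Arena row: point score plus its confidence interval bounds."""
    score: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """True when the two confidence intervals share at least one point."""
    return max(a.ci_low, b.ci_low) <= min(a.ci_high, b.ci_high)


# Values copied from the two Text Arena · Math rows above.
math_style_control = ArenaScore(score=1430, ci_low=1419, ci_high=1441)
math_no_style_control = ArenaScore(score=1437, ci_low=1426, ci_high=1448)

print(intervals_overlap(math_style_control, math_no_style_control))
```

The intervals share the range 1,426 to 1,441, so the 7-point gap between the two rows sits inside the reported uncertainty.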

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #88 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 64%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

19.4% percentile inside its fair comparison set

Raw benchmark value: 64%

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #84 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 44.3%
Percentile: 23.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

23.1% percentile inside its fair comparison set

Raw benchmark value: 44.3%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 95%
Percentile: 46.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

46.3% percentile inside its fair comparison set

Raw benchmark value: 95%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #89 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

19.4% percentile inside its fair comparison set

Raw benchmark value: 10%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #91 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 76.5%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

17.6% percentile inside its fair comparison set

Raw benchmark value: 76.5%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #82 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 74.3%
Percentile: 25%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

25% percentile inside its fair comparison set

Raw benchmark value: 74.3%
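
The LiveBench Mathematics row above reports 64% with "Tasks scored: 4", and the four Mathematics task rows (AMPS Hard, Integrals with game, Math competition, Olympiad) are listed individually. The sketch below assumes, without confirmation from the page, that the category score is an unweighted mean of its task scores. The `category_mean` helper is a hypothetical name.

```python
# Raw task scores (percent) copied from the four Mathematics task rows above.
math_tasks = {
    "AMPS_Hard": 95.0,
    "integrals_with_game": 10.0,
    "math_comp": 76.5,
    "olympiad": 74.3,
}


def category_mean(task_scores: dict) -> float:
    """Unweighted mean of task scores (assumed aggregation rule)."""
    return sum(task_scores.values()) / len(task_scores)


mean_score = category_mean(math_tasks)
print(f"Mathematics category mean: {mean_score:.2f}%")
```

Under that assumption the mean comes out just under 64, which agrees with the 64% shown on the Mathematics row once rounded.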

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #81 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 50%
Percentile: 27.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

27.8% percentile inside its fair comparison set

Raw benchmark value: 50%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #99 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 9%
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

9.3% percentile inside its fair comparison set

Raw benchmark value: 9%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #88 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 76%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

19.4% percentile inside its fair comparison set

Raw benchmark value: 76%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #83 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 42%
Percentile: 25%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

25% percentile inside its fair comparison set

Raw benchmark value: 42%
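
The 42.2% shown on the Reasoning / math / science group header matches the unweighted mean of the 15 percentiles listed in this group. The sketch below reproduces that arithmetic. That the page computes the group figure this way is an assumption, and `group_score` is a hypothetical helper name.

```python
# Percentiles of the 15 Reasoning / math / science rows, in page order.
percentiles = [
    75.1, 72.5, 83.4, 83.1, 86.6,   # HLE, GPQA, CritPt, Arena Math (two rows)
    19.4, 23.1, 46.3, 19.4, 17.6,   # LiveBench Mathematics, Reasoning, math tasks
    25.0, 27.8, 9.3, 19.4, 25.0,    # Olympiad and the LiveBench reasoning tasks
]


def group_score(values: list) -> float:
    """Unweighted mean rounded to one decimal (assumed aggregation rule)."""
    return round(sum(values) / len(values), 1)


print(group_score(percentiles))
```

The split is visible in the inputs: the Artificial Analysis and Arena rows sit above the 70th percentile, while most LiveBench rows sit below the 30th, so the mean lands between the two clusters.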

Professional reasoning · 24 benchmarks · 68.3%

APEX-Agents-AA

AA · Professional reasoning · Objective

Long-horizon agentic task completion.

Rank #16 · Source label: DeepSeek V3.2 (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 14.5%
Percentile: 37.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `apexAgents`.

37.5% percentile inside its fair comparison set

Raw benchmark value: 14.5%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,449
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: expert. Source rank: #75. Votes: 3093. Organization: deepseek. License: MIT.

79.3% percentile inside its fair comparison set

Raw benchmark value: 1,449 · CI 1,438 - 1,460

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 80.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_business_and_management_and_financial_operations. Source rank: #80. Votes: 8818. Organization: deepseek. License: MIT.

80.2% percentile inside its fair comparison set

Raw benchmark value: 1,421 · CI 1,414 - 1,428

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 81.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_entertainment_and_sports_and_media. Source rank: #78. Votes: 8542. Organization: deepseek. License: MIT.

81.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 · CI 1,389 - 1,403

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 77.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_legal_and_government. Source rank: #87. Votes: 3382. Organization: deepseek. License: MIT.

77.5% percentile inside its fair comparison set

Raw benchmark value: 1,431 · CI 1,420 - 1,441

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,451
Percentile: 83%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_life_and_physical_and_social_science. Source rank: #69. Votes: 7351. Organization: deepseek. License: MIT.

83% percentile inside its fair comparison set

Raw benchmark value: 1,451 · CI 1,444 - 1,459

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,439
Percentile: 80.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_mathematical. Source rank: #73. Votes: 2265. Organization: deepseek. License: MIT.

80.8% percentile inside its fair comparison set

Raw benchmark value: 1,439 · CI 1,426 - 1,451

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #72 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 75.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_medicine_and_healthcare. Source rank: #90. Votes: 3017. Organization: deepseek. License: MIT.

75.9% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,430 - 1,453

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,457
Percentile: 79.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_software_and_it_services. Source rank: #84. Votes: 17009. Organization: deepseek. License: MIT.

79.4% percentile inside its fair comparison set

Raw benchmark value: 1,457 · CI 1,452 - 1,463

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 83.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_writing_and_literature_and_language. Source rank: #69. Votes: 10401. Organization: deepseek. License: MIT.

83.6% percentile inside its fair comparison set

Raw benchmark value: 1,410 · CI 1,403 - 1,416

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 80%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: expert. Source rank: #67. Votes: 3093. Organization: deepseek. License: MIT.

80% percentile inside its fair comparison set

Raw benchmark value: 1,436 · CI 1,425 - 1,447

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 78.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_business_and_management_and_financial_operations. Source rank: #82. Votes: 8818. Organization: deepseek. License: MIT.

78.9% percentile inside its fair comparison set

Raw benchmark value: 1,410 · CI 1,403 - 1,417

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 82.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_entertainment_and_sports_and_media. Source rank: #71. Votes: 8542. Organization: deepseek. License: MIT.

82.4% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,388 - 1,402

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 77.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_legal_and_government. Source rank: #83. Votes: 3382. Organization: deepseek. License: MIT.

77.5% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,414 - 1,435

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #50 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,445
Percentile: 84.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_life_and_physical_and_social_science. Source rank: #58. Votes: 7351. Organization: deepseek. License: MIT.

84.8% percentile inside its fair comparison set

Raw benchmark value: 1,445 · CI 1,437 - 1,452

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #42 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 86.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_mathematical. Source rank: #50. Votes: 2265. Organization: deepseek. License: MIT.

86.7% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,429 - 1,455

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 79%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_medicine_and_healthcare. Source rank: #74. Votes: 3017. Organization: deepseek. License: MIT.

79% percentile inside its fair comparison set

Raw benchmark value: 1,437 · CI 1,426 - 1,449

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,443
Percentile: 81.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_software_and_it_services. Source rank: #74. Votes: 17009. Organization: deepseek. License: MIT.

81.5% percentile inside its fair comparison set

Raw benchmark value: 1,443 · CI 1,438 - 1,448

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,407
Percentile: 82.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_writing_and_literature_and_language. Source rank: #69. Votes: 10401. Organization: deepseek. License: MIT.

82.7% percentile inside its fair comparison set

Raw benchmark value: 1,407 · CI 1,400 - 1,413

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #89 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 45%
Percentile: 18.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

18.5% percentile inside its fair comparison set

Raw benchmark value: 45%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #79 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 51.8%
Percentile: 27.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

27.8% percentile inside its fair comparison set

Raw benchmark value: 51.8%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #100 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 1.7%
Percentile: 8.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

8.3% percentile inside its fair comparison set

Raw benchmark value: 1.7%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #90 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 35.4%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

17.6% percentile inside its fair comparison set

Raw benchmark value: 35.4%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 98%
Percentile: 74.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

74.1% percentile inside its fair comparison set

Raw benchmark value: 98%

Search / tool use · 1 benchmark · 75.1%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #78 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 78.9%
Percentile: 75.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

75.1% percentile inside its fair comparison set

Raw benchmark value: 78.9%

Long context · 1 benchmark · 61.3%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #123 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 39%
Percentile: 61.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

61.3% percentile inside its fair comparison set

Raw benchmark value: 39%

Multilingual · 14 benchmarks · 74.2%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,464
Percentile: 78.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: chinese. Source rank: #79. Votes: 2442. Organization: deepseek. License: MIT.

78.3% percentile inside its fair comparison set

Raw benchmark value: 1,464 · CI 1,452 - 1,477

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #66 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 69.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: french. Source rank: #82. Votes: 1118. Organization: deepseek. License: MIT.

69.9% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,423 - 1,462

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #66 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 72.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: german. Source rank: #84. Votes: 861. Organization: deepseek. License: MIT.

72.6% percentile inside its fair comparison set

Raw benchmark value: 1,411 · CI 1,390 - 1,432

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 71.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: japanese. Source rank: #76. Votes: 397. Organization: deepseek. License: MIT.

71.9% percentile inside its fair comparison set

Raw benchmark value: 1,374 · CI 1,343 - 1,405

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 70.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: korean. Source rank: #79. Votes: 762. Organization: deepseek. License: MIT.

70.7% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,346 - 1,390

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #59 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 79.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: russian. Source rank: #77. Votes: 4947. Organization: deepseek. License: MIT.

79.9% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,415 - 1,432

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 71.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: spanish. Source rank: #77. Votes: 1274. Organization: deepseek. License: MIT.

71.5% percentile inside its fair comparison set

Raw benchmark value: 1,417 · CI 1,399 - 1,435

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,460
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: chinese. Source rank: #74. Votes: 2442. Organization: deepseek. License: MIT.

79.3% percentile inside its fair comparison set

Raw benchmark value: 1,460 · CI 1,447 - 1,473

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,433
Percentile: 71.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: french. Source rank: #77. Votes: 1118. Organization: deepseek. License: MIT.

71.8% percentile inside its fair comparison set

Raw benchmark value: 1,433 · CI 1,414 - 1,453

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #67 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,406
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: german. Source rank: #83. Votes: 861. Organization: deepseek. License: MIT.

72.2% percentile inside its fair comparison set

Raw benchmark value: 1,406 · CI 1,385 - 1,427

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,366
Percentile: 70.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: japanese. Source rank: #76. Votes: 397. Organization: deepseek. License: MIT.

70.9% percentile inside its fair comparison set

Raw benchmark value: 1,366 · CI 1,335 - 1,398

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,369
Percentile: 71.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: korean. Source rank: #74. Votes: 762. Organization: deepseek. License: MIT.

71.2% percentile inside its fair comparison set

Raw benchmark value: 1,369 (CI 1,347 - 1,392)

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #47 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,424
Percentile: 84.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: russian. Source rank: #59. Votes: 4947. Organization: deepseek. License: MIT.

84.1% percentile inside its fair comparison set

Raw benchmark value: 1,424 (CI 1,415 - 1,432)

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,421
Percentile: 74.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: spanish. Source rank: #69. Votes: 1274. Organization: deepseek. License: MIT.

74.3% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,403 - 1,438)
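The multilingual rows above report both a confidence interval and a vote count, and the two appear related: rows with fewer votes show wider intervals. The following is a minimal sketch that makes this visible, using only the numbers copied from the six per-language rows above. The tuple layout and the helper name `ci_width` are illustrative choices, not part of the registry or of Arena's tooling.

```python
# (language, score, ci_low, ci_high, votes), copied from the rows above.
ROWS = [
    ("french", 1433, 1414, 1453, 1118),
    ("german", 1406, 1385, 1427, 861),
    ("japanese", 1366, 1335, 1398, 397),
    ("korean", 1369, 1347, 1392, 762),
    ("russian", 1424, 1415, 1432, 4947),
    ("spanish", 1421, 1403, 1438, 1274),
]


def ci_width(row):
    """Width of the reported confidence interval, in rating points."""
    _, _, low, high, _ = row
    return high - low


# Order rows from most to least votes and pair each with its interval width.
by_votes = sorted(ROWS, key=lambda row: row[4], reverse=True)
widths = [(row[0], row[4], ci_width(row)) for row in by_votes]

for language, votes, width in widths:
    print(f"{language:<9} votes={votes:>5} ci_width={width}")
```

On these six rows the interval width grows as the vote count shrinks, so the Japanese score (397 votes) is the least certain of the set. Scores from different language leaderboards are not directly comparable, so this sketch compares only the interval widths.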

Source links and registry checks

official · DeepSeek models and pricing · Jun 20, 2026
official · Arena · Jun 20, 2026 (31 links)
official · Artificial Analysis · Jun 20, 2026
official · LiveBench · Jun 20, 2026


Chat / text · 34 benchmarks · 60.6%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #87 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 25
Percentile: 78.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

78.2% percentile inside its fair comparison set

Raw benchmark value: 25

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #70 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 24.2%
Percentile: 77.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

77.2% percentile inside its fair comparison set

Raw benchmark value: 24.2%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #257 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 6.5%
Percentile: 14.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

14.1% percentile inside its fair comparison set

Raw benchmark value: 6.5%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #98 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 49%
Percentile: 69.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

69.2% percentile inside its fair comparison set

Raw benchmark value: 49%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #147 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $0.8 /1M tokens
Percentile: 47.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

47.1% percentile inside its fair comparison set

Raw benchmark value: $0.8 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #159 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $0.5 /1M input tokens
Percentile: 42.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

42.8% percentile inside its fair comparison set

Raw benchmark value: $0.5 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #144 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $1.6 /1M output tokens
Percentile: 48.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

48.2% percentile inside its fair comparison set

Raw benchmark value: $1.6 /1M output tokens
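The three price rows above can be cross-checked against each other. The field name `price1mBlended0To3To1` suggests a blend weighted 3 parts input to 1 part output; that weighting is an assumption read off the field name, not something this page states. Under that assumption, a minimal sketch of the arithmetic looks like this (the function name `blended_price` is illustrative):

```python
INPUT_PRICE = 0.5   # USD per 1M input tokens, from the Input price row
OUTPUT_PRICE = 1.6  # USD per 1M output tokens, from the Output price row


def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price per 1M tokens (assumed 3:1 input:output mix)."""
    total_weight = input_weight + output_weight
    weighted_sum = input_weight * input_price + output_weight * output_price
    return weighted_sum / total_weight


blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
print(f"blended = ${blended:.3f} per 1M tokens")
```

The result, about $0.775, rounds to the $0.8 shown in the Blended price row. The input and output prices displayed here are themselves rounded, so the agreement is approximate rather than exact.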

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,425
Percentile: 80.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #80. Votes: 47303. Organization: deepseek. License: MIT.

80.9% percentile inside its fair comparison set

Raw benchmark value: 1,425 (CI 1,421 - 1,429)

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #59 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,401
Percentile: 82%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #73. Votes: 6687. Organization: deepseek. License: MIT.

82% percentile inside its fair comparison set

Raw benchmark value: 1,401 (CI 1,393 - 1,409)

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,437
Percentile: 80.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #81. Votes: 21521. Organization: deepseek. License: MIT.

80.3% percentile inside its fair comparison set

Raw benchmark value: 1,437 (CI 1,432 - 1,442)

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,416
Percentile: 80.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #81. Votes: 33503. Organization: deepseek. License: MIT.

80.6% percentile inside its fair comparison set

Raw benchmark value: 1,416 (CI 1,411 - 1,421)

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,447
Percentile: 81.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #77. Votes: 26212. Organization: deepseek. License: MIT.

81.8% percentile inside its fair comparison set

Raw benchmark value: 1,447 (CI 1,443 - 1,452)

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,456
Percentile: 81.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #77. Votes: 12512. Organization: deepseek. License: MIT.

81.5% percentile inside its fair comparison set

Raw benchmark value: 1,456 (CI 1,449 - 1,462)

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,421
Percentile: 83.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #69. Votes: 12931. Organization: deepseek. License: MIT.

83.7% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,415 - 1,427)

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,442
Percentile: 81.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #71. Votes: 12953. Organization: deepseek. License: MIT.

81.6% percentile inside its fair comparison set

Raw benchmark value: 1,442 (CI 1,436 - 1,448)

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,428
Percentile: 80.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #82. Votes: 8265. Organization: deepseek. License: MIT.

80.5% percentile inside its fair comparison set

Raw benchmark value: 1,428 (CI 1,421 - 1,435)

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,424
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #67. Votes: 47303. Organization: deepseek. License: MIT.

83.4% percentile inside its fair comparison set

Raw benchmark value: 1,424 (CI 1,421 - 1,428)

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,400
Percentile: 83.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: creative_writing. Source rank: #68. Votes: 6687. Organization: deepseek. License: MIT.

83.3% percentile inside its fair comparison set

Raw benchmark value: 1,400 (CI 1,392 - 1,407)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,435
Percentile: 82.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: english. Source rank: #69. Votes: 21521. Organization: deepseek. License: MIT.

82.8% percentile inside its fair comparison set

Raw benchmark value: 1,435 (CI 1,431 - 1,440)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,414
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: exclude_ties. Source rank: #68. Votes: 33503. Organization: deepseek. License: MIT.

83.1% percentile inside its fair comparison set

Raw benchmark value: 1,414 (CI 1,409 - 1,419)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,434
Percentile: 84%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts. Source rank: #67. Votes: 26212. Organization: deepseek. License: MIT.

84% percentile inside its fair comparison set

Raw benchmark value: 1,434 (CI 1,429 - 1,438)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,442
Percentile: 83.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: hard_prompts_english. Source rank: #64. Votes: 12512. Organization: deepseek. License: MIT.

83.6% percentile inside its fair comparison set

Raw benchmark value: 1,442 (CI 1,436 - 1,448)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,413
Percentile: 85.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: instruction_following. Source rank: #61. Votes: 12931. Organization: deepseek. License: MIT.

85.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 (CI 1,407 - 1,419)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #45 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,429
Percentile: 85.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: longer_query. Source rank: #57. Votes: 12953. Organization: deepseek. License: MIT.

85.5% percentile inside its fair comparison set

Raw benchmark value: 1,429 (CI 1,423 - 1,435)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,426
Percentile: 83%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: multi_turn. Source rank: #70. Votes: 8265. Organization: deepseek. License: MIT.

83% percentile inside its fair comparison set

Raw benchmark value: 1,426 (CI 1,419 - 1,433)

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 23.1%
Percentile: 13.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

13.9% percentile inside its fair comparison set

Raw benchmark value: 23.1%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #78 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 64.2%
Percentile: 28.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

28.7% percentile inside its fair comparison set

Raw benchmark value: 64.2%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #87 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 20.6%
Percentile: 20.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

20.4% percentile inside its fair comparison set

Raw benchmark value: 20.6%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #90 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 24%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

17.6% percentile inside its fair comparison set

Raw benchmark value: 24%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 20.7%
Percentile: 11.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

11.1% percentile inside its fair comparison set

Raw benchmark value: 20.7%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 27%
Percentile: 15.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

15.7% percentile inside its fair comparison set

Raw benchmark value: 27%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #74 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 81.8%
Percentile: 32.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

32.4% percentile inside its fair comparison set

Raw benchmark value: 81.8%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #76 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 44.9%
Percentile: 30.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

30.6% percentile inside its fair comparison set

Raw benchmark value: 44.9%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #80 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 66%
Percentile: 26.2%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

26.2% percentile inside its fair comparison set

Raw benchmark value: 66%
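The LiveBench category rows above ("Tasks scored: 4" for IF, "Tasks scored: 3" for Language) look like plain averages of their task rows. The sketch below checks that reading with the values copied from the rows above. Treating the category score as an unweighted mean is an assumption, and the dictionary names are illustrative.

```python
from statistics import mean

# Task scores (percent) copied from the LiveBench task rows above.
TASKS = {
    "IF": {
        "paraphrase": 20.6,
        "simplify": 24.0,
        "story_generation": 20.7,
        "summarize": 27.0,
    },
    "Language": {
        "connections": 81.8,
        "plot_unscrambling": 44.9,
        "typos": 66.0,
    },
}

# Category scores (percent) copied from the LiveBench category rows above.
REPORTED = {"IF": 23.1, "Language": 64.2}


def category_mean(category):
    """Unweighted mean of the task scores in one category (assumed method)."""
    return mean(TASKS[category].values())


for category, reported in REPORTED.items():
    computed = category_mean(category)
    print(f"{category}: computed={computed:.3f} reported={reported}")
```

Both computed means land within rounding distance of the reported category values, which is consistent with a simple average but does not prove that LiveBench computes it this way.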

Coding · 15 benchmarks · 57%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #51 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 32.6%
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

83.4% percentile inside its fair comparison set

Raw benchmark value: 32.6%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #78 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 38.7%
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

79.3% percentile inside its fair comparison set

Raw benchmark value: 38.7%

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,332
Percentile: 28.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: overall. Source rank: #65. Votes: 10470. Organization: deepseek. License: MIT.

28.8% percentile inside its fair comparison set

Raw benchmark value: 1,332 (CI 1,325 - 1,339)

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,332
Percentile: 28.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev. Source rank: #65. Votes: 10470. Organization: deepseek. License: MIT.

28.8% percentile inside its fair comparison set

Raw benchmark value: 1,332 (CI 1,325 - 1,339)

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,308
Percentile: 21.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev-html. Source rank: #71. Votes: 5252. Organization: deepseek. License: MIT.

21.9% percentile inside its fair comparison set

Raw benchmark value: 1,308 (CI 1,297 - 1,319)

Code Arena · Webdev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #44 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,347
Percentile: 27.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: webdev-react. Source rank: #56. Votes: 5199. Organization: deepseek. License: MIT.

27.1% percentile inside its fair comparison set

Raw benchmark value: 1,347 (CI 1,338 - 1,356)

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,470
Percentile: 80%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: coding. Source rank: #81. Votes: 10631. Organization: deepseek. License: MIT.

80% percentile inside its fair comparison set

Raw benchmark value: 1,470 (CI 1,463 - 1,476)

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #53 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,449
Percentile: 83.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: coding. Source rank: #63. Votes: 10631. Organization: deepseek. License: MIT.

83.8% percentile inside its fair comparison set

Raw benchmark value: 1,449 (CI 1,442 - 1,455)

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #55 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 46.7%
Percentile: 50.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

50.9% percentile inside its fair comparison set

Raw benchmark value: 46.7%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #43 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: LiveBench
Raw value: 75.7%
Percentile: 61.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

61.1% percentile inside its fair comparison set

Raw benchmark value: 75.7%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40%
Percentile: 58.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

58.3% percentile inside its fair comparison set

Raw benchmark value: 40%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 30%
Percentile: 44.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

44.9% percentile inside its fair comparison set

Raw benchmark value: 30%

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #33 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 70%
Percentile: 77.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

77.8% percentile inside its fair comparison set

Raw benchmark value: 70%

Coding generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 77.5%
Percentile: 67.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

67.6% percentile inside its fair comparison set

Raw benchmark value: 77.5%

Coding completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 73.9%
Percentile: 61.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

61.1% percentile inside its fair comparison set

Raw benchmark value: 73.9%
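
The two LiveBench category rows above are consistent with a plain unweighted mean of their task rows: Agentic coding (3 tasks scored) against JavaScript, TypeScript, and Python, and Coding (2 tasks scored) against code_generation and code_completion. The sketch below checks that arithmetic. The unweighted-mean rule is an assumption inferred from the numbers on this page, not a documented LiveBench formula, and `category_mean` is a hypothetical helper.

```python
from statistics import mean

# Task scores (percent) copied from the LiveBench rows on this page.
agentic_tasks = {"javascript": 40.0, "typescript": 30.0, "python": 70.0}
coding_tasks = {"code_generation": 77.5, "code_completion": 73.9}


def category_mean(task_scores: dict[str, float]) -> float:
    """Unweighted mean of task scores, rounded to one decimal (assumed rule)."""
    return round(mean(task_scores.values()), 1)


print(category_mean(agentic_tasks))  # page reports 46.7% for Agentic coding
print(category_mean(coding_tasks))   # page reports 75.7% for Coding
```

If a category row and its task rows ever disagree under this check, the category may be weighted or may include tasks that are not listed here.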

Reasoning / math / science · 15 benchmarks · 42.2%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #93 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 10.5%
Percentile: 75.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

75.1% percentile inside its fair comparison set

Raw benchmark value: 10.5%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #104 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 75.1%
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

72.5% percentile inside its fair comparison set

Raw benchmark value: 75.1%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #52 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0.9%
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

83.4% percentile inside its fair comparison set

Raw benchmark value: 0.9%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,430
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: math. Source rank: #68. Votes: 3004. Organization: deepseek. License: MIT.

83.1% percentile inside its fair comparison set

Raw benchmark value: 1,430 · CI 1,419 - 1,441

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #43 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 86.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: math. Source rank: #53. Votes: 3004. Organization: deepseek. License: MIT.

86.6% percentile inside its fair comparison set

Raw benchmark value: 1,437 · CI 1,426 - 1,448
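
The two Text Arena · Math rows above differ only in style control, so one quick sanity check is whether their reported intervals overlap. A minimal sketch, assuming the `CI` values are simple lower and upper bounds on the Arena score (the page does not state the confidence level); `Interval` and `overlaps` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap when each one starts before the other ends.
        return self.low <= other.high and other.low <= self.high


# Bounds copied from the two Text Arena · Math rows above.
style_controlled = Interval(1419, 1441)   # score 1,430
no_style_control = Interval(1426, 1448)   # score 1,437

print(style_controlled.overlaps(no_style_control))  # True
```

Overlap here only suggests that the 7-point gap sits inside the reported uncertainty; it is not a formal significance test.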

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #88 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 64%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

19.4% percentile inside its fair comparison set

Raw benchmark value: 64%

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #84 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 44.3%
Percentile: 23.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

23.1% percentile inside its fair comparison set

Raw benchmark value: 44.3%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 95%
Percentile: 46.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

46.3% percentile inside its fair comparison set

Raw benchmark value: 95%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #89 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

19.4% percentile inside its fair comparison set

Raw benchmark value: 10%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #91 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 76.5%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

17.6% percentile inside its fair comparison set

Raw benchmark value: 76.5%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #82 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 74.3%
Percentile: 25%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

25% percentile inside its fair comparison set

Raw benchmark value: 74.3%

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #81 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 50%
Percentile: 27.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

27.8% percentile inside its fair comparison set

Raw benchmark value: 50%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #99 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 9%
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

9.3% percentile inside its fair comparison set

Raw benchmark value: 9%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #88 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 76%
Percentile: 19.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

19.4% percentile inside its fair comparison set

Raw benchmark value: 76%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #83 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 42%
Percentile: 25%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

25% percentile inside its fair comparison set

Raw benchmark value: 42%
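
The section figure of 42.2% for Reasoning / math / science matches the plain mean of the 15 percentiles listed in this section. The sketch below reproduces that arithmetic. Treating the section score as an unweighted mean of row percentiles is an inference from these numbers, not a formula the page documents.

```python
from statistics import mean

# Percentiles of the 15 Reasoning / math / science rows, in page order.
percentiles = [
    75.1, 72.5, 83.4, 83.1, 86.6,  # HLE, GPQA, CritPt, Arena Math, Arena Math (no style control)
    19.4, 23.1, 46.3, 19.4, 17.6,  # Mathematics, Reasoning, AMPS Hard, Integrals with game, Math competition
    25.0, 27.8, 9.3, 19.4, 25.0,   # Olympiad, Theory of mind, Zebra puzzle, Spatial, Logic with navigation
]

section_score = round(mean(percentiles), 1)
print(len(percentiles), section_score)  # 15 42.2
```

The split is visible in the inputs: the five Artificial Analysis and Arena rows sit above 70%, while most LiveBench rows sit below 30%, so the mean hides a bimodal spread.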

Professional reasoning · 24 benchmarks · 68.3%

APEX-Agents-AA

AA · Professional reasoning · Objective

Long-horizon agentic task completion.

Rank #16 · Source label: DeepSeek V3.2 (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 14.5%
Percentile: 37.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `apexAgents`.

37.5% percentile inside its fair comparison set

Raw benchmark value: 14.5%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,449
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: expert. Source rank: #75. Votes: 3093. Organization: deepseek. License: MIT.

79.3% percentile inside its fair comparison set

Raw benchmark value: 1,449 · CI 1,438 - 1,460
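
The Arena provenance line above follows a regular shape: a backticked row id followed by `Key: value.` pairs. A minimal sketch of pulling those fields into a dictionary, assuming that shape holds for every Arena row on the page; `parse_provenance` is a hypothetical helper, not part of any Arena tooling.

```python
import re

line = (
    "Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: expert. "
    "Source rank: #75. Votes: 3093. Organization: deepseek. License: MIT."
)


def parse_provenance(text: str) -> dict[str, str]:
    """Extract the backticked row id and the 'Key: value.' pairs."""
    fields: dict[str, str] = {}
    row = re.search(r"row `([^`]+)`", text)
    if row:
        fields["row"] = row.group(1)
    # Each pair looks like "Key: value." where the closing period is
    # followed by a space or by the end of the string.
    for key, value in re.findall(r"([A-Z][A-Za-z ]+): (.+?)\.(?= |$)", text):
        fields[key] = value
    return fields


print(parse_provenance(line))
```

Values stay as strings here; converting `Votes` or `Source rank` to integers is left to the caller.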

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #64 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 80.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_business_and_management_and_financial_operations. Source rank: #80. Votes: 8818. Organization: deepseek. License: MIT.

80.2% percentile inside its fair comparison set

Raw benchmark value: 1,421 · CI 1,414 - 1,428

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 81.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_entertainment_and_sports_and_media. Source rank: #78. Votes: 8542. Organization: deepseek. License: MIT.

81.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 · CI 1,389 - 1,403

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 77.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_legal_and_government. Source rank: #87. Votes: 3382. Organization: deepseek. License: MIT.

77.5% percentile inside its fair comparison set

Raw benchmark value: 1,431 · CI 1,420 - 1,441

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,451
Percentile: 83%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_life_and_physical_and_social_science. Source rank: #69. Votes: 7351. Organization: deepseek. License: MIT.

83% percentile inside its fair comparison set

Raw benchmark value: 1,451 · CI 1,444 - 1,459

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,439
Percentile: 80.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_mathematical. Source rank: #73. Votes: 2265. Organization: deepseek. License: MIT.

80.8% percentile inside its fair comparison set

Raw benchmark value: 1,439 · CI 1,426 - 1,451

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #72 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 75.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_medicine_and_healthcare. Source rank: #90. Votes: 3017. Organization: deepseek. License: MIT.

75.9% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,430 - 1,453

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,457
Percentile: 79.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_software_and_it_services. Source rank: #84. Votes: 17009. Organization: deepseek. License: MIT.

79.4% percentile inside its fair comparison set

Raw benchmark value: 1,457 · CI 1,452 - 1,463

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #54 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 83.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_writing_and_literature_and_language. Source rank: #69. Votes: 10401. Organization: deepseek. License: MIT.

83.6% percentile inside its fair comparison set

Raw benchmark value: 1,410 · CI 1,403 - 1,416

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 80%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: expert. Source rank: #67. Votes: 3093. Organization: deepseek. License: MIT.

80% percentile inside its fair comparison set

Raw benchmark value: 1,436 · CI 1,425 - 1,447

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 78.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_business_and_management_and_financial_operations. Source rank: #82. Votes: 8818. Organization: deepseek. License: MIT.

78.9% percentile inside its fair comparison set

Raw benchmark value: 1,410 · CI 1,403 - 1,417

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 82.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_entertainment_and_sports_and_media. Source rank: #71. Votes: 8542. Organization: deepseek. License: MIT.

82.4% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,388 - 1,402

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #68 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 77.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_legal_and_government. Source rank: #83. Votes: 3382. Organization: deepseek. License: MIT.

77.5% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,414 - 1,435

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #50 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,445
Percentile: 84.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_life_and_physical_and_social_science. Source rank: #58. Votes: 7351. Organization: deepseek. License: MIT.

84.8% percentile inside its fair comparison set

Raw benchmark value: 1,445 · CI 1,437 - 1,452

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #42 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 86.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_mathematical. Source rank: #50. Votes: 2265. Organization: deepseek. License: MIT.

86.7% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,429 - 1,455

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #63 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 79%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_medicine_and_healthcare. Source rank: #74. Votes: 3017. Organization: deepseek. License: MIT.

79% percentile inside its fair comparison set

Raw benchmark value: 1,437 · CI 1,426 - 1,449

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,443
Percentile: 81.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_software_and_it_services. Source rank: #74. Votes: 17009. Organization: deepseek. License: MIT.

81.5% percentile inside its fair comparison set

Raw benchmark value: 1,443 · CI 1,438 - 1,448

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #57 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,407
Percentile: 82.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: industry_writing_and_literature_and_language. Source rank: #69. Votes: 10401. Organization: deepseek. License: MIT.

82.7% percentile inside its fair comparison set

Raw benchmark value: 1,407 · CI 1,400 - 1,413

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #89 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 45%
Percentile: 18.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

18.5% percentile inside its fair comparison set

Raw benchmark value: 45%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #79 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 51.8%
Percentile: 27.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

27.8% percentile inside its fair comparison set

Raw benchmark value: 51.8%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #100 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 1.7%
Percentile: 8.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

8.3% percentile inside its fair comparison set

Raw benchmark value: 1.7%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #90 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 35.4%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

17.6% percentile inside its fair comparison set

Raw benchmark value: 35.4%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #48 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 98%
Percentile: 74.1%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

74.1% percentile inside its fair comparison set

Raw benchmark value: 98%

Search / tool use · 1 benchmark · 75.1%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #78 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 78.9%
Percentile: 75.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

75.1% percentile inside its fair comparison set

Raw benchmark value: 78.9%

Long context · 1 benchmark · 61.3%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #123 · Source label: DeepSeek V3.2 (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 39%
Percentile: 61.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

61.3% percentile inside its fair comparison set

Raw benchmark value: 39%

Multilingual · 14 benchmarks · 74.2%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #65 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,464
Percentile: 78.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: chinese. Source rank: #79. Votes: 2442. Organization: deepseek. License: MIT.

78.3% percentile inside its fair comparison set

Raw benchmark value: 1,464 · CI 1,452 - 1,477

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #66 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,442
Percentile: 69.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: french. Source rank: #82. Votes: 1118. Organization: deepseek. License: MIT.

69.9% percentile inside its fair comparison set

Raw benchmark value: 1,442 · CI 1,423 - 1,462

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #66 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,411
Percentile: 72.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: german. Source rank: #84. Votes: 861. Organization: deepseek. License: MIT.

72.6% percentile inside its fair comparison set

Raw benchmark value: 1,411 · CI 1,390 - 1,432

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #58 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 71.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: japanese. Source rank: #76. Votes: 397. Organization: deepseek. License: MIT.

71.9% percentile inside its fair comparison set

Raw benchmark value: 1,374 · CI 1,343 - 1,405

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 70.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: korean. Source rank: #79. Votes: 762. Organization: deepseek. License: MIT.

70.7% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,346 - 1,390

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #59 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 79.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: russian. Source rank: #77. Votes: 4947. Organization: deepseek. License: MIT.

79.9% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,415 - 1,432

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 71.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: spanish. Source rank: #77. Votes: 1274. Organization: deepseek. License: MIT.

71.5% percentile inside its fair comparison set

Raw benchmark value: 1,417 · CI 1,399 - 1,435

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,460
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: chinese. Source rank: #74. Votes: 2442. Organization: deepseek. License: MIT.

79.3% percentile inside its fair comparison set

Raw benchmark value: 1,460 · CI 1,447 - 1,473

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #62 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,433
Percentile: 71.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: french. Source rank: #77. Votes: 1118. Organization: deepseek. License: MIT.

71.8% percentile inside its fair comparison set

Raw benchmark value: 1,433 · CI 1,414 - 1,453

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #67 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,406
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: german. Source rank: #83. Votes: 861. Organization: deepseek. License: MIT.

72.2% percentile inside its fair comparison set

Raw benchmark value: 1,406 · CI 1,385 - 1,427

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #60 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,366
Percentile: 70.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: japanese. Source rank: #76. Votes: 397. Organization: deepseek. License: MIT.

70.9% percentile inside its fair comparison set

Raw benchmark value: 1,366 · CI 1,335 - 1,398

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #61 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,369
Percentile: 71.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: korean. Source rank: #74. Votes: 762. Organization: deepseek. License: MIT.

71.2% percentile inside its fair comparison set

Raw benchmark value: 1,369 · CI 1,347 - 1,392

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #47 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 84.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: russian. Source rank: #59. Votes: 4947. Organization: deepseek. License: MIT.

84.1% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,415 - 1,432

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #56 · Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 74.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `deepseek-v3.2`. Category: spanish. Source rank: #69. Votes: 1274. Organization: deepseek. License: MIT.

74.3% percentile inside its fair comparison set

Raw benchmark value: 1,421 · CI 1,403 - 1,438
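Each Arena row above ends with a provenance line in the same layout (dataset row, category, source rank, votes, organization, license). A minimal sketch of reading that line mechanically, assuming the layout stays exactly as shown on this page; `parse_provenance` is a hypothetical helper for illustration, not part of the registry or of Arena's tooling.

```python
import re

# Sample provenance line copied from the Spanish (No Style Control) row above.
LINE = (
    "Parsed from Arena leaderboard dataset row `deepseek-v3.2`. "
    "Category: spanish. Source rank: #69. Votes: 1274. "
    "Organization: deepseek. License: MIT."
)


def parse_provenance(line: str) -> dict:
    """Split a provenance line into labelled fields (hypothetical helper)."""
    # The dataset row name sits between backticks and may itself contain dots.
    row = re.search(r"dataset row `([^`]+)`", line)
    # Remaining fields follow the pattern "Label: value." with dot-free values.
    fields = dict(re.findall(r"(\w[\w ]*?): ([^.]+)\.", line))
    return {
        "row": row.group(1) if row else None,
        "category": fields.get("Category"),
        "source_rank": int(fields["Source rank"].lstrip("#")),
        "votes": int(fields["Votes"]),
        "organization": fields.get("Organization"),
        "license": fields.get("License"),
    }


if __name__ == "__main__":
    print(parse_provenance(LINE))
```

The source rank parsed here is the rank reported by the leaderboard dataset; it is a separate field from the "Rank" shown at the top of each card and the two should not be assumed equal.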

Source links and registry checks

official · DeepSeek models and pricing · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Arena · Jun 20, 2026 · source →
official · Artificial Analysis · Jun 20, 2026 · source →
official · LiveBench · Jun 20, 2026 · source →
