Model profile · Qwen

qwen3-next-80b-a3b-thinking

Open weights · mid · registry tag 2026 · benchmark-derived

Thin verified coverage

This model has thin verified coverage across the resolved source data.

Visible coverage: 21.9%
Verified coverage: 21.9%
Spread: n/a
Last verified: Jun 20, 2026

text · code · 1 alias · 28 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.
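As a minimal sketch of how the two stamps can be compared, the snippet below computes the gap in days between the data version and the page refresh. The function name `staleness_days` and the format constant are hypothetical, and the parsing assumes the English month abbreviations shown on this page.

```python
from datetime import datetime

# Assumed stamp layout, matching values such as "Jun 20, 2026".
STAMP_FORMAT = "%b %d, %Y"


def staleness_days(data_version: str, page_refreshed: str) -> int:
    """Return the number of days between the data snapshot and the page refresh."""
    snapshot = datetime.strptime(data_version, STAMP_FORMAT)
    refreshed = datetime.strptime(page_refreshed, STAMP_FORMAT)
    return (refreshed - snapshot).days


# Stamps copied from this page.
print(staleness_days("Jun 20, 2026", "Jul 5, 2026"))  # 15
```

A large gap between the two stamps would suggest the page was redeployed without a newer registry snapshot.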

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
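The page does not state how the percentile is computed. One reading that fits rows with tied raw values (for example, a raw value of 0% that still shows a non-zero percentile) is the share of models in the comparison set scoring at or below the given value. The sketch below illustrates that assumed definition on hypothetical toy scores; `percentile_in_set` is not a function from the registry.

```python
def percentile_in_set(scores, value):
    """Share (in percent) of scores in the comparison set that are <= value.

    Assumed definition for illustration only; ties all receive the same percentile.
    """
    if not scores:
        raise ValueError("comparison set must not be empty")
    at_or_below = sum(1 for score in scores if score <= value)
    return round(100 * at_or_below / len(scores), 1)


# Hypothetical comparison set with two models tied at 0.0.
toy_scores = [0.0, 0.0, 10.0, 15.0, 60.0]

print(percentile_in_set(toy_scores, 0.0))   # 40.0
print(percentile_in_set(toy_scores, 15.0))  # 80.0
```

Under this reading, a model at the bottom of a tie group still gets credit for every model it ties with, which is why a rank and a percentile need not move in lockstep.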

Chat / text · 27 benchmarks · 51%

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #126

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 61.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: overall. Source rank: #153. Votes: 13693. Organization: alibaba. License: Apache 2.0.

61.5% percentile inside its fair comparison set

Raw benchmark value: 1,370 (CI 1,364 - 1,375)
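The "Parsed from Arena leaderboard dataset row" notes follow a regular `Key: value.` layout, so they can be split back into fields. The sketch below is one way to do that, assuming the note keeps exactly the field names seen on this page; `parse_arena_note` and both patterns are hypothetical helpers, not part of the registry.

```python
import re

# Each field ends with a period followed by a space or the end of the note,
# which keeps "Apache 2.0" intact instead of cutting it at the inner period.
FIELD_PATTERN = re.compile(
    r"(Category|Source rank|Votes|Organization|License): (.+?)\.(?= |$)"
)
ROW_PATTERN = re.compile(r"row `([^`]+)`")


def parse_arena_note(note: str) -> dict:
    """Split an Arena provenance note into a dictionary of typed fields."""
    fields = dict(FIELD_PATTERN.findall(note))
    row = ROW_PATTERN.search(note)
    return {
        "row": row.group(1) if row else None,
        "category": fields.get("Category"),
        "source_rank": int(fields["Source rank"].lstrip("#"))
        if "Source rank" in fields else None,
        "votes": int(fields["Votes"]) if "Votes" in fields else None,
        "organization": fields.get("Organization"),
        "license": fields.get("License"),
    }


note = (
    "Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. "
    "Category: overall. Source rank: #153. Votes: 13693. "
    "Organization: alibaba. License: Apache 2.0."
)
print(parse_arena_note(note))
```

Keeping the source rank and vote count as integers makes it easier to compare rows across categories.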

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #136

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 58.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: creative_writing. Source rank: #166. Votes: 1797. Organization: alibaba. License: Apache 2.0.

58.2% percentile inside its fair comparison set

Raw benchmark value: 1,324 (CI 1,310 - 1,338)

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #123

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,387
Percentile: 62.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: english. Source rank: #148. Votes: 6636. Organization: alibaba. License: Apache 2.0.

62.5% percentile inside its fair comparison set

Raw benchmark value: 1,387 (CI 1,379 - 1,395)

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #127

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 61.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: exclude_ties. Source rank: #154. Votes: 9759. Organization: alibaba. License: Apache 2.0.

61.2% percentile inside its fair comparison set

Raw benchmark value: 1,337 (CI 1,329 - 1,345)

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #129

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,384
Percentile: 60.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts. Source rank: #155. Votes: 6691. Organization: alibaba. License: Apache 2.0.

60.6% percentile inside its fair comparison set

Raw benchmark value: 1,384 (CI 1,377 - 1,392)

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #130

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts_english. Source rank: #156. Votes: 3497. Organization: alibaba. License: Apache 2.0.

60.2% percentile inside its fair comparison set

Raw benchmark value: 1,397 (CI 1,387 - 1,407)

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #126

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,359
Percentile: 61.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: instruction_following. Source rank: #153. Votes: 3517. Organization: alibaba. License: Apache 2.0.

61.5% percentile inside its fair comparison set

Raw benchmark value: 1,359 (CI 1,349 - 1,369)

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 57.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: longer_query. Source rank: #157. Votes: 2835. Organization: alibaba. License: Apache 2.0.

57.2% percentile inside its fair comparison set

Raw benchmark value: 1,370 (CI 1,359 - 1,381)

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #137

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 57.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: multi_turn. Source rank: #164. Votes: 2296. Organization: alibaba. License: Apache 2.0.

57.9% percentile inside its fair comparison set

Raw benchmark value: 1,351 (CI 1,338 - 1,363)

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #117

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 64.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: overall. Source rank: #141. Votes: 13693. Organization: alibaba. License: Apache 2.0.

64.3% percentile inside its fair comparison set

Raw benchmark value: 1,368 (CI 1,362 - 1,374)

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #137

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,312
Percentile: 57.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: creative_writing. Source rank: #165. Votes: 1797. Organization: alibaba. License: Apache 2.0.

57.9% percentile inside its fair comparison set

Raw benchmark value: 1,312 (CI 1,298 - 1,326)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #105

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 68%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: english. Source rank: #126. Votes: 6636. Organization: alibaba. License: Apache 2.0.

68% percentile inside its fair comparison set

Raw benchmark value: 1,393 (CI 1,385 - 1,401)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #119

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,333
Percentile: 63.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: exclude_ties. Source rank: #143. Votes: 9759. Organization: alibaba. License: Apache 2.0.

63.7% percentile inside its fair comparison set

Raw benchmark value: 1,333 (CI 1,326 - 1,341)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #116

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 64.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts. Source rank: #141. Votes: 6691. Organization: alibaba. License: Apache 2.0.

64.6% percentile inside its fair comparison set

Raw benchmark value: 1,370 (CI 1,363 - 1,378)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,387
Percentile: 66%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts_english. Source rank: #134. Votes: 3497. Organization: alibaba. License: Apache 2.0.

66% percentile inside its fair comparison set

Raw benchmark value: 1,387 (CI 1,377 - 1,397)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,343
Percentile: 63.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: instruction_following. Source rank: #147. Votes: 3517. Organization: alibaba. License: Apache 2.0.

63.4% percentile inside its fair comparison set

Raw benchmark value: 1,343 (CI 1,333 - 1,353)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,353
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: longer_query. Source rank: #150. Votes: 2835. Organization: alibaba. License: Apache 2.0.

60.2% percentile inside its fair comparison set

Raw benchmark value: 1,353 (CI 1,342 - 1,364)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,346
Percentile: 59.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: multi_turn. Source rank: #160. Votes: 2296. Organization: alibaba. License: Apache 2.0.

59.1% percentile inside its fair comparison set

Raw benchmark value: 1,346 (CI 1,333 - 1,359)

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #67

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 41.5%
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

38.9% percentile inside its fair comparison set

Raw benchmark value: 41.5%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 56.3%
Percentile: 19.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

19.4% percentile inside its fair comparison set

Raw benchmark value: 56.3%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #70

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 33.2%
Percentile: 36.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

36.1% percentile inside its fair comparison set

Raw benchmark value: 33.2%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #68

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 39.1%
Percentile: 38%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

38% percentile inside its fair comparison set

Raw benchmark value: 39.1%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #63

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 48.3%
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

42.6% percentile inside its fair comparison set

Raw benchmark value: 48.3%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #65

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 45.6%
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

40.7% percentile inside its fair comparison set

Raw benchmark value: 45.6%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #90

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 70.2%
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

18.5% percentile inside its fair comparison set

Raw benchmark value: 70.2%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40.8%
Percentile: 23.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

23.1% percentile inside its fair comparison set

Raw benchmark value: 40.8%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58%
Percentile: 10.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

10.3% percentile inside its fair comparison set

Raw benchmark value: 58%

Coding · 9 benchmarks · 22%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 62.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: coding. Source rank: #148. Votes: 2676. Organization: alibaba. License: Apache 2.0.

62.2% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,409 - 1,432)

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 65.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: coding. Source rank: #136. Votes: 2676. Organization: alibaba. License: Apache 2.0.

65.3% percentile inside its fair comparison set

Raw benchmark value: 1,391 (CI 1,380 - 1,403)

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 8.3%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

8.3% percentile inside its fair comparison set

Raw benchmark value: 8.3%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 60.7%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

8.3% percentile inside its fair comparison set

Raw benchmark value: 60.7%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #93

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 15%
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

16.7% percentile inside its fair comparison set

Raw benchmark value: 15%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #108

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0%
Percentile: 9.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

9.3% percentile inside its fair comparison set

Raw benchmark value: 0%

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

8.3% percentile inside its fair comparison set

Raw benchmark value: 10%

Coding generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #99

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 64.8%
Percentile: 10.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

10.2% percentile inside its fair comparison set

Raw benchmark value: 64.8%

Coding completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 56.5%
Percentile: 9.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

9.3% percentile inside its fair comparison set

Raw benchmark value: 56.5%
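The page does not say how the category figure in each section header is aggregated. The nine Coding percentiles listed above are at least consistent with an unweighted mean, which the short check below reproduces. The helper `category_score` is a hypothetical name, and the equal weighting is an assumption inferred from the numbers.

```python
from statistics import mean

# Percentiles copied from the nine Coding rows above, in page order.
coding_percentiles = [62.2, 65.3, 8.3, 8.3, 16.7, 9.3, 8.3, 10.2, 9.3]


def category_score(percentiles):
    """Unweighted mean of per-benchmark percentiles, rounded to one decimal."""
    return round(mean(percentiles), 1)


print(category_score(coding_percentiles))  # 22.0
```

If the mean is indeed the rule, the two Arena rows lift the Coding figure well above what the seven LiveBench rows alone would give.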

Reasoning / math / science · 12 benchmarks · 48.2%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 64.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: math. Source rank: #135. Votes: 828. Organization: alibaba. License: Apache 2.0.

64.6% percentile inside its fair comparison set

Raw benchmark value: 1,390 (CI 1,370 - 1,409)

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #103

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 67.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: math. Source rank: #123. Votes: 828. Organization: alibaba. License: Apache 2.0.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,395 (CI 1,375 - 1,415)

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #63

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 74.3%
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

42.6% percentile inside its fair comparison set

Raw benchmark value: 74.3%

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #72

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58.2%
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

34.3% percentile inside its fair comparison set

Raw benchmark value: 58.2%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #26

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 98%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

94.4% percentile inside its fair comparison set

Raw benchmark value: 98%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #59

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 38%
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

47.2% percentile inside its fair comparison set

Raw benchmark value: 38%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #67

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 88.2%
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

38.9% percentile inside its fair comparison set

Raw benchmark value: 88.2%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #85

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 72.8%
Percentile: 22.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

22.2% percentile inside its fair comparison set

Raw benchmark value: 72.8%

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #89

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 46.2%
Percentile: 20.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

20.4% percentile inside its fair comparison set

Raw benchmark value: 46.2%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #66

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.5%
Percentile: 39.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

39.3% percentile inside its fair comparison set

Raw benchmark value: 32.5%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #42

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 96%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

73.1% percentile inside its fair comparison set

Raw benchmark value: 96%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #76

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58%
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

34.3% percentile inside its fair comparison set

Raw benchmark value: 58%

Professional reasoning · 23 benchmarks · 60.3%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #123

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 55.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: expert. Source rank: #149. Votes: 620. Organization: alibaba. License: Apache 2.0.

55.6% percentile inside its fair comparison set

Raw benchmark value: 1,385 (CI 1,361 - 1,408)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #130

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,367
Percentile: 59.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #155. Votes: 2497. Organization: alibaba. License: Apache 2.0.

59.4% percentile inside its fair comparison set

Raw benchmark value: 1,367 (CI 1,355 - 1,379)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #141

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,317
Percentile: 56.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #171. Votes: 2454. Organization: alibaba. License: Apache 2.0.

56.7% percentile inside its fair comparison set

Raw benchmark value: 1,317 (CI 1,305 - 1,329)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #147

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_legal_and_government. Source rank: #175. Votes: 869. Organization: alibaba. License: Apache 2.0.

51% percentile inside its fair comparison set

Raw benchmark value: 1,360 (CI 1,340 - 1,380)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #132

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,380
Percentile: 59.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #159. Votes: 2124. Organization: alibaba. License: Apache 2.0.

59.4% percentile inside its fair comparison set

Raw benchmark value: 1,380 (CI 1,367 - 1,393)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 64%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_mathematical. Source rank: #138. Votes: 668. Organization: alibaba. License: Apache 2.0.

64% percentile inside its fair comparison set

Raw benchmark value: 1,388 (CI 1,365 - 1,410)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 58.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_medicine_and_healthcare. Source rank: #150. Votes: 720. Organization: alibaba. License: Apache 2.0.

58.3% percentile inside its fair comparison set

Raw benchmark value: 1,393 (CI 1,371 - 1,415)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #123

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 62.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_software_and_it_services. Source rank: #148. Votes: 4837. Organization: alibaba. License: Apache 2.0.

62.5% percentile inside its fair comparison set

Raw benchmark value: 1,410 (CI 1,401 - 1,418)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #136

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,339
Percentile: 58.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #166. Votes: 3078. Organization: alibaba. License: Apache 2.0.

58.3% percentile inside its fair comparison set

Raw benchmark value: 1,339 (CI 1,328 - 1,350)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 60.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: expert. Source rank: #133. Votes: 620. Organization: alibaba. License: Apache 2.0.

60.7% percentile inside its fair comparison set

Raw benchmark value: 1,368 (CI 1,344 - 1,392)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #113

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,361
Percentile: 64.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #135. Votes: 2497. Organization: alibaba. License: Apache 2.0.

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,361 (CI 1,350 - 1,373)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #131

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,313
Percentile: 59.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #158. Votes: 2454. Organization: alibaba. License: Apache 2.0.

59.8% percentile inside its fair comparison set

Raw benchmark value: 1,313 (CI 1,302 - 1,325)

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 58.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_legal_and_government. Source rank: #150. Votes: 869. Organization: alibaba. License: Apache 2.0.

58.7% percentile inside its fair comparison set

Raw benchmark value: 1,364 (CI 1,344 - 1,383)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #113

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,382
Percentile: 65.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #134. Votes: 2124. Organization: alibaba. License: Apache 2.0.

65.3% percentile inside its fair comparison set

Raw benchmark value: 1,382 (CI 1,370 - 1,395)

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,389
Percentile: 64.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_mathematical. Source rank: #129. Votes: 668. Organization: alibaba. License: Apache 2.0.

64.9% percentile inside its fair comparison set

Raw benchmark value: 1,389 (CI 1,366 - 1,411)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #104

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 65.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_medicine_and_healthcare. Source rank: #125. Votes: 720. Organization: alibaba. License: Apache 2.0.

65.1% percentile inside its fair comparison set

Raw benchmark value: 1,390 (CI 1,368 - 1,413)

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 65.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_software_and_it_services. Source rank: #135. Votes: 4837. Organization: alibaba. License: Apache 2.0.

65.8% percentile inside its fair comparison set

Raw benchmark value: 1,393 (CI 1,385 - 1,402)

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #130

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,328
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #157. Votes: 3078. Organization: alibaba. License: Apache 2.0.

60.2% percentile inside its fair comparison set

Raw benchmark value: 1,328 (CI 1,317 - 1,339)

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #58

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 53.6%
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

47.2% percentile inside its fair comparison set

Raw benchmark value: 53.6%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #82

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 50.4%
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

25% percentile inside its fair comparison set

Raw benchmark value: 50.4%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #62

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 14.4%
Percentile: 43.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

43.5% percentile inside its fair comparison set

Raw benchmark value: 14.4%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #23

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 46.3%
Percentile: 79.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

79.6% percentile inside its fair comparison set

Raw benchmark value: 46.3%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #27

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 100%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

100% percentile inside its fair comparison set

Raw benchmark value: 100%
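The 60.3% shown next to the Professional reasoning heading matches the unweighted mean of the 23 row percentiles in this section. That is an observation from the listed numbers, not a documented aggregation rule, so the sketch below is a consistency check rather than the registry's actual method. The function name `section_mean` is hypothetical.

```python
# Percentile values copied from the 23 Professional reasoning rows,
# in page order: 18 Arena rows followed by 5 LiveBench rows.
percentiles = [
    55.6, 59.4, 56.7, 51.0, 59.4, 64.0, 58.3, 62.5, 58.3,  # Arena
    60.7, 64.8, 59.8, 58.7, 65.3, 64.9, 65.1, 65.8, 60.2,  # Arena, no style control
    47.2, 25.0, 43.5, 79.6, 100.0,                         # LiveBench
]


def section_mean(values):
    """Unweighted arithmetic mean, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


print(len(percentiles), section_mean(percentiles))  # prints: 23 60.3
```

Because the mean is unweighted, the single 100% Table reformat row and the 25% LiveBench Overall row move the section figure as much as any Arena row with thousands of votes.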

Multilingual · 14 benchmarks · 51.6%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 63.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: chinese. Source rank: #132. Votes: 715. Organization: alibaba. License: Apache 2.0.

63.4% percentile inside its fair comparison set

Raw benchmark value: 1,413 (CI 1,391 - 1,435)

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #129

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,366
Percentile: 40.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: french. Source rank: #153. Votes: 195. Organization: alibaba. License: Apache 2.0.

40.7% percentile inside its fair comparison set

Raw benchmark value: 1,366 (CI 1,324 - 1,408)

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,367
Percentile: 55.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: german. Source rank: #128. Votes: 285. Organization: alibaba. License: Apache 2.0.

55.7% percentile inside its fair comparison set

Raw benchmark value: 1,367 (CI 1,334 - 1,400)

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #105

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,296
Percentile: 48.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: japanese. Source rank: #130. Votes: 180. Organization: alibaba. License: Apache 2.0.

48.8% percentile inside its fair comparison set

Raw benchmark value: 1,296 (CI 1,251 - 1,342)

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,319
Percentile: 48.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: korean. Source rank: #133. Votes: 332. Organization: alibaba. License: Apache 2.0.

48.1% percentile inside its fair comparison set

Raw benchmark value: 1,319 (CI 1,286 - 1,353)

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #131

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,353
Percentile: 55%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: russian. Source rank: #158. Votes: 750. Organization: alibaba. License: Apache 2.0.

55% percentile inside its fair comparison set

Raw benchmark value: 1,353 (CI 1,332 - 1,373)

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #127

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 41.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: spanish. Source rank: #155. Votes: 399. Organization: alibaba. License: Apache 2.0.

41.1% percentile inside its fair comparison set

Raw benchmark value: 1,351 (CI 1,321 - 1,381)

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #101

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 66.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: chinese. Source rank: #123. Votes: 715. Organization: alibaba. License: Apache 2.0.

66.1% percentile inside its fair comparison set

Raw benchmark value: 1,417 (CI 1,395 - 1,439)

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #121

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,356
Percentile: 44.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: french. Source rank: #144. Votes: 195. Organization: alibaba. License: Apache 2.0.

44.4% percentile inside its fair comparison set

Raw benchmark value: 1,356 (CI 1,314 - 1,399)

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #102

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 57.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: german. Source rank: #124. Votes: 285. Organization: alibaba. License: Apache 2.0.

57.4% percentile inside its fair comparison set

Raw benchmark value: 1,360 (CI 1,327 - 1,394)

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,282
Percentile: 48.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: japanese. Source rank: #128. Votes: 180. Organization: alibaba. License: Apache 2.0.

48.3% percentile inside its fair comparison set

Raw benchmark value: 1,282 (CI 1,237 - 1,327)

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #104

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,309
Percentile: 50.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: korean. Source rank: #124. Votes: 332. Organization: alibaba. License: Apache 2.0.

50.5% percentile inside its fair comparison set

Raw benchmark value: 1,309 (CI 1,275 - 1,343)

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #125

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,338
Percentile: 57.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: russian. Source rank: #152. Votes: 750. Organization: alibaba. License: Apache 2.0.

57.1% percentile inside its fair comparison set

Raw benchmark value: 1,338 (CI 1,318 - 1,358)

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #116

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,349
Percentile: 46.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: spanish. Source rank: #143. Votes: 399. Organization: alibaba. License: Apache 2.0.

46.3% percentile inside its fair comparison set

Raw benchmark value: 1,349 (CI 1,319 - 1,379)
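Several of the language rows rest on a few hundred votes, so their intervals are wide. One rough way to read them is to check whether two intervals overlap before treating a rating gap as meaningful. The sketch below uses the CI bounds listed in the Multilingual rows; the names `Interval` and `overlaps` are hypothetical, and interval overlap is only a reading aid, not a significance test.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: int
    high: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# (default row CI, No Style Control row CI) per language, copied from the rows above.
rows = {
    "chinese": (Interval(1391, 1435), Interval(1395, 1439)),
    "french": (Interval(1324, 1408), Interval(1314, 1399)),
    "german": (Interval(1334, 1400), Interval(1327, 1394)),
    "japanese": (Interval(1251, 1342), Interval(1237, 1327)),
    "korean": (Interval(1286, 1353), Interval(1275, 1343)),
    "russian": (Interval(1332, 1373), Interval(1318, 1358)),
    "spanish": (Interval(1321, 1381), Interval(1319, 1379)),
}

for language, (default, no_style_control) in rows.items():
    print(language, overlaps(default, no_style_control))

# Cross-language contrast: the Chinese and Japanese default intervals do not overlap.
print(overlaps(rows["chinese"][0], rows["japanese"][0]))
```

For every language, the default and No Style Control intervals overlap, so the small rating shifts between the two variants sit within the listed uncertainty. The Chinese and Japanese default intervals are disjoint.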

Source links and registry checks

official · Arena · Jun 20, 2026 · 27 source links

official · LiveBench · Jun 20, 2026 · 1 source link

Chat / text27 benchmarks51%

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #126

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 61.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: overall. Source rank: #153. Votes: 13693. Organization: alibaba. License: Apache 2.0.

61.5% percentile inside its fair comparison set

1,370Raw benchmark valueCI 1,364 - 1,375

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #136

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 58.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: creative_writing. Source rank: #166. Votes: 1797. Organization: alibaba. License: Apache 2.0.

58.2% percentile inside its fair comparison set

1,324Raw benchmark valueCI 1,310 - 1,338

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #123

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,387
Percentile: 62.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: english. Source rank: #148. Votes: 6636. Organization: alibaba. License: Apache 2.0.

62.5% percentile inside its fair comparison set

1,387Raw benchmark valueCI 1,379 - 1,395

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #127

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 61.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: exclude_ties. Source rank: #154. Votes: 9759. Organization: alibaba. License: Apache 2.0.

61.2% percentile inside its fair comparison set

1,337Raw benchmark valueCI 1,329 - 1,345

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #129

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,384
Percentile: 60.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts. Source rank: #155. Votes: 6691. Organization: alibaba. License: Apache 2.0.

60.6% percentile inside its fair comparison set

1,384Raw benchmark valueCI 1,377 - 1,392

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #130

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts_english. Source rank: #156. Votes: 3497. Organization: alibaba. License: Apache 2.0.

60.2% percentile inside its fair comparison set

1,397Raw benchmark valueCI 1,387 - 1,407

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #126

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,359
Percentile: 61.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: instruction_following. Source rank: #153. Votes: 3517. Organization: alibaba. License: Apache 2.0.

61.5% percentile inside its fair comparison set

1,359Raw benchmark valueCI 1,349 - 1,369

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 57.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: longer_query. Source rank: #157. Votes: 2835. Organization: alibaba. License: Apache 2.0.

57.2% percentile inside its fair comparison set

1,370Raw benchmark valueCI 1,359 - 1,381

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #137

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 57.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: multi_turn. Source rank: #164. Votes: 2296. Organization: alibaba. License: Apache 2.0.

57.9% percentile inside its fair comparison set

1,351Raw benchmark valueCI 1,338 - 1,363

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #117

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 64.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: overall. Source rank: #141. Votes: 13693. Organization: alibaba. License: Apache 2.0.

64.3% percentile inside its fair comparison set

1,368Raw benchmark valueCI 1,362 - 1,374

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #137

verified runtimeexact aliasBackground only

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,312
Percentile: 57.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: creative_writing. Source rank: #165. Votes: 1797. Organization: alibaba. License: Apache 2.0.

57.9% percentile inside its fair comparison set

Raw benchmark value: 1,312 (CI 1,298 - 1,326)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #105

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 68%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: english. Source rank: #126. Votes: 6636. Organization: alibaba. License: Apache 2.0.

68% percentile inside its fair comparison set

Raw benchmark value: 1,393 (CI 1,385 - 1,401)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #119

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,333
Percentile: 63.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: exclude_ties. Source rank: #143. Votes: 9759. Organization: alibaba. License: Apache 2.0.

63.7% percentile inside its fair comparison set

Raw benchmark value: 1,333 (CI 1,326 - 1,341)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #116

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,370
Percentile: 64.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts. Source rank: #141. Votes: 6691. Organization: alibaba. License: Apache 2.0.

64.6% percentile inside its fair comparison set

Raw benchmark value: 1,370 (CI 1,363 - 1,378)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #111

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,387
Percentile: 66%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: hard_prompts_english. Source rank: #134. Votes: 3497. Organization: alibaba. License: Apache 2.0.

66% percentile inside its fair comparison set

Raw benchmark value: 1,387 (CI 1,377 - 1,397)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,343
Percentile: 63.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: instruction_following. Source rank: #147. Votes: 3517. Organization: alibaba. License: Apache 2.0.

63.4% percentile inside its fair comparison set

Raw benchmark value: 1,343 (CI 1,333 - 1,353)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,353
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: longer_query. Source rank: #150. Votes: 2835. Organization: alibaba. License: Apache 2.0.

60.2% percentile inside its fair comparison set

Raw benchmark value: 1,353 (CI 1,342 - 1,364)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,346
Percentile: 59.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: multi_turn. Source rank: #160. Votes: 2296. Organization: alibaba. License: Apache 2.0.

59.1% percentile inside its fair comparison set

Raw benchmark value: 1,346 (CI 1,333 - 1,359)

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #67

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 41.5%
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

38.9% percentile inside its fair comparison set

Raw benchmark value: 41.5%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 56.3%
Percentile: 19.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

19.4% percentile inside its fair comparison set

Raw benchmark value: 56.3%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #70

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 33.2%
Percentile: 36.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

36.1% percentile inside its fair comparison set

Raw benchmark value: 33.2%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #68

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 39.1%
Percentile: 38%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

38% percentile inside its fair comparison set

Raw benchmark value: 39.1%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #63

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 48.3%
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

42.6% percentile inside its fair comparison set

Raw benchmark value: 48.3%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #65

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 45.6%
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

40.7% percentile inside its fair comparison set

Raw benchmark value: 45.6%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #90

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 70.2%
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

18.5% percentile inside its fair comparison set

Raw benchmark value: 70.2%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40.8%
Percentile: 23.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

23.1% percentile inside its fair comparison set

Raw benchmark value: 40.8%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58%
Percentile: 10.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

10.3% percentile inside its fair comparison set

Raw benchmark value: 58%

Coding · 9 benchmarks · 22%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 62.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: coding. Source rank: #148. Votes: 2676. Organization: alibaba. License: Apache 2.0.

62.2% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,409 - 1,432)

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 65.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: coding. Source rank: #136. Votes: 2676. Organization: alibaba. License: Apache 2.0.

65.3% percentile inside its fair comparison set

Raw benchmark value: 1,391 (CI 1,380 - 1,403)

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 8.3%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

8.3% percentile inside its fair comparison set

Raw benchmark value: 8.3%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 60.7%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

8.3% percentile inside its fair comparison set

Raw benchmark value: 60.7%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #93

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 15%
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

16.7% percentile inside its fair comparison set

Raw benchmark value: 15%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #108

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0%
Percentile: 9.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

9.3% percentile inside its fair comparison set

Raw benchmark value: 0%
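
The page does not say how "percentile inside its fair comparison set" is computed. One reading that would fit a 0% raw value paired with a 9.3% percentile, as in the TypeScript row, is the share of models in the comparison set that score at or below this model, so that tied scores share a percentile. The sketch below illustrates that assumption only: the set size of 108, the ten tied scores, and the helper name `percentile_at_or_below` are all hypothetical.

```python
def percentile_at_or_below(scores, value):
    """Return the percentage of scores that are less than or equal to value."""
    if not scores:
        raise ValueError("comparison set is empty")
    at_or_below = sum(1 for s in scores if s <= value)
    return 100.0 * at_or_below / len(scores)


# Hypothetical comparison set: 108 models, 10 of which score 0%.
scores = [0.0] * 10 + [float(i) for i in range(1, 99)]
pct = percentile_at_or_below(scores, 0.0)
print(f"{len(scores)} models, percentile for a 0% score: {pct:.1f}%")
```

Under this definition a model in last place by rank still gets a non-zero percentile whenever other models share its score, which is why rank and percentile need not move in lockstep.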

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

8.3% percentile inside its fair comparison set

Raw benchmark value: 10%

Code generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #99

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 64.8%
Percentile: 10.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

10.2% percentile inside its fair comparison set

Raw benchmark value: 64.8%

Code completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #100

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 56.5%
Percentile: 9.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

9.3% percentile inside its fair comparison set

Raw benchmark value: 56.5%
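
The LiveBench category rows in this section line up with a plain average of their task rows: the Coding category (60.7%, 2 tasks scored) against code_generation and code_completion, and Agentic Coding (8.3%, 3 tasks scored) against javascript, typescript, and python. The page does not state how categories are aggregated, so the unweighted mean below is an assumption checked only against the numbers shown here. Task values are themselves rounded to one decimal, so a small gap is expected.

```python
from statistics import mean

# Task-level LiveBench values copied from the cards above (percent).
tasks = {
    "Agentic Coding": {"javascript": 15.0, "typescript": 0.0, "python": 10.0},
    "Coding": {"code_generation": 64.8, "code_completion": 56.5},
}
# Category-level values reported on the page (percent).
reported = {"Agentic Coding": 8.3, "Coding": 60.7}

# Assumed aggregation: unweighted mean of the task scores in each category.
means = {category: mean(list(values.values())) for category, values in tasks.items()}

for category, avg in means.items():
    gap = abs(avg - reported[category])
    print(f"{category}: mean of tasks {avg:.2f}, reported {reported[category]}, gap {gap:.2f}")
```

If a category ever diverged from the mean of its tasks by more than rounding would explain, that would suggest weighting or a task missing from the page.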

Reasoning / math / science · 12 benchmarks · 48.2%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 64.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: math. Source rank: #135. Votes: 828. Organization: alibaba. License: Apache 2.0.

64.6% percentile inside its fair comparison set

Raw benchmark value: 1,390 (CI 1,370 - 1,409)

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #103

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 67.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: math. Source rank: #123. Votes: 828. Organization: alibaba. License: Apache 2.0.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,395 (CI 1,375 - 1,415)

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #63

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 74.3%
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

42.6% percentile inside its fair comparison set

Raw benchmark value: 74.3%

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #72

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58.2%
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

34.3% percentile inside its fair comparison set

Raw benchmark value: 58.2%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #26

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 98%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

94.4% percentile inside its fair comparison set

Raw benchmark value: 98%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #59

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 38%
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

47.2% percentile inside its fair comparison set

Raw benchmark value: 38%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #67

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 88.2%
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

38.9% percentile inside its fair comparison set

Raw benchmark value: 88.2%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #85

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 72.8%
Percentile: 22.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

22.2% percentile inside its fair comparison set

Raw benchmark value: 72.8%

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #89

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 46.2%
Percentile: 20.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

20.4% percentile inside its fair comparison set

Raw benchmark value: 46.2%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #66

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.5%
Percentile: 39.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

39.3% percentile inside its fair comparison set

Raw benchmark value: 32.5%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #42

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 96%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

73.1% percentile inside its fair comparison set

Raw benchmark value: 96%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #76

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 58%
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

34.3% percentile inside its fair comparison set

Raw benchmark value: 58%

Professional reasoning · 23 benchmarks · 60.3%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #123

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 55.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: expert. Source rank: #149. Votes: 620. Organization: alibaba. License: Apache 2.0.

55.6% percentile inside its fair comparison set

Raw benchmark value: 1,385 (CI 1,361 - 1,408)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #130

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,367
Percentile: 59.4%
Last updated: recent
Eligibility: benchmark_derived_model

59.4% percentile inside its fair comparison set

Raw benchmark value: 1,367 (CI 1,355 - 1,379)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #141

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,317
Percentile: 56.7%
Last updated: recent
Eligibility: benchmark_derived_model

56.7% percentile inside its fair comparison set

Raw benchmark value: 1,317 (CI 1,305 - 1,329)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #147

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_legal_and_government. Source rank: #175. Votes: 869. Organization: alibaba. License: Apache 2.0.

51% percentile inside its fair comparison set

Raw benchmark value: 1,360 (CI 1,340 - 1,380)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #132

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,380
Percentile: 59.4%
Last updated: recent
Eligibility: benchmark_derived_model

59.4% percentile inside its fair comparison set

Raw benchmark value: 1,380 (CI 1,367 - 1,393)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 64%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_mathematical. Source rank: #138. Votes: 668. Organization: alibaba. License: Apache 2.0.

64% percentile inside its fair comparison set

Raw benchmark value: 1,388 (CI 1,365 - 1,410)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 58.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_medicine_and_healthcare. Source rank: #150. Votes: 720. Organization: alibaba. License: Apache 2.0.

58.3% percentile inside its fair comparison set

Raw benchmark value: 1,393 (CI 1,371 - 1,415)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #123

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 62.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_software_and_it_services. Source rank: #148. Votes: 4837. Organization: alibaba. License: Apache 2.0.

62.5% percentile inside its fair comparison set

Raw benchmark value: 1,410 (CI 1,401 - 1,418)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #136

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,339
Percentile: 58.3%
Last updated: recent
Eligibility: benchmark_derived_model

58.3% percentile inside its fair comparison set

Raw benchmark value: 1,339 (CI 1,328 - 1,350)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 60.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: expert. Source rank: #133. Votes: 620. Organization: alibaba. License: Apache 2.0.

60.7% percentile inside its fair comparison set

Raw benchmark value: 1,368 (CI 1,344 - 1,392)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #113

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,361
Percentile: 64.8%
Last updated: recent
Eligibility: benchmark_derived_model

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,361 (CI 1,350 - 1,373)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #131

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,313
Percentile: 59.8%
Last updated: recent
Eligibility: benchmark_derived_model

59.8% percentile inside its fair comparison set

Raw benchmark value: 1,313 (CI 1,302 - 1,325)

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 58.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_legal_and_government. Source rank: #150. Votes: 869. Organization: alibaba. License: Apache 2.0.

58.7% percentile inside its fair comparison set

Raw benchmark value: 1,364 (CI 1,344 - 1,383)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #113

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,382
Percentile: 65.3%
Last updated: recent
Eligibility: benchmark_derived_model

65.3% percentile inside its fair comparison set

Raw benchmark value: 1,382 (CI 1,370 - 1,395)

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,389
Percentile: 64.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_mathematical. Source rank: #129. Votes: 668. Organization: alibaba. License: Apache 2.0.

64.9% percentile inside its fair comparison set

Raw benchmark value: 1,389 (CI 1,366 - 1,411)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #104

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,390
Percentile: 65.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_medicine_and_healthcare. Source rank: #125. Votes: 720. Organization: alibaba. License: Apache 2.0.

65.1% percentile inside its fair comparison set

Raw benchmark value: 1,390 · CI 1,368 - 1,413

Text Arena · Industry Software And It Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #112

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 65.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: industry_software_and_it_services. Source rank: #135. Votes: 4837. Organization: alibaba. License: Apache 2.0.

65.8% percentile inside its fair comparison set

Raw benchmark value: 1,393 · CI 1,385 - 1,402

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #130

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,328
Percentile: 60.2%
Last updated: recent
Eligibility: benchmark_derived_model

60.2% percentile inside its fair comparison set

Raw benchmark value: 1,328 · CI 1,317 - 1,339

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #58

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 53.6%
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

47.2% percentile inside its fair comparison set

Raw benchmark value: 53.6%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #82

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 50.4%
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

25% percentile inside its fair comparison set

Raw benchmark value: 50.4%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #62

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 14.4%
Percentile: 43.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

43.5% percentile inside its fair comparison set

Raw benchmark value: 14.4%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #23

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 46.3%
Percentile: 79.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

79.6% percentile inside its fair comparison set

Raw benchmark value: 46.3%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #27

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 100%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

100% percentile inside its fair comparison set

Raw benchmark value: 100%
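
The three Data Analysis task scores listed above (consecutive events 14.4%, table join 46.3%, table reformat 100%) are consistent with the 53.6% Data analysis value if the category score is an unweighted mean of its tasks. That averaging rule is an assumption inferred from the numbers on this page, not a formula the page states. A minimal Python check under that assumption:

```python
# Task scores copied from the LiveBench rows above (percent).
task_scores = {
    "consecutive_events": 14.4,
    "tablejoin": 46.3,
    "tablereformat": 100.0,
}

# Assumption: category score = unweighted mean of its task scores.
category_average = round(sum(task_scores.values()) / len(task_scores), 1)
print(category_average)  # 53.6
```

The result matches the Data analysis raw value reported on this page, which supports the assumption for this category but does not confirm it for others.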

Multilingual · 14 benchmarks · 51.6%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 63.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: chinese. Source rank: #132. Votes: 715. Organization: alibaba. License: Apache 2.0.

63.4% percentile inside its fair comparison set

Raw benchmark value: 1,413 · CI 1,391 - 1,435

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #129

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,366
Percentile: 40.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: french. Source rank: #153. Votes: 195. Organization: alibaba. License: Apache 2.0.

40.7% percentile inside its fair comparison set

Raw benchmark value: 1,366 · CI 1,324 - 1,408

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,367
Percentile: 55.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: german. Source rank: #128. Votes: 285. Organization: alibaba. License: Apache 2.0.

55.7% percentile inside its fair comparison set

Raw benchmark value: 1,367 · CI 1,334 - 1,400

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #105

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,296
Percentile: 48.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: japanese. Source rank: #130. Votes: 180. Organization: alibaba. License: Apache 2.0.

48.8% percentile inside its fair comparison set

Raw benchmark value: 1,296 · CI 1,251 - 1,342

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #109

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,319
Percentile: 48.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: korean. Source rank: #133. Votes: 332. Organization: alibaba. License: Apache 2.0.

48.1% percentile inside its fair comparison set

Raw benchmark value: 1,319 · CI 1,286 - 1,353

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #131

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,353
Percentile: 55%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: russian. Source rank: #158. Votes: 750. Organization: alibaba. License: Apache 2.0.

55% percentile inside its fair comparison set

Raw benchmark value: 1,353 · CI 1,332 - 1,373

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #127

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 41.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: spanish. Source rank: #155. Votes: 399. Organization: alibaba. License: Apache 2.0.

41.1% percentile inside its fair comparison set

Raw benchmark value: 1,351 · CI 1,321 - 1,381

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #101

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 66.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: chinese. Source rank: #123. Votes: 715. Organization: alibaba. License: Apache 2.0.

66.1% percentile inside its fair comparison set

Raw benchmark value: 1,417 · CI 1,395 - 1,439
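
Each Arena row reports a raw value with a confidence interval, so one rough way to compare two rows is to check whether their intervals overlap. The sketch below does this for the two Chinese rows on this page (1,413 with CI 1,391 - 1,435, and 1,417 with CI 1,395 - 1,439 under No Style Control). The `ArenaScore` type and `intervals_overlap` helper are hypothetical names introduced for illustration, and an overlap check is only a reading aid, not a formal significance test.

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    """Hypothetical container for one Arena row: raw value plus CI bounds."""
    value: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """Return True when the two confidence intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the two Chinese rows on this page.
chinese = ArenaScore(value=1413, ci_low=1391, ci_high=1435)
chinese_no_style = ArenaScore(value=1417, ci_low=1395, ci_high=1439)

print(intervals_overlap(chinese, chinese_no_style))  # True
```

Because the two intervals overlap, the 4-point gap between the rows is within the uncertainty the page itself reports.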

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #121

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,356
Percentile: 44.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: french. Source rank: #144. Votes: 195. Organization: alibaba. License: Apache 2.0.

44.4% percentile inside its fair comparison set

Raw benchmark value: 1,356 · CI 1,314 - 1,399

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #102

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 57.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: german. Source rank: #124. Votes: 285. Organization: alibaba. License: Apache 2.0.

57.4% percentile inside its fair comparison set

Raw benchmark value: 1,360 · CI 1,327 - 1,394

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #106

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,282
Percentile: 48.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: japanese. Source rank: #128. Votes: 180. Organization: alibaba. License: Apache 2.0.

48.3% percentile inside its fair comparison set

Raw benchmark value: 1,282 · CI 1,237 - 1,327

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #104

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,309
Percentile: 50.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: korean. Source rank: #124. Votes: 332. Organization: alibaba. License: Apache 2.0.

50.5% percentile inside its fair comparison set

Raw benchmark value: 1,309 · CI 1,275 - 1,343

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #125

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,338
Percentile: 57.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: russian. Source rank: #152. Votes: 750. Organization: alibaba. License: Apache 2.0.

57.1% percentile inside its fair comparison set

Raw benchmark value: 1,338 · CI 1,318 - 1,358

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #116

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,349
Percentile: 46.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `qwen3-next-80b-a3b-thinking`. Category: spanish. Source rank: #143. Votes: 399. Organization: alibaba. License: Apache 2.0.

46.3% percentile inside its fair comparison set

Raw benchmark value: 1,349 · CI 1,319 - 1,379