
Reasoning

Name: Reasoning
Creator: LiveBench

Recent objective reasoning tasks from the current LiveBench website release.

Source · LiveBench
Version · livebench snapshot 2026-06-24
Scores · 109

Test details

Verified but aging

This is an objective signal, so it mainly reflects measurable task performance rather than public taste.

Source: LiveBench
Metric: Score (%)
Judge: Objective
Direction: higher is better
Group ID: livebench_reasoning_2026_01_08
Domain: Reasoning / math / science
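The metric is a percentage score where higher is better, and each leaderboard row below notes "Tasks scored: 4". The page does not spell out how those four tasks are combined, so the following is only a minimal sketch, assuming the category score is the unweighted mean of the per-task scores, rounded to one decimal like the display. The task names and values are hypothetical placeholders, not LiveBench data.

```python
from __future__ import annotations

from statistics import fmean


def category_score(task_scores: dict[str, float], expected_tasks: int = 4) -> float:
    """Average per-task scores (0-100) into one category score.

    Assumption: the category score is the unweighted mean of its tasks,
    rounded to one decimal like the leaderboard display.
    """
    if len(task_scores) != expected_tasks:
        raise ValueError(
            f"expected {expected_tasks} task scores, got {len(task_scores)}"
        )
    return round(fmean(task_scores.values()), 1)


def rank_models(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Order models best-first, since the direction is 'higher is better'."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical task names and made-up values, for illustration only.
    model_a = category_score({"task_1": 80, "task_2": 70, "task_3": 60, "task_4": 90})
    model_b = category_score({"task_1": 95, "task_2": 85, "task_3": 75, "task_4": 65})
    print(rank_models({"model_a": model_a, "model_b": model_b}))
```

Requiring exactly four scores mirrors the "Tasks scored: 4" note; a row with a missing task is rejected rather than silently averaged over fewer tasks.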

What it measures vs what it misses

✓ Measures

Correctness on theory-of-mind, logic, spatial, and navigation-heavy reasoning tasks.

✗ Misses

Human preference, tool use, and coding execution quality.

Why this counts: It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Same-test rule: This percentile only compares models within the exact benchmark/version group shown here. It is not a universal score.

What it misses: It still misses product usability, latency, and whether the model stays correct in messy real-world workflows.
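The displayed percentiles appear to be rank-based within a single group: with 109 scores here, rank 1 shows 100%, rank 2 shows 99.1%, and rank 55 shows 50%, which matches (n - rank) / (n - 1) × 100. The sketch below reproduces that, assuming this inferred formula (the page does not document it) and a hypothetical row shape of (group_id, model, score). Because #1 and #2 are both shown at 89.7% yet receive different percentiles, the ranking is positional rather than score-based; this sketch keeps input order for ties, whereas the page may break them using unrounded scores.

```python
from collections import defaultdict


def percentiles_by_group(rows):
    """Map (group_id, model) -> percentile, ranking only within each group.

    rows: iterable of (group_id, model, score) tuples (hypothetical shape).
    Assumed formula, inferred from the displayed values:
        percentile = (n - rank) / (n - 1) * 100, rounded to one decimal.
    """
    groups = defaultdict(list)
    for group_id, model, score in rows:
        groups[group_id].append((model, score))

    result = {}
    for group_id, members in groups.items():
        # Direction is "higher is better"; sorted() is stable, so ties keep input order.
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        n = len(ranked)
        for rank, (model, _score) in enumerate(ranked, start=1):
            pct = 100.0 if n == 1 else (n - rank) / (n - 1) * 100
            result[(group_id, model)] = round(pct, 1)
    return result


if __name__ == "__main__":
    # Two synthetic groups: percentiles never mix across group ids.
    rows = [("g1", "a", 89.7), ("g1", "b", 89.7), ("g1", "c", 80.0),
            ("g2", "a", 50.0)]
    print(percentiles_by_group(rows))
```

Keying the result by group id is what enforces the same-test rule: the same model in another benchmark version gets its own, independent percentile.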

Leaderboard · this benchmark version

#1 · claude-opus-4-8-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

89.7%

#2 · claude-opus-4-8-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

89.7%

#3 · claude-opus-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

88.7%

#4 · claude-opus-4-7-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 97.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

87.7%

#5 · claude-fable-5-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 96.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

87.3%

#6 · claude-sonnet-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

86.4%

#7 · claude-fable-5-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

86.3%

#8 · GPT-5.5

LB · Jan 8, 2026

Source label: gpt-5.5-high

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

86.2%

#9 · GPT-5.4

LB · Jan 8, 2026

Source label: gpt-5.4-high

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 92.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

85.7%

#10 · claude-sonnet-4-6-thinking-auto-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 91.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

84.8%

#11 · claude-opus-4-8-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 90.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

84.5%

#12 · gpt-5.2-2025-12-11-medium

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-medium

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 89.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

84.2%

#13 · gemini-3.1-pro-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-pro-preview-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

84%

#14 · claude-opus-4-7-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

83.8%

#15 · gpt-5.1-codex-max-high

LB · Jan 8, 2026

Source label: gpt-5.1-codex-max-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 87%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

83.7%

#16 · claude-opus-4-8-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 86.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

83.5%

#17 · Qwen3.7 Max

LB · Jan 8, 2026

Source label: qwen3.7-max

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 85.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

83.3%

#18 · gpt-5.2-2025-12-11-high

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 84.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

83.2%

#19 · Kimi K2.7 Code

LB · Jan 8, 2026

Source label: kimi-k2.7-code

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

82.8%

#20 · deepseek-v4-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 82.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

82.7%

#21 · claude-opus-4-5-20251101-thinking-64k-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 81.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

82.2%

#22 · Gemini 3.5 Flash

LB · Jan 8, 2026

Source label: gemini-3.5-flash-high

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 80.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

82%

#23 · GPT-5

LB · Jan 8, 2026

Source label: gpt-5-pro-2025-10-06

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 79.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

81.7%

#24 · gpt-5.3-codex-high

LB · Jan 8, 2026

Source label: gpt-5.3-codex-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 78.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80.2%

#25 · claude-opus-4-5-20251101-thinking-64k-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 77.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80.1%

#26 · claude-opus-4-7-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 76.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80%

#27 · kimi-k2.6-thinking

LB · Jan 8, 2026

Source label: kimi-k2.6-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 75.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

79.4%

#28 · Grok 4

LB · Jan 8, 2026

Source label: grok-4-0709

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 75%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

79.1%

#29 · gpt-5.1-2025-11-13-high

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 74.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

78.8%

#30 · GLM-5.2 (max)

LB · Jan 8, 2026

Source label: glm-5.2

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

78.6%

#31 · GPT-5.2

LB · Jan 8, 2026

Source label: gpt-5.2-codex

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 72.2%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

77.7%

#32 · claude-sonnet-4-5-20250929-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 71.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

77.6%

#33 · claude-sonnet-4-6-thinking-auto-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 70.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

77.4%

#34 · gemini-3-pro-preview-11-2025-high

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 69.4%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

77.4%

#35 · deepseek-v3.2-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 68.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

77.2%

#36 · grok-build-0.1

LB · Jan 8, 2026

Source label: grok-build-0.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 67.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

76.4%

#37 · kimi-k2.5-thinking

LB · Jan 8, 2026

Source label: kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 66.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

76%

#38 · qwen3.6-plus

LB · Jan 8, 2026

Source label: qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 65.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

75.8%

#39 · Grok 4.20

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 64.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

75.3%

#40 · claude-opus-4-7-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74.8%

#41 · minimax-m2.7

LB · Jan 8, 2026

Source label: minimax-m2.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74.8%

#42 · gemini-3-flash-preview-high

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 62%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

74.5%

#43 · minimax-m3

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 61.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74.5%

#44 · gpt-5.1-2025-11-13-medium

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-medium

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 60.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74%

#45 · glm-5.1

LB · Jan 8, 2026

Source label: glm-5.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 59.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

72.5%

#46 · claude-4-1-opus-20250805-thinking-32k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 58.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

72.3%

#47 · GPT-5.3 Codex

LB · Jan 8, 2026

Source label: gpt-5.3-codex-xhigh

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 57.4%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

71.4%

#48 · gpt-5.2-2025-12-11-low

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-low

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 56.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

71.2%

#49 · Grok 4.3

LB · Jan 8, 2026

Source label: grok-4.3

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 55.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

70.8%

#50 · gemini-2.5-pro-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-pro-06-05-highthinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 54.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70.8%

#51 · gemini-3-pro-preview-11-2025-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

70.6%

#52 · deepseek-v4-flash

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 52.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70.6%

#53 · claude-opus-4-5-20251101-thinking-64k-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 51.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70.4%

#54 · Qwen3.6 27B (Reasoning)

LB · Jan 8, 2026

Source label: qwen3.6-27b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 50.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70.3%

#55 · mimo-v2-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 50%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

69.7%

#56 · glm-5

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 49.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

69.1%

#57 · claude-4-sonnet-20250514-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 48.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

69%

#58 · GPT-5.1

LB · Jan 8, 2026

Source label: gpt-5.1-codex-mini

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

64.7%

#59 · deepseek-v3.2-exp-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 46.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

64.4%

#60 · Kimi K2 Thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 45.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

63.5%

#61 · gpt-5.3-instant

LB · Jan 8, 2026

Source label: gpt-5.3-instant

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 44.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

63.1%

#62 · qwen3.6-flash

LB · Jan 8, 2026

Source label: qwen3.6-flash

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 43.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62.9%

#63 · glm-4.6

LB · Jan 8, 2026

Source label: glm-4.6

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62.1%

#64 · claude-haiku-4-5-20251001-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 41.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

61.7%

#65 · glm-4.7

LB · Jan 8, 2026

Source label: glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

59.7%

#66 · gemini-3.1-flash-lite-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-flash-lite-preview-high

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 39.8%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

59.7%

#67 · gpt-5.1-2025-11-13-low

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-low

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

59.6%

#68 · gemma-4-31b-it

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 38%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

59.4%

#69 · qwen3-235b-a22b-thinking-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 37%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

59.4%

#70 · minimax-m2.5

LB · Jan 8, 2026

Source label: minimax-m2.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 36.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

59.3%

#71 · qwen3-235b-a22b-instruct-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 35.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

58.4%

#72 · qwen3-next-80b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

58.2%

#73 · glm-5v-turbo

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 33.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

56.1%

#74 · claude-opus-4-5-20251101-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 32.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

54.9%

#75 · qwen3-next-80b-a3b-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 31.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

54.7%

#76 · claude-opus-4-5-20251101-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 30.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

53.2%

#77 · gemini-2.5-flash-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

51.5%

#78 · gemini-3-flash-preview-minimal

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 28.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

49.2%

#79 · qwen3-32b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 27.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

48.3%

#80 · claude-opus-4-5-20251101-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 26.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

47.8%

#81 · gpt-5-mini-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

45.9%

#82 · DeepSeek V3.2 Exp

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

45.5%

#83 · gemini-2.5-flash-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-06-05-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 24.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

44.6%

#84 · DeepSeek Chat

LB · Jan 8, 2026

Source label: deepseek-v3.2

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 23.1%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

44.3%

#85 · gemini-2.5-flash-lite-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 22.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

43.3%

#86 · gpt-5.2-2025-12-11-nothinking

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.8%

#87 · Grok Code Fast

LB · Jan 8, 2026

Source label: grok-code-fast-1-0825

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 20.4%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.3%

#88 · Claude Sonnet 4.5

LB · Jan 8, 2026

Source label: claude-sonnet-4-5-20250929

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 19.4%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.3%

#89 · kimi-k2-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.2%

#90 · gpt-5.4-nano-low

LB · Jan 8, 2026

Source label: gpt-5.4-nano-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 17.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.1%

#91 · claude-4-1-opus-20250805-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40.9%

#92 · gpt-5.4-mini-low

LB · Jan 8, 2026

Source label: gpt-5.4-mini-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40.3%

#93 · Elephant Alpha

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 14.8%
Last updated: archived
Eligibility: Alpha model tracked from BridgeBench but excluded from default rankings.
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40%

#94 · claude-4-sonnet-20250514-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

39.7%

#95 · GPT-OSS 120B

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

39.2%

#96 · nemotron-3-ultra-550b-a55b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 12%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

37.5%

#97 · glm-4.6v

LB · Jan 8, 2026

Source label: glm-4.6v

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 11.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

37.2%

#98 · qwen3-30b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 10.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

36.7%

#99 · gemini-2.5-flash-lite-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 9.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

36.2%

#100 · nemotron-3-super-120b-a12b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 3.

34.4%

#101 · Claude Haiku 4.5

LB · Jan 8, 2026

Source label: claude-haiku-4-5-20251001

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 7.4%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

33.9%

#102 · devstral-2512

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 6.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

27.7%

#103 · gpt-5-nano-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 5.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

27.7%

#104 · gpt-5.1-2025-11-13-nothinking

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

26.8%

#105 · grok-4.20-beta-0309-non-reasoning

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-non-reasoning

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 3.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

25.6%

#106 · Grok 4.1 Fast

LB · Jan 8, 2026

Source label: grok-4-1-fast-non-reasoning

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

23.3%

#107 · GPT-5.4 mini

LB · Jan 8, 2026

Source label: gpt-5.4-mini

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

21.9%

#108 · arcee-trinity-large-preview

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

20.6%

#109 · GPT-5.4 nano

LB · Jan 8, 2026

Source label: gpt-5.4-nano

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

17.4%
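The Percentile field on each row appears to be rank-based within this group of 109 scored rows. Every value listed here matches (109 − rank) / 108 × 100, rounded to one decimal. Tied scores are not merged: #87 and #88 both score 42.3% but are listed at 20.4% and 19.4%. The following is a minimal sketch of that arithmetic. The formula is inferred from the listed values, not taken from LiveBench or this site's documentation, and `rank_percentile` is a hypothetical helper name.

```python
# Hypothetical helper: reproduces the "Percentile" field on this page from a
# row's 1-based rank. The formula is inferred from the listed values, not
# taken from LiveBench or the site's own code.
def rank_percentile(rank: int, group_size: int) -> float:
    """Return (group_size - rank) / (group_size - 1) * 100.

    Rank 1 maps to 100 and the last rank maps to 0. Tied scores still get
    distinct ranks, so they also get distinct percentiles.
    """
    if not 1 <= rank <= group_size:
        raise ValueError("rank must be between 1 and group_size")
    if group_size == 1:
        return 100.0  # a single-row group has nothing to compare against
    return (group_size - rank) / (group_size - 1) * 100


# This benchmark/version group lists 109 scored rows.
GROUP_SIZE = 109

for rank in (1, 2, 68, 87, 88, 109):
    print(f"#{rank}: {rank_percentile(rank, GROUP_SIZE):.1f}%")
```

For #68, #87 and #88 the sketch gives 38.0%, 20.4% and 19.4%, which match the listed values (the page drops a trailing .0). A score-based percentile, where tied rows share one value, would not reproduce the distinct figures for #87 and #88. That is why the sketch works from rank rather than score.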

Benchmarks · /benchmarks/livebench-reasoning

Reasoning

Recent objective reasoning tasks from the current LiveBench website release.

Source · LiveBench
Version · livebench snapshot 2026-06-24
Scores · 109

Test details

Verified but agingThis is an objective signal, so it is mainly about measurable task performance rather than public taste.

source

LiveBench

metric

Score (%)

judge

Objective

direction

higher better

group id

livebench_reasoning_2026_01_08

domain

Reasoning / math / science

What it measures vs what it misses

✓ Measures

Theory-of-mind, logic, spatial, and navigation-heavy reasoning correctness.

✗ Misses

Human preference, tool use, and coding execution quality.

Leaderboard · this benchmark version

#1 · claude-opus-4-8-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

89.7%

#2 · claude-opus-4-8-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

89.7%

#3 · claude-opus-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

88.7%

#4 · claude-opus-4-7-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 97.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

87.7%

#5 · claude-fable-5-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 96.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

87.3%

#6 · claude-sonnet-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

86.4%

#7 · claude-fable-5-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

86.3%

#8 · GPT-5.5

LB · Jan 8, 2026

Source label: gpt-5.5-high

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

86.2%

#9 · GPT-5.4

LB · Jan 8, 2026

Source label: gpt-5.4-high

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 92.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

85.7%

#10 · claude-sonnet-4-6-thinking-auto-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 91.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

84.8%

#11 · claude-opus-4-8-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 90.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

84.5%

#12 · gpt-5.2-2025-12-11-medium

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-medium

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 89.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

84.2%

#13 · gemini-3.1-pro-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-pro-preview-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

84%

#14 · claude-opus-4-7-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

83.8%

#15 · gpt-5.1-codex-max-high

LB · Jan 8, 2026

Source label: gpt-5.1-codex-max-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 87%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

83.7%

#16 · claude-opus-4-8-low-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 86.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

83.5%

#17 · Qwen3.7 Max

LB · Jan 8, 2026

Source label: qwen3.7-max

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 85.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

83.3%

#18 · gpt-5.2-2025-12-11-high

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 84.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

83.2%

#19 · Kimi K2.7 Code

LB · Jan 8, 2026

Source label: kimi-k2.7-code

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

82.8%

#20 · deepseek-v4-pro

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 82.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

82.7%

#21 · claude-opus-4-5-20251101-thinking-64k-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 81.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

82.2%

#22 · Gemini 3.5 Flash

LB · Jan 8, 2026

Source label: gemini-3.5-flash-high

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 80.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

82%

#23 · GPT-5

LB · Jan 8, 2026

Source label: gpt-5-pro-2025-10-06

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 79.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

81.7%

#24 · gpt-5.3-codex-high

LB · Jan 8, 2026

Source label: gpt-5.3-codex-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 78.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

80.2%

#25 · claude-opus-4-5-20251101-thinking-64k-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 77.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

80.1%

#26 · claude-opus-4-7-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 76.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

80%

#27 · kimi-k2.6-thinking

LB · Jan 8, 2026

Source label: kimi-k2.6-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 75.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

79.4%

#28 · Grok 4

LB · Jan 8, 2026

Source label: grok-4-0709

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 75%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

79.1%

#29 · gpt-5.1-2025-11-13-high

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 74.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

78.8%

#30 · GLM-5.2 (max)

LB · Jan 8, 2026

Source label: glm-5.2

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

78.6%

#31 · GPT-5.2

LB · Jan 8, 2026

Source label: gpt-5.2-codex

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 72.2%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

77.7%

#32 · claude-sonnet-4-5-20250929-thinking-64k

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 71.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

77.6%

#33 · claude-sonnet-4-6-thinking-auto-low-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 70.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

77.4%

#34 · gemini-3-pro-preview-11-2025-high

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 69.4%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

77.4%

#35 · deepseek-v3.2-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 68.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

77.2%

#36 · grok-build-0.1

LB · Jan 8, 2026

Source label: grok-build-0.1

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 67.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

76.4%

#37 · kimi-k2.5-thinking

LB · Jan 8, 2026

Source label: kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 66.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

76%

#38 · qwen3.6-plus

LB · Jan 8, 2026

Source label: qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 65.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

75.8%

#39 · Grok 4.20

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 64.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

75.3%

#40 · claude-opus-4-7-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

74.8%

#41 · minimax-m2.7

LB · Jan 8, 2026

Source label: minimax-m2.7

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

74.8%

#42 · gemini-3-flash-preview-high

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 62%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

74.5%

#43 · minimax-m3

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 61.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

74.5%

#44 · gpt-5.1-2025-11-13-medium

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-medium

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 60.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

74%

#45 · glm-5.1

LB · Jan 8, 2026

Source label: glm-5.1

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 59.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

72.5%

#46 · claude-4-1-opus-20250805-thinking-32k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 58.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

72.3%

#47 · GPT-5.3 Codex

LB · Jan 8, 2026

Source label: gpt-5.3-codex-xhigh

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 57.4%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

71.4%

#48 · gpt-5.2-2025-12-11-low

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-low

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 56.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

71.2%

#49 · Grok 4.3

LB · Jan 8, 2026

Source label: grok-4.3

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 55.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.8%

#50 · gemini-2.5-pro-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-pro-06-05-highthinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 54.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.8%

#51 · gemini-3-pro-preview-11-2025-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.6%

#52 · deepseek-v4-flash

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 52.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.6%

#53 · claude-opus-4-5-20251101-thinking-64k-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 51.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.4%

#54 · Qwen3.6 27B (Reasoning)

LB · Jan 8, 2026

Source label: qwen3.6-27b

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 50.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

70.3%

#55 · mimo-v2-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 50%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

69.7%

#56 · glm-5

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 49.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

69.1%

#57 · claude-4-sonnet-20250514-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 48.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

69%

#58 · GPT-5.1

LB · Jan 8, 2026

Source label: gpt-5.1-codex-mini

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

64.7%

#59 · deepseek-v3.2-exp-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 46.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

64.4%

#60 · Kimi K2 Thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 45.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

63.5%

#61 · gpt-5.3-instant

LB · Jan 8, 2026

Source label: gpt-5.3-instant

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 44.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

63.1%

#62 · qwen3.6-flash

LB · Jan 8, 2026

Source label: qwen3.6-flash

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 43.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

62.9%

#63 · glm-4.6

LB · Jan 8, 2026

Source label: glm-4.6

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 42.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

62.1%

#64 · claude-haiku-4-5-20251001-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 41.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

61.7%

#65 · glm-4.7

LB · Jan 8, 2026

Source label: glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.7%

#66 · gemini-3.1-flash-lite-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-flash-lite-preview-high

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 39.8%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.7%

#67 · gpt-5.1-2025-11-13-low

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-low

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 38.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.6%

#68 · gemma-4-31b-it

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 38%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.4%

#69 · qwen3-235b-a22b-thinking-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 37%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.4%

#70 · minimax-m2.5

LB · Jan 8, 2026

Source label: minimax-m2.5

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 36.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

59.3%

#71 · qwen3-235b-a22b-instruct-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 35.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

58.4%

#72 · qwen3-next-80b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

58.2%

#73 · glm-5v-turbo

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 33.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

56.1%

#74 · claude-opus-4-5-20251101-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 32.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

54.9%

#75 · qwen3-next-80b-a3b-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 31.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

54.7%

#76 · claude-opus-4-5-20251101-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 30.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

53.2%

#77 · gemini-2.5-flash-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

51.5%

#78 · gemini-3-flash-preview-minimal

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 28.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

49.2%

#79 · qwen3-32b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 27.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

48.3%

#80 · claude-opus-4-5-20251101-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 26.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

47.8%

#81 · gpt-5-mini-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

45.9%

#82 · DeepSeek V3.2 Exp

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

45.5%

#83 · gemini-2.5-flash-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-06-05-highthinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 24.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

44.6%

#84 · DeepSeek Chat

LB · Jan 8, 2026

Source label: deepseek-v3.2

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 23.1%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

44.3%

#85 · gemini-2.5-flash-lite-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-highthinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 22.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

43.3%

#86 · gpt-5.2-2025-12-11-nothinking

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-nothinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.8%

#87 · Grok Code Fast

LB · Jan 8, 2026

Source label: grok-code-fast-1-0825

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 20.4%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.3%

#88 · Claude Sonnet 4.5

LB · Jan 8, 2026

Source label: claude-sonnet-4-5-20250929

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 19.4%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.3%

#89 · kimi-k2-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.2%

#90 · gpt-5.4-nano-low

LB · Jan 8, 2026

Source label: gpt-5.4-nano-low

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 17.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

42.1%

#91 · claude-4-1-opus-20250805-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40.9%

#92 · gpt-5.4-mini-low

LB · Jan 8, 2026

Source label: gpt-5.4-mini-low

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40.3%

#93 · Elephant Alpha

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 14.8%
Last updated: archived
Eligibility: Alpha model tracked from BridgeBench but excluded from default rankings.
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

40%

#94 · claude-4-sonnet-20250514-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

39.7%

#95 · GPT-OSS 120B

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

39.2%

#96 · nemotron-3-ultra-550b-a55b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 12%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

37.5%

#97 · glm-4.6v

LB · Jan 8, 2026

Source label: glm-4.6v

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 11.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

37.2%

#98 · qwen3-30b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 10.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

36.7%

#99 · gemini-2.5-flash-lite-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 9.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

36.2%

#100 · nemotron-3-super-120b-a12b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 3.

34.4%

#101 · Claude Haiku 4.5

LB · Jan 8, 2026

Source label: claude-haiku-4-5-20251001

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 7.4%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

33.9%

#102 · devstral-2512

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 6.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

27.7%

#103 · gpt-5-nano-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 5.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

27.7%

#104 · gpt-5.1-2025-11-13-nothinking

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-nothinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

26.8%

#105 · grok-4.20-beta-0309-non-reasoning

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-non-reasoning

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 3.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

25.6%

#106 · Grok 4.1 Fast

LB · Jan 8, 2026

Source label: grok-4-1-fast-non-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

23.3%

#107 · GPT-5.4 mini

LB · Jan 8, 2026

Source label: gpt-5.4-mini

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

21.9%

#108 · arcee-trinity-large-preview

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

20.6%

#109 · GPT-5.4 nano

LB · Jan 8, 2026

Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

17.4%
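The Percentile field in these rows appears to track a row's position rather than its score. Across the 109 scored rows, every listed value matches 100 × (109 − rank) / 108, rounded to one decimal place. Tied scores still receive different percentiles: #40 and #41 both score 74.8% but are listed at 63.9% and 63%. The sketch below reproduces that inferred formula. It is an assumption reverse-engineered from the table, not LiveBench's documented method, and the function name `rank_percentile` is hypothetical.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Position-based percentile for a 1-based leaderboard rank.

    Assumed (inferred from this table, not documented): rank 1 maps to 100.0,
    the last rank maps to 0.0, ranks are spaced linearly in between, and
    tied scores are NOT merged, so equal scores can get different percentiles.
    """
    if total < 2:
        raise ValueError("need at least two ranked rows")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100.0 * (total - rank) / (total - 1), 1)


# Spot-check the formula against a few rows of this 109-row leaderboard.
for rank, listed in [(32, 71.3), (40, 63.9), (41, 63.0), (55, 50.0), (109, 0.0)]:
    print(f"#{rank}: computed {rank_percentile(rank, 109)}, listed {listed}")
```

If the site ever switched to score-based percentiles, the tied pairs would be the first rows to disagree with this sketch.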
