Logic with navigation

Name: Logic with navigation
Creator: LiveBench

Logic with navigation task slice from the current LiveBench website leaderboard.

Source · LiveBench
Version · livebench snapshot 2026-06-24
Scores · 109

Test details

Verified but aging

This is an objective signal. It mainly reflects measurable task performance rather than public taste.

Source: LiveBench
Metric: Score (%)
Judge: Objective
Direction: Higher is better
Group ID: livebench_reasoning_logic_with_navigation_2026_01_08
Domain: Reasoning / math / science

What it measures vs what it misses

✓ Measures

Objective logic with navigation score in LiveBench.

✗ Misses

Adjacent LiveBench categories, subjective preference, latency, and cost.

Why this counts: It is one of the cleaner reads on deliberate reasoning strength, as opposed to style or popularity.

Same-test rule: These percentiles compare only models within the exact benchmark/version group shown here. They are not universal scores.

What it misses: Product usability, latency, and whether the model stays correct in messy real-world workflows.
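The displayed percentiles appear consistent with a simple same-test rule. A model's percentile is the share of the other models in this 109-entry group that score at or below it, so tied models share a percentile. For example, the three 80% rows all show 98.1%, which is 106 of the other 108 models. The Python sketch below implements that rule. It is an assumption inferred from the numbers on this page, not a documented LiveBench or site method, and the function name `same_test_percentiles` is hypothetical.

```python
from bisect import bisect_right


def same_test_percentiles(scores: list[float]) -> list[float]:
    """Percentile of each score within a single benchmark/version group.

    Assumed rule (reconstructed from the displayed values, not documented):
    the share of the *other* models in the group whose score is at or
    below this one. Tied scores therefore share one percentile.
    """
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [100.0]  # a lone model has nothing to be compared against
    ordered = sorted(scores)
    # bisect_right counts every score <= s, including s itself; subtract 1.
    return [100.0 * (bisect_right(ordered, s) - 1) / (n - 1) for s in scores]


if __name__ == "__main__":
    # Top 13 scores from this leaderboard, padded with placeholder scores
    # (hypothetical) to reach the 109 entries listed under "Scores".
    top = [84, 82, 80, 80, 80, 76, 76] + [74] * 6
    group = top + [50.0] * (109 - len(top))
    for score, pct in zip(top, same_test_percentiles(group)):
        print(f"{score}% -> {pct:.1f}")
```

For the 84%, 82%, 80%, 76% and 74% rows, this prints 100.0, 99.1, 98.1, 95.4 and 93.5. Those values match the percentiles shown in the leaderboard. The placeholder scores only need to fall below 74% for these values to hold. Because the rule is computed within one group, the same raw score can map to a different percentile in another benchmark version.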

Leaderboard · this benchmark version

#1 · Qwen3.7 Max

LB · Jan 8, 2026

Source label: qwen3.7-max

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

84%

#2 · claude-opus-4-8-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

82%

#3 · claude-opus-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80%

#4 · claude-opus-4-8-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80%

#5 · GLM-5.2 (max)

LB · Jan 8, 2026

Source label: glm-5.2

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

80%

#6 · claude-sonnet-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

76%

#7 · claude-opus-4-8-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

76%

#8 · claude-opus-4-5-20251101-thinking-64k-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74%

#9 · Gemini 3.5 Flash

LB · Jan 8, 2026

Source label: gemini-3.5-flash-high

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

74%

#10 · gpt-5.2-2025-12-11-high

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74%

#11 · Grok 4

LB · Jan 8, 2026

Source label: grok-4-0709

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

74%

#12 · claude-fable-5-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74%

#13 · Kimi K2.7 Code

LB · Jan 8, 2026

Source label: kimi-k2.7-code

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

74%

#14 · GPT-5.5

LB · Jan 8, 2026

Source label: gpt-5.5-high

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

72%

#15 · gemini-3-flash-preview-high

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

72%

#16 · gemini-3.1-pro-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-pro-preview-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

72%

#17 · gpt-5.2-2025-12-11-medium

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-medium

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

72%

#18 · claude-fable-5-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

72%

#19 · claude-4-1-opus-20250805-thinking-32k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#20 · claude-sonnet-4-6-thinking-auto-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#21 · claude-opus-4-7-xhigh-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#22 · claude-opus-4-8-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#23 · deepseek-v3.2-exp-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

70%

#24 · gemini-3-pro-preview-11-2025-high

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

70%

#25 · gpt-5.3-codex-high

LB · Jan 8, 2026

Source label: gpt-5.3-codex-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#26 · Grok 4.20

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-reasoning

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

70%

#27 · kimi-k2.6-thinking

LB · Jan 8, 2026

Source label: kimi-k2.6-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#28 · qwen3.6-plus

LB · Jan 8, 2026

Source label: qwen3.6-plus

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#29 · grok-build-0.1

LB · Jan 8, 2026

Source label: grok-build-0.1

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

70%

#30 · GPT-5.4

LB · Jan 8, 2026

Source label: gpt-5.4-xhigh

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

68%

#31 · claude-opus-4-5-20251101-thinking-64k-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#32 · claude-opus-4-5-20251101-thinking-64k-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#33 · claude-sonnet-4-5-20250929-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#34 · claude-sonnet-4-6-thinking-auto-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#35 · claude-opus-4-7-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#36 · gemini-2.5-pro-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-pro-06-05-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#37 · gemini-3-pro-preview-11-2025-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

68%

#38 · glm-5.1

LB · Jan 8, 2026

Source label: glm-5.1

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

68%

#39 · GPT-5.2

LB · Jan 8, 2026

Source label: gpt-5.2-codex

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

68%

#40 · GPT-5.3 Codex

LB · Jan 8, 2026

Source label: gpt-5.3-codex-xhigh

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

68%

#41 · claude-4-sonnet-20250514-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#42 · claude-opus-4-7-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#43 · claude-opus-4-7-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#44 · glm-5

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#45 · gpt-5.1-2025-11-13-high

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#46 · gpt-5.2-2025-12-11-low

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#47 · gpt-5.3-instant

LB · Jan 8, 2026

Source label: gpt-5.3-instant

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#48 · Grok 4.3

LB · Jan 8, 2026

Source label: grok-4.3

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

66%

#49 · kimi-k2.5-thinking

LB · Jan 8, 2026

Source label: kimi-k2.5-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#50 · minimax-m3

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

66%

#51 · deepseek-v3.2-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#52 · deepseek-v4-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#53 · deepseek-v4-flash

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#54 · gemma-4-31b-it

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#55 · GPT-5

LB · Jan 8, 2026

Source label: gpt-5-pro-2025-10-06

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

64%

#56 · gpt-5.1-codex-max-high

LB · Jan 8, 2026

Source label: gpt-5.1-codex-max-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#57 · mimo-v2-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

64%

#58 · claude-opus-4-5-20251101-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#59 · glm-4.7

LB · Jan 8, 2026

Source label: glm-4.7

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#60 · gpt-5.1-2025-11-13-medium

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-medium

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#61 · minimax-m2.5

LB · Jan 8, 2026

Source label: minimax-m2.5

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#62 · qwen3-235b-a22b-instruct-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#63 · Qwen3.6 27B (Reasoning)

LB · Jan 8, 2026

Source label: qwen3.6-27b

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

62%

#64 · nemotron-3-super-120b-a12b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 41.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

60.8%

#65 · gemini-2.5-flash-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

60%

#66 · gemini-3.1-flash-lite-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-flash-lite-preview-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

60%

#67 · glm-4.6

LB · Jan 8, 2026

Source label: glm-4.6

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

60%

#68 · glm-5v-turbo

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

60%

#69 · minimax-m2.7

LB · Jan 8, 2026

Source label: minimax-m2.7

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

60%

#70 · qwen3-235b-a22b-thinking-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

60%

#71 · qwen3.6-flash

LB · Jan 8, 2026

Source label: qwen3.6-flash

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#72 · GPT-5.1

LB · Jan 8, 2026

Source label: gpt-5.1-codex-mini

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#73 · claude-opus-4-5-20251101-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#74 · gpt-5.1-2025-11-13-low

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#75 · Kimi K2 Thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#76 · qwen3-next-80b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#77 · claude-haiku-4-5-20251001-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

54%

#78 · gpt-5.2-2025-12-11-nothinking

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

54%

#79 · qwen3-next-80b-a3b-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 27.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

52%

#80 · claude-opus-4-5-20251101-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 26.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

50%

#81 · gemini-3-flash-preview-minimal

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

44%

#82 · claude-4-1-opus-20250805-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#83 · DeepSeek Chat

LB · Jan 8, 2026

Source label: deepseek-v3.2

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#84 · gemini-2.5-flash-lite-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#85 · Grok Code Fast

LB · Jan 8, 2026

Source label: grok-code-fast-1-0825

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#86 · DeepSeek V3.2 Exp

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

40%

#87 · gpt-5-mini-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

40%

#88 · kimi-k2-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 19.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

34%

#89 · Claude Sonnet 4.5

LB · Jan 8, 2026

Source label: claude-sonnet-4-5-20250929

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

32%

#90 · gemini-2.5-flash-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-06-05-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

32%

#91 · qwen3-32b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

30%

#92 · claude-4-sonnet-20250514-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#93 · gpt-5.4-mini-low

LB · Jan 8, 2026

Source label: gpt-5.4-mini-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#94 · gpt-5.4-nano-low

LB · Jan 8, 2026

Source label: gpt-5.4-nano-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#95 · gpt-5.1-2025-11-13-nothinking

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

26%

#96 · nemotron-3-ultra-550b-a55b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

26%

#97 · gemini-2.5-flash-lite-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 11.1%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

24%

#98 · Elephant Alpha

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 10.2%
Last updated: archived
Eligibility: Alpha model tracked from BridgeBench but excluded from default rankings.
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

20%

#99 · Claude Haiku 4.5

LB · Jan 8, 2026

Source label: claude-haiku-4-5-20251001

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

18%

#100 · devstral-2512

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

16%

#101 · grok-4.20-beta-0309-non-reasoning

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-non-reasoning

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

16%

#102 · glm-4.6v

LB · Jan 8, 2026

Source label: glm-4.6v

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 6.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

12%

#103 · qwen3-30b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 5.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

10%

#104 · Grok 4.1 Fast

LB · Jan 8, 2026

Source label: grok-4-1-fast-non-reasoning

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#105 · GPT-OSS 120B

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#106 · GPT-5.4 mini

LB · Jan 8, 2026

Source label: gpt-5.4-mini

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#107 · arcee-trinity-large-preview

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 1.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#108 · GPT-5.4 nano

LB · Jan 8, 2026

Source label: gpt-5.4-nano

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#109 · gpt-5-nano-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.
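The page does not define how the Percentile field is computed. The listed values are consistent with a tie-aware rank formula: with N = 109 scored rows and r the best rank shared by a group of tied scores, percentile = 100 × (N - r) / (N - 1). For example, the five 58% rows (#72 to #76) share r = 72, giving 100 × 37 / 108 ≈ 34.3%, and #108 and #109 both show 0.9% because they share r = 108 (100 × 1 / 108 ≈ 0.9%). The sketch below reproduces that formula in Python. It is an inference from the numbers shown here, not LiveBench's documented method, and the helper name `tie_aware_percentile` is hypothetical.

```python
from typing import Sequence


def tie_aware_percentile(scores: Sequence[float], index: int) -> float:
    """Hypothetical helper: percentile of scores[index] within one benchmark group.

    Assumed formula (inferred from the listed values, not documented by LiveBench):
    100 * (N - r) / (N - 1), where r is the best rank shared by every row tied
    with scores[index], i.e. 1 + the number of rows scoring strictly higher.
    """
    n = len(scores)
    if n < 2:
        # A single-row group has nothing to compare against.
        return 100.0
    target = scores[index]
    # Tied rows all receive the same (best) rank, so they share one percentile.
    best_rank = 1 + sum(1 for s in scores if s > target)
    return 100.0 * (n - best_rank) / (n - 1)


# Illustrative group: the top eight scores on this leaderboard (#1 to #8),
# padded with 101 placeholder rows so that N = 109 as on this page.
# The placeholder value 10.0 is made up; it only needs to be below 74.
scores = [84, 82, 80, 80, 80, 76, 76, 74] + [10.0] * 101

for i in (0, 1, 2, 5):
    # Print the leaderboard position and its rounded percentile.
    print(i + 1, round(tie_aware_percentile(scores, i), 1))
```

Counting only strictly higher scores is what gives tied rows the same percentile, and dividing by N - 1 rather than N is what lets the top row reach exactly 100%. Per the same-test rule, `scores` should hold only rows from a single group id.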

#1 · Qwen3.7 Max

LB · Jan 8, 2026

Source label: qwen3.7-max

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

84%

#2 · claude-opus-4-8-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

82%

#3 · claude-opus-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

80%

#4 · claude-opus-4-8-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

80%

#5 · GLM-5.2 (max)

LB · Jan 8, 2026

Source label: glm-5.2

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

80%

#6 · claude-sonnet-4-6-thinking-auto-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

76%

#7 · claude-opus-4-8-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

76%

#8 · claude-opus-4-5-20251101-thinking-64k-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#9 · Gemini 3.5 Flash

LB · Jan 8, 2026

Source label: gemini-3.5-flash-high

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#10 · gpt-5.2-2025-12-11-high

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#11 · Grok 4

LB · Jan 8, 2026

Source label: grok-4-0709

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#12 · claude-fable-5-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#13 · Kimi K2.7 Code

LB · Jan 8, 2026

Source label: kimi-k2.7-code

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 93.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

74%

#14 · GPT-5.5

LB · Jan 8, 2026

Source label: gpt-5.5-high

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

72%

#15 · gemini-3-flash-preview-high

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

72%

#16 · gemini-3.1-pro-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-pro-preview-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

72%

#17 · gpt-5.2-2025-12-11-medium

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-medium

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

72%

#18 · claude-fable-5-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 88%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

72%

#19 · claude-4-1-opus-20250805-thinking-32k

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#20 · claude-sonnet-4-6-thinking-auto-medium-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#21 · claude-opus-4-7-xhigh-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#22 · claude-opus-4-8-low-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#23 · deepseek-v3.2-exp-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#24 · gemini-3-pro-preview-11-2025-high

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#25 · gpt-5.3-codex-high

LB · Jan 8, 2026

Source label: gpt-5.3-codex-high

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#26 · Grok 4.20

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-reasoning

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#27 · kimi-k2.6-thinking

LB · Jan 8, 2026

Source label: kimi-k2.6-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#28 · qwen3.6-plus

LB · Jan 8, 2026

Source label: qwen3.6-plus

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#29 · grok-build-0.1

LB · Jan 8, 2026

Source label: grok-build-0.1

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

70%

#30 · GPT-5.4

LB · Jan 8, 2026

Source label: gpt-5.4-xhigh

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#31 · claude-opus-4-5-20251101-thinking-64k-high-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#32 · claude-opus-4-5-20251101-thinking-64k-low-effort

LB · Jan 8, 2026

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#33 · claude-sonnet-4-5-20250929-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#34 · claude-sonnet-4-6-thinking-auto-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#35 · claude-opus-4-7-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#36 · gemini-2.5-pro-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-pro-06-05-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#37 · gemini-3-pro-preview-11-2025-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#38 · glm-5.1

LB · Jan 8, 2026

Source label: glm-5.1

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#39 · GPT-5.2

LB · Jan 8, 2026

Source label: gpt-5.2-codex

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#40 · GPT-5.3 Codex

LB · Jan 8, 2026

Source label: gpt-5.3-codex-xhigh

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 73.1%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

68%

#41 · claude-4-sonnet-20250514-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#42 · claude-opus-4-7-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#43 · claude-opus-4-7-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#44 · glm-5

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#45 · gpt-5.1-2025-11-13-high

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#46 · gpt-5.2-2025-12-11-low

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#47 · gpt-5.3-instant

LB · Jan 8, 2026

Source label: gpt-5.3-instant

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#48 · Grok 4.3

LB · Jan 8, 2026

Source label: grok-4.3

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#49 · kimi-k2.5-thinking

LB · Jan 8, 2026

Source label: kimi-k2.5-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#50 · minimax-m3

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 63%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

66%

#51 · deepseek-v3.2-thinking

LB · Jan 8, 2026

Source label: deepseek-v3.2-thinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#52 · deepseek-v4-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#53 · deepseek-v4-flash

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#54 · gemma-4-31b-it

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#55 · GPT-5

LB · Jan 8, 2026

Source label: gpt-5-pro-2025-10-06

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#56 · gpt-5.1-codex-max-high

LB · Jan 8, 2026

Source label: gpt-5.1-codex-max-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#57 · mimo-v2-pro

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 53.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

64%

#58 · claude-opus-4-5-20251101-high-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#59 · glm-4.7

LB · Jan 8, 2026

Source label: glm-4.7

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#60 · gpt-5.1-2025-11-13-medium

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-medium

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#61 · minimax-m2.5

LB · Jan 8, 2026

Source label: minimax-m2.5

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#62 · qwen3-235b-a22b-instruct-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#63 · Qwen3.6 27B (Reasoning)

LB · Jan 8, 2026

Source label: qwen3.6-27b

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 47.2%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

62%

#64 · nemotron-3-super-120b-a12b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 41.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60.8%

#65 · gemini-2.5-flash-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#66 · gemini-3.1-flash-lite-preview-high

LB · Jan 8, 2026

Source label: gemini-3.1-flash-lite-preview-high

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#67 · glm-4.6

LB · Jan 8, 2026

Source label: glm-4.6

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#68 · glm-5v-turbo

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#69 · minimax-m2.7

LB · Jan 8, 2026

Source label: minimax-m2.7

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#70 · qwen3-235b-a22b-thinking-2507

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#71 · qwen3.6-flash

LB · Jan 8, 2026

Source label: qwen3.6-flash

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 40.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

60%

#72 · GPT-5.1

LB · Jan 8, 2026

Source label: gpt-5.1-codex-mini

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#73 · claude-opus-4-5-20251101-medium-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#74 · gpt-5.1-2025-11-13-low

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#75 · Kimi K2 Thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#76 · qwen3-next-80b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 34.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

58%

#77 · claude-haiku-4-5-20251001-thinking-64k

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

54%

#78 · gpt-5.2-2025-12-11-nothinking

LB · Jan 8, 2026

Source label: gpt-5.2-2025-12-11-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 29.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

54%

#79 · qwen3-next-80b-a3b-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 27.8%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

52%

#80 · claude-opus-4-5-20251101-low-effort

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 26.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

50%

#81 · gemini-3-flash-preview-minimal

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

44%

#82 · claude-4-1-opus-20250805-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#83 · DeepSeek Chat

LB · Jan 8, 2026

Source label: deepseek-v3.2

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#84 · gemini-2.5-flash-lite-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#85 · Grok Code Fast

LB · Jan 8, 2026

Source label: grok-code-fast-1-0825

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 25%
Last updated: archived
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

42%

#86 · DeepSeek V3.2 Exp

LB · Jan 8, 2026

Source label: deepseek-v3.2-exp

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

40%

#87 · gpt-5-mini-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 21.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

40%

#88 · kimi-k2-instruct

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 19.4%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

34%

#89 · Claude Sonnet 4.5

LB · Jan 8, 2026

Source label: claude-sonnet-4-5-20250929

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

32%

#90 · gemini-2.5-flash-06-05-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-06-05-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 18.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

32%

#91 · qwen3-32b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 16.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

30%

#92 · claude-4-sonnet-20250514-base

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#93 · gpt-5.4-mini-low

LB · Jan 8, 2026

Source label: gpt-5.4-mini-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#94 · gpt-5.4-nano-low

LB · Jan 8, 2026

Source label: gpt-5.4-nano-low

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 15.7%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

28%

#95 · gpt-5.1-2025-11-13-nothinking

LB · Jan 8, 2026

Source label: gpt-5.1-2025-11-13-nothinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

26%

#96 · nemotron-3-ultra-550b-a55b

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 13%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

26%

#97 · gemini-2.5-flash-lite-preview-09-2025-highthinking

LB · Jan 8, 2026

Source label: gemini-2.5-flash-lite-preview-09-2025-highthinking

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 11.1%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

24%

#98 · Elephant Alpha

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 10.2%
Last updated: archived
Eligibility: Alpha model tracked from BridgeBench but excluded from default rankings.
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

20%

#99 · Claude Haiku 4.5

LB · Jan 8, 2026

Source label: claude-haiku-4-5-20251001

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

18%

#100 · devstral-2512

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

16%

#101 · grok-4.20-beta-0309-non-reasoning

LB · Jan 8, 2026

Source label: grok-4.20-beta-0309-non-reasoning

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 8.3%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

16%

#102 · glm-4.6v

LB · Jan 8, 2026

Source label: glm-4.6v

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 6.5%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

12%

#103 · qwen3-30b-a3b-thinking

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 5.6%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

10%

#104 · Grok 4.1 Fast

LB · Jan 8, 2026

Source label: grok-4-1-fast-non-reasoning

verified runtime · exact alias

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#105 · GPT-OSS 120B

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 4.6%
Last updated: archived
Eligibility: historical_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#106 · GPT-5.4 mini

LB · Jan 8, 2026

Source label: gpt-5.4-mini

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#107 · arcee-trinity-large-preview

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 1.9%
Last updated: archived
Eligibility: preview_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#108 · GPT-5.4 nano

LB · Jan 8, 2026

Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

#109 · gpt-5-nano-low

LB · Jan 8, 2026

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://livebench.ai/table_2026_01_08.csv
Percentile: 0.9%
Last updated: archived
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.