MMLU Pro

Name: MMLU Pro
Creator: Vals AI

MMLU Pro result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 90

Test details

Visible tradeoffs: This is an objective signal, so it mainly reflects measurable task performance rather than public taste.

Source: Vals AI
Metric: Accuracy (%)
Judge: Objective
Direction: higher is better
Group ID: vals_mmlu_pro_current
Domain: Reasoning / math / science

What it measures vs what it misses

✓ Measures

Multiple-choice academic reasoning across broad subjects.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Why this counts: It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Same-test rule: This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses: It still misses product usability, latency, and whether the model stays correct in messy real workflows.
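The percentiles listed in the leaderboard below appear consistent with a plain rank-based formula over the 90 scores in this group: 100 × (N - rank) / (N - 1), rounded to one decimal. This is an inference from the listed values, not a formula documented by the source, and the function name is hypothetical. A minimal sketch under that assumption:

```python
def rank_percentile(rank: int, group_size: int) -> float:
    """Percentile of a 1-based rank inside one benchmark/version group.

    Assumed formula (inferred from the listed values, not documented):
    100 * (N - rank) / (N - 1), rounded to one decimal place.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    if not 1 <= rank <= group_size:
        raise ValueError("rank must lie between 1 and group_size")
    if group_size == 1:
        # A single-model group has nothing to compare against.
        return 100.0
    return round(100.0 * (group_size - rank) / (group_size - 1), 1)


if __name__ == "__main__":
    # Ranks 1, 2 and 4 in a 90-model group.
    for rank in (1, 2, 4):
        print(rank, rank_percentile(rank, 90))
```

Because the formula depends only on rank and group size, models with identical accuracy can still receive different percentiles, and a percentile from one group cannot be compared with a percentile from another.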

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-fable-5

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 91.5% (range 91% - 92%)

#2 · Gemini 3.1 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-pro-preview

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 98.9%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 91% (range 90.4% - 91.5%)

#3 · Gemini 3 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-pro-preview

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 97.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 90.1% (range 89.5% - 90.7%)

#4 · Claude Opus 4.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-7

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 96.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 89.9% (range 89.3% - 90.5%)

#5 · Claude Opus 4.8

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-8

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 95.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 89.6% (range 89% - 90.2%)

#6 · Gemini 3.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.5-flash

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 94.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 89.5% (range 88.9% - 90.1%)

#7 · Qwen3.7 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.7-max

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 93.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

Score: 89.3% (range 88.7% - 89.9%)

#8 · Claude Opus 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 92.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 89.1% (range 88.2% - 90%)

#9 · Gemini 3 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-flash-preview

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 91%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 88.6% (range 88% - 89.2%)

#10 · GPT-5.5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.5

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 89.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 88.1% (range 87.5% - 88.8%)

#11 · Claude Opus 4.1

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 88.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 87.9% (range 87.3% - 88.6%)

#12 · qwen3.6-plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 87.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

Score: 87.7% (range 87% - 88.3%)

#13 · kimi-k2.6

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.6

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 86.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

Score: 87.6% (range 86.9% - 88.2%)

#14 · GPT-5.4

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 85.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 87.5% (range 86.7% - 88.3%)

#15 · Claude Sonnet 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 84.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 87.4% (range 86.6% - 88.1%)

#16 · Claude Sonnet 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-6

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 87.3% (range 86.5% - 88.2%)

#17 · muse-spark

VALS-AI · Jun 17, 2026

Source label: meta/muse_spark

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 82%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Meta.

Score: 87.3% (range 86.7% - 88%)

#18 · Claude Opus 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 80.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 87.3% (range 86.5% - 88%)

#19 · deepseek-v4-pro

VALS-AI · Jun 17, 2026

Source label: deepseek/deepseek-v4-pro

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 79.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: DeepSeek.

Score: 87.2% (range 86.6% - 87.9%)

#20 · Qwen3.5 Plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-plus-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 78.7%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

Score: 87.2% (range 86.5% - 87.8%)

#21 · MiniMax-M2.1

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.1

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 77.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

Score: 87% (range 86.4% - 87.7%)

#22 · glm-5.1

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.1

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 76.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 86.9% (range 86.2% - 87.6%)

#23 · GLM-5.2 (max)

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 75.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 86.7% (range 86.1% - 87.4%)

#24 · GPT-5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-2025-08-07

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 74.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 86.5% (range 85.9% - 87.2%)

#25 · GPT-5.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 73%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 86.4% (range 85.7% - 87%)

#26 · Grok 4.20

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 71.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 86.3% (range 85.6% - 86.9%)

#27 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 70.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 86.2% (range 85.6% - 86.9%)

#28 · GPT-5.2

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 69.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 86.2% (range 85.6% - 86.9%)

#29 · Claude Opus 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-20250514

verified runtime · variant direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 68.5%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 86.2% (range 85.5% - 86.8%)

#30 · glm-5

VALS-AI · Jun 17, 2026

Source label: zai/glm-5-thinking

verified runtime · exact direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 67.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 86% (range 85.4% - 86.7%)

#31 · kimi-k2.5-thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 66.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

Score: 85.9% (range 85.2% - 86.6%)

#32 · Grok 4.3

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 85.8% (range 85.2% - 86.5%)

#33 · Nemotron 3 Ultra 550B A55B (Reasoning)

VALS-AI · Jun 17, 2026

Source label: nvidia/nemotron-3-ultra-550b-a55b

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 64%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Nvidia.

Score: 85.8% (range 85.1% - 86.4%)

#34 · o3

VALS-AI · Jun 17, 2026

Source label: openai/o3-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 62.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 85.6% (range 84.9% - 86.3%)

#35 · Grok 4

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 61.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 85.3% (range 84.6% - 86%)

#36 · Qwen3 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3-max-2026-01-23

verified runtime · exact direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 60.7%
Last updated: recent
Eligibility: preview_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

Score: 85% (range 84.3% - 85.7%)

#37 · mimo-v2.5-pro

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5-pro

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 59.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Xiaomi.

Score: 84.6% (range 83.7% - 85.5%)

#38 · GPT-5.4 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-mini-2026-03-17

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 58.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 84.6% (range 83.9% - 85.3%)

#39 · minimax-m3

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M3

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 57.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

Score: 84.2% (range 83.5% - 84.9%)

#40 · Grok 4.1 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-1-fast-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 56.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 84.2% (range 83.5% - 84.9%)

#41 · Qwen3.5 Flash

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-flash

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 55.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

Score: 84.1% (range 83.4% - 84.8%)

#42 · Gemini 2.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-pro-exp-03-25

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 53.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 84.1% (range 83.4% - 84.8%)

#43 · Claude Sonnet 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 52.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 83.9% (range 83.2% - 84.6%)

#44 · Gemini 2.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 51.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 83.7% (range 83% - 84.4%)

#45 · o1

VALS-AI · Jun 17, 2026

Source label: openai/o1-2024-12-17

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 50.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 83.5% (range 82.8% - 84.2%)

#46 · DeepSeek Reasoner

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-r1

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

Score: 83.2% (range 82.3% - 84.1%)

#47 · mimo-v2.5

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 48.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Xiaomi.

Score: 82.9% (range 82.2% - 83.7%)

#48 · glm-4.7

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 47.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 82.7% (range 82% - 83.5%)

#49 · Claude Sonnet 3.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-7-sonnet-20250219-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 46.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 82.7% (range 82% - 83.5%)

#50 · glm-4.6

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.6

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 44.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 82.2% (range 81.5% - 82.9%)

#51 · Grok 3 mini

VALS-AI · Jun 17, 2026

Source label: grok/grok-3-mini-fast-high-reasoning

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 43.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 81.4% (range 80.6% - 82.1%)

#52 · Qwen3 235B A22B

VALS-AI · Jun 17, 2026

Source label: fireworks/qwen3-235b-a22b

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 42.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

Score: 81.2% (range 80.5% - 82%)

#53 · glm-4.5

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.5

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 41.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Score: 81.2% (range 80.5% - 82%)

#54 · Kimi K2 Thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 40.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

Score: 81.1% (range 80.3% - 81.8%)

#55 · o4 mini

VALS-AI · Jun 17, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 39.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 80.6% (range 79.8% - 81.3%)

#56 · GPT-4.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 38.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 80.5% (range 79.7% - 81.2%)

#57 · minimax-m2.7

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 37.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

Score: 80.4% (range 79.7% - 81.2%)

#58 · minimax-m2.5

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.5

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 36%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

Score: 80.1% (range 79.3% - 80.9%)

#59 · Grok 3

VALS-AI · Jun 17, 2026

Source label: grok/grok-3

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 34.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 79.9% (range 79.2% - 80.7%)

#60 · Mistral Large (Feb '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 33.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

Score: 79.8% (range 78.9% - 80.7%)

#61 · Grok 4 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 32.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

Score: 79.7% (range 78.9% - 80.5%)

#62 · deepseek-v3-0324

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-v3-0324

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 31.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

Score: 79.5% (range 78.7% - 80.3%)

#63 · kimi-k2-instruct

VALS-AI · Jun 17, 2026

Source label: together/moonshotai/Kimi-K2-Instruct

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 30.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Together AI.

Score: 79.4% (range 78.6% - 80.2%)

#64 · GPT-OSS 120B

VALS-AI · Jun 17, 2026

Source label: fireworks/gpt-oss-120b

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 29.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

Score: 79.2% (range 78.4% - 79.9%)

#65 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 28.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 79.1% (range 78.3% - 79.9%)

#66 · Claude Haiku 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 27%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 78.7% (range 77.8% - 79.6%)

#67 · o3 mini

VALS-AI · Jun 17, 2026

Source label: openai/o3-mini-2025-01-31

verified runtime · exact direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 25.8%
Last updated: recent
Eligibility: historical_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

Score: 78.7% (range 77.9% - 79.5%)

#68 · Claude Sonnet 3.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-5-sonnet-20241022

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 24.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

Score: 78.4% (range 77.6% - 79.2%)

#69 · Gemini 2.0 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.0-flash-001

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 23.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

Score: 77.4% (range 76.6% - 78.2%)

#70 · GPT-4.1 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-mini-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 22.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

77.2% (76.4% - 78%)

#71 · GPT-5.4 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 21.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

77.2% (76.3% - 78%)

#72 · Grok 2

VALS-AI · Jun 17, 2026

Source label: grok/grok-2-1212

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 20.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

75.5% (74.7% - 76.3%)

#73 · mistral-medium-3.5

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-3.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 19.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

75.3% (74.5% - 76.2%)

#74 · Gemini 1.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-1.5-pro-002

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 18%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

75.3% (74.5% - 76.1%)

#75 · mistral-medium-2505

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-2505

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 16.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

75.3% (74.5% - 76.1%)

#76 · GPT-4o

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4o-2024-08-06

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 15.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

74.1% (73.3% - 75%)

#77 · deepseek-v3

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-v3

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 14.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

73.8% (73% - 74.7%)

#78 · GPT-OSS 20B

VALS-AI · Jun 17, 2026

Source label: fireworks/gpt-oss-20b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 13.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

71.6% (70.8% - 72.5%)

#79 · mistral-large-2411

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-large-2411

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 12.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

69.7% (68.8% - 70.6%)

#80 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 17, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 11.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Together AI.

69.6% (68.8% - 70.5%)

#81 · command-a-03-2025

VALS-AI · Jun 17, 2026

Source label: cohere/command-a-03-2025

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 10.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Cohere.

69.2% (68.3% - 70.1%)

#82 · Magistral Medium 1.2

VALS-AI · Jun 17, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

68.7% (67.8% - 69.5%)

#83 · Mistral Small (Sep '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-small-2503

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 7.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

66% (65.1% - 66.9%)

#84 · Gemini 1.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-1.5-flash-002

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 6.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

65.6% (64.7% - 66.5%)

#85 · Mistral Small (Feb '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-small-2402

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 5.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

64.4% (63.5% - 65.3%)

#86 · Claude Haiku 3.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-5-haiku-20241022

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 4.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

64.1% (63.2% - 65%)

#87 · GPT-4.1 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-nano-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 3.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

63.5% (62.6% - 64.4%)

#88 · GPT-4o mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4o-mini-2024-07-18

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 2.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

62.7% (61.8% - 63.6%)

#89 · Magistral Small 1.2

VALS-AI · Jun 17, 2026

Source label: mistralai/magistral-small-2509

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 1.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

62.1% (61.2% - 63%)

#90 · command-r-plus

VALS-AI · Jun 17, 2026

Source label: cohere/command-r-plus

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 0%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Cohere.

44% (43.1% - 44.9%)
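
The Percentile values in the rows above appear consistent with a simple rank-based rule over the 90 scores in this benchmark version: rank 1 maps to 100%, rank 90 maps to 0%, and the ranks in between are spaced evenly. The sketch below is an inference from the listed rows, not a formula documented by the source; the function name `rank_percentile` and its parameters are hypothetical.

```python
def rank_percentile(rank: int, total: int) -> float:
    """Map a 1-based leaderboard rank to a percentile within one benchmark group.

    Assumption (inferred from the listed rows, not confirmed by the source):
    percentile = 100 * (total - rank) / (total - 1), rounded to one decimal.
    """
    if total < 2:
        raise ValueError("need at least two scored models to rank")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / (total - 1), 1)


# Spot checks against rows in this leaderboard (90 scores in the group).
print(rank_percentile(1, 90))   # 100.0
print(rank_percentile(67, 90))  # 25.8
print(rank_percentile(90, 90))  # 0.0
```

Under this reading, the percentile only reflects position inside this exact benchmark/version group, so adding or removing a model shifts every other row's value.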

Benchmarks · /benchmarks/vals-ai-mmlu_pro

MMLU Pro

MMLU Pro result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 90

Test details

Visible tradeoffsThis is an objective signal, so it is mainly about measurable task performance rather than public taste.

source

Vals AI

metric

Accuracy (%)

judge

Objective

direction

higher better

group id

vals_mmlu_pro_current

domain

Reasoning / math / science

What it measures vs what it misses

✓ Measures

Multiple-choice academic reasoning across broad subjects.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-fable-5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

91.5%91% - 92%

#2 · Gemini 3.1 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 98.9%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

91%90.4% - 91.5%

#3 · Gemini 3 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 97.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

90.1%89.5% - 90.7%

#4 · Claude Opus 4.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-7

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 96.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

89.9%89.3% - 90.5%

#5 · Claude Opus 4.8

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-8

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 95.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

89.6%89% - 90.2%

#6 · Gemini 3.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.5-flash

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 94.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

89.5%88.9% - 90.1%

#7 · Qwen3.7 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.7-max

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 93.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

89.3%88.7% - 89.9%

#8 · Claude Opus 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 92.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

89.1%88.2% - 90%

#9 · Gemini 3 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-flash-preview

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 91%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

88.6%88% - 89.2%

#10 · GPT-5.5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 89.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

88.1%87.5% - 88.8%

#11 · Claude Opus 4.1

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 88.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

87.9%87.3% - 88.6%

#12 · qwen3.6-plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.6-plus

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 87.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

87.7%87% - 88.3%

#13 · kimi-k2.6

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.6

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 86.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

87.6%86.9% - 88.2%

#14 · GPT-5.4

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 85.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

87.5%86.7% - 88.3%

#15 · Claude Sonnet 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 84.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

87.4%86.6% - 88.1%

#16 · Claude Sonnet 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-6

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 83.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

87.3%86.5% - 88.2%

#17 · muse-spark

VALS-AI · Jun 17, 2026

Source label: meta/muse_spark

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 82%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Meta.

87.3%86.7% - 88%

#18 · Claude Opus 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 80.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

87.3%86.5% - 88%

#19 · deepseek-v4-pro

VALS-AI · Jun 17, 2026

Source label: deepseek/deepseek-v4-pro

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 79.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: DeepSeek.

87.2%86.6% - 87.9%

#20 · Qwen3.5 Plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-plus-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 78.7%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

87.2%86.5% - 87.8%

#21 · MiniMax-M2.1

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.1

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 77.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

87%86.4% - 87.7%

#22 · glm-5.1

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.1

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 76.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

86.9%86.2% - 87.6%

#23 · GLM-5.2 (max)

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.2

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 75.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

86.7%86.1% - 87.4%

#24 · GPT-5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-2025-08-07

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 74.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

86.5%85.9% - 87.2%

#25 · GPT-5.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 73%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

86.4%85.7% - 87%

#26 · Grok 4.20

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 71.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

86.3%85.6% - 86.9%

#27 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 70.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

86.2%85.6% - 86.9%

#28 · GPT-5.2

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 69.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

86.2%85.6% - 86.9%

#29 · Claude Opus 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-20250514

verified runtimevariant directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 68.5%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

86.2%85.5% - 86.8%

#30 · glm-5

VALS-AI · Jun 17, 2026

Source label: zai/glm-5-thinking

verified runtimeexact directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 67.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

86%85.4% - 86.7%

#31 · kimi-k2.5-thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 66.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

85.9%85.2% - 86.6%

#32 · Grok 4.3

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.3

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

85.8%85.2% - 86.5%

#33 · Nemotron 3 Ultra 550B A55B (Reasoning)

VALS-AI · Jun 17, 2026

Source label: nvidia/nemotron-3-ultra-550b-a55b

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 64%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Nvidia.

85.8%85.1% - 86.4%

#34 · o3

VALS-AI · Jun 17, 2026

Source label: openai/o3-2025-04-16

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 62.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

85.6%84.9% - 86.3%

#35 · Grok 4

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-0709

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 61.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

85.3%84.6% - 86%

#36 · Qwen3 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3-max-2026-01-23

verified runtimeexact directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 60.7%
Last updated: recent
Eligibility: preview_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

85%84.3% - 85.7%

#37 · mimo-v2.5-pro

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5-pro

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 59.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Xiaomi.

84.6%83.7% - 85.5%

#38 · GPT-5.4 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-mini-2026-03-17

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 58.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

84.6%83.9% - 85.3%

#39 · minimax-m3

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M3

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 57.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

84.2%83.5% - 84.9%

#40 · Grok 4.1 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-1-fast-reasoning

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 56.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

84.2%83.5% - 84.9%

#41 · Qwen3.5 Flash

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-flash

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 55.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Alibaba.

84.1%83.4% - 84.8%

#42 · Gemini 2.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-pro-exp-03-25

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 53.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

84.1%83.4% - 84.8%

#43 · Claude Sonnet 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 52.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

83.9%83.2% - 84.6%

#44 · Gemini 2.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-preview-09-2025

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 51.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

83.7%83% - 84.4%

#45 · o1

VALS-AI · Jun 17, 2026

Source label: openai/o1-2024-12-17

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 50.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

83.5%82.8% - 84.2%

#46 · DeepSeek Reasoner

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-r1

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

83.2%82.3% - 84.1%

#47 · mimo-v2.5

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 48.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Xiaomi.

82.9%82.2% - 83.7%

#48 · glm-4.7

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.7

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 47.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

82.7%82% - 83.5%

#49 · Claude Sonnet 3.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-7-sonnet-20250219-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 46.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

82.7%82% - 83.5%

#50 · glm-4.6

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.6

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 44.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

82.2%81.5% - 82.9%

#51 · Grok 3 mini

VALS-AI · Jun 17, 2026

Source label: grok/grok-3-mini-fast-high-reasoning

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 43.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

81.4% (80.6% - 82.1%)

#52 · Qwen3 235B A22B

VALS-AI · Jun 17, 2026

Source label: fireworks/qwen3-235b-a22b

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 42.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

81.2% (80.5% - 82%)

#53 · glm-4.5

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 41.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

81.2% (80.5% - 82%)

#54 · Kimi K2 Thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 40.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Moonshot AI.

81.1% (80.3% - 81.8%)

#55 · o4 mini

VALS-AI · Jun 17, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 39.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

80.6% (79.8% - 81.3%)

#56 · GPT-4.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 38.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

80.5% (79.7% - 81.2%)

#57 · minimax-m2.7

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 37.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

80.4% (79.7% - 81.2%)

#58 · minimax-m2.5

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 36%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: MiniMax.

80.1% (79.3% - 80.9%)

#59 · Grok 3

VALS-AI · Jun 17, 2026

Source label: grok/grok-3

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 34.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

79.9% (79.2% - 80.7%)

#60 · Mistral Large (Feb '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 33.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

79.8% (78.9% - 80.7%)

#61 · Grok 4 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 32.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

79.7% (78.9% - 80.5%)

#62 · deepseek-v3-0324

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-v3-0324

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 31.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

79.5% (78.7% - 80.3%)

#63 · kimi-k2-instruct

VALS-AI · Jun 17, 2026

Source label: together/moonshotai/Kimi-K2-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 30.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Together AI.

79.4% (78.6% - 80.2%)

#64 · GPT-OSS 120B

VALS-AI · Jun 17, 2026

Source label: fireworks/gpt-oss-120b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 29.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

79.2% (78.4% - 79.9%)

#65 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 28.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

79.1% (78.3% - 79.9%)

#66 · Claude Haiku 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 27%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

78.7% (77.8% - 79.6%)

#67 · o3 mini

VALS-AI · Jun 17, 2026

Source label: openai/o3-mini-2025-01-31

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 25.8%
Last updated: recent
Eligibility: historical_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

78.7% (77.9% - 79.5%)

#68 · Claude Sonnet 3.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-5-sonnet-20241022

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 24.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

78.4% (77.6% - 79.2%)

#69 · Gemini 2.0 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.0-flash-001

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 23.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

77.4% (76.6% - 78.2%)

#70 · GPT-4.1 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-mini-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 22.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

77.2% (76.4% - 78%)

#71 · GPT-5.4 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 21.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

77.2% (76.3% - 78%)

#72 · Grok 2

VALS-AI · Jun 17, 2026

Source label: grok/grok-2-1212

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 20.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: xAI.

75.5% (74.7% - 76.3%)

#73 · mistral-medium-3.5

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-3.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 19.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

75.3% (74.5% - 76.2%)

#74 · Gemini 1.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-1.5-pro-002

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 18%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

75.3% (74.5% - 76.1%)

#75 · mistral-medium-2505

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-2505

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 16.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

75.3% (74.5% - 76.1%)

#76 · GPT-4o

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4o-2024-08-06

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 15.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

74.1% (73.3% - 75%)

#77 · deepseek-v3

VALS-AI · Jun 17, 2026

Source label: fireworks/deepseek-v3

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 14.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

73.8% (73% - 74.7%)

#78 · GPT-OSS 20B

VALS-AI · Jun 17, 2026

Source label: fireworks/gpt-oss-20b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 13.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Fireworks AI.

71.6% (70.8% - 72.5%)

#79 · mistral-large-2411

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-large-2411

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 12.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

69.7% (68.8% - 70.6%)

#80 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 17, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 11.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Together AI.

69.6% (68.8% - 70.5%)

#81 · command-a-03-2025

VALS-AI · Jun 17, 2026

Source label: cohere/command-a-03-2025

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 10.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Cohere.

69.2% (68.3% - 70.1%)

#82 · Magistral Medium 1.2

VALS-AI · Jun 17, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

68.7% (67.8% - 69.5%)

#83 · Mistral Small (Sep '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-small-2503

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 7.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

66% (65.1% - 66.9%)

#84 · Gemini 1.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-1.5-flash-002

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 6.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

65.6% (64.7% - 66.5%)

#85 · Mistral Small (Feb '24)

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-small-2402

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 5.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

64.4% (63.5% - 65.3%)

#86 · Claude Haiku 3.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-3-5-haiku-20241022

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 4.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Anthropic.

64.1% (63.2% - 65%)

#87 · GPT-4.1 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4.1-nano-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 3.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

63.5% (62.6% - 64.4%)

#88 · GPT-4o mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-4o-mini-2024-07-18

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 2.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

62.7% (61.8% - 63.6%)

#89 · Magistral Small 1.2

VALS-AI · Jun 17, 2026

Source label: mistralai/magistral-small-2509

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 1.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Mistral AI.

62.1% (61.2% - 63%)

#90 · command-r-plus

VALS-AI · Jun 17, 2026

Source label: cohere/command-r-plus

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmlu_pro
Percentile: 0%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Cohere.

44% (43.1% - 44.9%)
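The Percentile values listed in the rows above appear consistent with a simple rank-based formula over the 90 scored models: 100 * (total - rank) / (total - 1), rounded to one decimal place. This is an inference from the listed numbers, not a formula stated by the source, and the function name `rank_percentile` is hypothetical. A minimal sketch under that assumption:

```python
def rank_percentile(rank: int, total: int) -> float:
    """Rank-based percentile: 100 for rank 1, 0 for the last rank.

    Assumption: inferred from the leaderboard rows, not documented by the source.
    Ties are not merged; each row uses its own rank even when scores are equal.
    """
    if total < 2:
        raise ValueError("need at least two ranked entries")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / (total - 1), 1)


if __name__ == "__main__":
    # Spot checks against rows in this leaderboard (90 scores in total).
    for rank in (48, 60, 89, 90):
        print(rank, rank_percentile(rank, 90))
```

Because the formula uses rank rather than score, rows with equal accuracy (for example #48 and #49 at 82.7%) still receive different percentiles.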
