
MMMU Pro

Name: MMMU Pro
Reported by: Vals AI

MMMU Pro results as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 59

Test details

Visible tradeoffs: This is an objective signal, so it reflects measurable task performance rather than public preference.

Source: Vals AI
Metric: Accuracy (%)
Judge: Objective
Direction: Higher is better
Group ID: vals_mmmu_current
Domain: Vision understanding

What it measures vs what it misses

✓ Measures

Multimodal reasoning over images and prompts.

✗ Misses

Related skills outside the benchmark's task mix, as well as latency and cost.

Why this counts: It is useful when the model must read charts, UIs, screenshots, or visual scenes rather than text alone.

Same-test rule: The percentiles on this page compare only models within the exact benchmark/version group shown here. They are not universal scores.

What it misses: It does not tell you whether the model can generate or edit images well.
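The page does not state how its percentiles are computed. Every value in the leaderboard below is consistent with one simple rule, which is an inference and may not be the site's actual method: a model's percentile is the share of the other models in the same group that score at or below it, so tied models (#2 and #3 at 88.3%) share the same percentile. The Python sketch below assumes that rule; `same_group_percentiles` is a hypothetical helper, not part of any Vals AI API.

```python
from typing import Dict


def same_group_percentiles(scores: Dict[str, float]) -> Dict[str, float]:
    """Return a percentile for every model in ONE benchmark/version group.

    Assumed rule (inferred from the values on this page, not documented):
    percentile = 100 * (other models scoring at or below this one) / (n - 1),
    rounded to one decimal place. Tied models therefore share a percentile.
    """
    n = len(scores)
    if n < 2:
        # With a single model there is nothing to compare against.
        raise ValueError("need at least two models in the group")
    percentiles = {}
    for name, score in scores.items():
        # Count every *other* model whose score does not exceed this one.
        at_or_below = sum(
            1 for other, s in scores.items() if other != name and s <= score
        )
        percentiles[name] = round(100.0 * at_or_below / (n - 1), 1)
    return percentiles


# Hypothetical five-model group with a tie between B and C.
group = {"A": 89.3, "B": 88.3, "C": 88.3, "D": 88.2, "E": 87.6}
print(same_group_percentiles(group))
# {'A': 100.0, 'B': 75.0, 'C': 75.0, 'D': 25.0, 'E': 0.0}
```

Fed the 59 scores listed on this page, the sketch returns the listed percentiles, including 98.3% for both #2 and #3 and 50% for #30. Because the denominator is n - 1, adding or removing a single model shifts every percentile in the group, which is why percentiles from different benchmark versions are not comparable.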

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-fable-5

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

89.3% (87.8% - 90.8%)

#2 · Gemini 3.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.5-flash

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 98.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

88.3% (86.7% - 89.8%)

#3 · GPT-5.5

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.5

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 98.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

88.3% (86.7% - 89.8%)

#4 · Gemini 3.1 Pro Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.1-pro-preview

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 94.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

88.2% (86.7% - 89.7%)

#5 · Gemini 3 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-3-flash-preview

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 93.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

87.6% (86.1% - 89.2%)

#6 · Gemini 3 Pro Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3-pro-preview

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 91.4%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

87.5% (86% - 89.1%)

#7 · GPT-5.4

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 91.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

87.5% (86% - 89.1%)

#8 · muse-spark

VALS-AI · Jun 10, 2026

Source label: meta/muse_spark

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 87.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Meta.

87.4% (85.8% - 89%)

#9 · GPT-5.2

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 86.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

86.7% (85.1% - 88.3%)

#10 · Claude Opus 4.8

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-8

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 84.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

86.6% (85% - 88.2%)

#11 · kimi-k2.6

VALS-AI · Jun 10, 2026

Source label: kimi/kimi-k2.6

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 82.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Moonshot AI.

86.3% (84.7% - 87.9%)

#12 · Claude Opus 4.7

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-7

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 81%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

85.5% (83.9% - 87.2%)

#13 · kimi-k2.5-thinking

VALS-AI · Jun 10, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 79.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Moonshot AI.

84.3% (82.6% - 86%)

#14 · qwen3.6-plus

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.6-plus

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 77.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

84.2% (82.4% - 85.9%)

#15 · Claude Opus 4.6

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 75.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

83.9% (82.1% - 85.6%)

#16 · Claude Sonnet 4.6

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-6

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 74.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

83.6% (81.8% - 85.3%)

#17 · Grok 4.20

VALS-AI · Jun 10, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 72.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

83.5% (81.7% - 85.2%)

#18 · GPT-5.1

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 70.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

83.2% (81.4% - 84.9%)

#19 · Grok 4.3

VALS-AI · Jun 10, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 69%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

83.1% (81.3% - 84.8%)

#20 · Claude Opus 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 67.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

82.9% (81.2% - 84.7%)

#21 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 65.5%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

82.5% (80.7% - 84.3%)

#22 · Qwen3.5 Flash

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.5-flash

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 63.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

81.9% (80.1% - 83.7%)

#23 · GPT-5

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5-2025-08-07

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 62.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

81.5% (79.7% - 83.3%)

#24 · Gemini 2.5 Pro

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-pro-exp-03-25

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

81.3% (79.5% - 83.2%)

#25 · minimax-m3

VALS-AI · Jun 10, 2026

Source label: minimax/MiniMax-M3

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 58.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: MiniMax.

81.2% (79.3% - 83%)

#26 · Gemini 2.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

80.8% (78.9% - 82.6%)

#27 · o3

VALS-AI · Jun 10, 2026

Source label: openai/o3-2025-04-16

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 55.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

80.4% (78.5% - 82.3%)

#28 · mimo-v2.5

VALS-AI · Jun 10, 2026

Source label: xiaomi/mimo-v2.5

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 53.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Xiaomi.

80% (78.1% - 81.9%)

#29 · o4 mini

VALS-AI · Jun 10, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 51.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

79.7% (77.8% - 81.6%)

#30 · Claude Sonnet 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 50%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

79.3% (77.4% - 81.2%)

#31 · GPT-5.4 mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-mini-2026-03-17

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 48.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

79.2% (77.3% - 81.2%)

#32 · Claude Opus 4.1

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 46.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

77.5% (75.5% - 79.5%)

#33 · o1

VALS-AI · Jun 10, 2026

Source label: openai/o1-2024-12-17

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 44.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

77.4% (75.4% - 79.4%)

#34 · Grok 4

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 43.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

76.3% (74.3% - 78.3%)

#35 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

75.4% (73.4% - 77.5%)

#36 · Claude Sonnet 3.7

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-3-7-sonnet-20250219-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 39.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

75.1% (73.1% - 77.1%)

#37 · Claude Sonnet 4

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 37.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

74.9% (72.9% - 77%)

#38 · GPT-5.4 nano

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 36.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

73.6% (71.5% - 75.7%)

#39 · Claude Opus 4

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-20250514

verified runtime · variant direct · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 34.5%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

73.3% (71.2% - 75.4%)

#40 · Grok 4 Fast

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 32.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

72.8% (70.7% - 74.9%)

#41 · Grok 4.1 Fast

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-1-fast-reasoning

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 31%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

72.7% (70.6% - 74.8%)

#42 · GPT-4.1

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-2025-04-14

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 29.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

72.4% (70.3% - 74.5%)

#43 · GPT-4.1 mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-mini-2025-04-14

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 27.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

70.5% (68.4% - 72.7%)

#44 · Gemini 2.0 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.0-flash-001

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 25.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

69.8% (67.6% - 71.9%)

#45 · Claude Sonnet 3.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-3-5-sonnet-20241022

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 24.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

68.8% (66.6% - 71%)

#46 · Mistral Large (Feb '24)

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 22.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

66.2% (64% - 68.4%)

#47 · Gemini 1.5 Pro

VALS-AI · Jun 10, 2026

Source label: google/gemini-1.5-pro-002

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 20.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

65.5% (63.3% - 67.7%)

#48 · Magistral Small 1.2

VALS-AI · Jun 10, 2026

Source label: mistralai/magistral-small-2509

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 19%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

65.2% (63% - 67.4%)

#49 · Magistral Medium 1.2

VALS-AI · Jun 10, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 17.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

64.6% (62.3% - 66.8%)

#50 · GPT-4o

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4o-2024-08-06

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 15.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

64% (61.7% - 66.3%)

#51 · mistral-medium-2505

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-medium-2505

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 13.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

63% (60.7% - 65.2%)

#52 · Mistral Small (Sep '24)

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-small-2503

verified runtime · variant direct · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 12.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

60.1% (57.8% - 62.4%)

#53 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 10, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 10.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Together AI.

58.8% (56.4% - 61.1%)

#54 · Grok 2 Vision

VALS-AI · Jun 10, 2026

Source label: grok/grok-2-vision-1212

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 8.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

57.3% (54.9% - 59.6%)

#55 · Gemini 1.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-1.5-flash-002

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 6.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

57.2% (54.9% - 59.5%)

#56 · GPT-4o mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4o-mini-2024-07-18

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 5.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

56.6% (54.2% - 58.9%)

#57 · GPT-4.1 nano

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-nano-2025-04-14

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 3.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

55.1% (52.7% - 57.4%)

#58 · Claude Haiku 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 1.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

46.1% (43.7% - 48.4%)

#59 · Qwen3.5 Plus

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.5-plus-thinking

verified runtime · exact alias · Background only

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 0%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

22.8% (20.8% - 24.8%)

Benchmarks · /benchmarks/vals-ai-mmmu

MMMU Pro

MMMU Pro result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 59

Test details

Visible tradeoffsThis is an objective signal, so it is mainly about measurable task performance rather than public taste.

source

Vals AI

metric

Accuracy (%)

judge

Objective

direction

higher better

group id

vals_mmmu_current

domain

Vision understanding

What it measures vs what it misses

✓ Measures

Multimodal reasoning over images and prompts.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-fable-5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

89.3%87.8% - 90.8%

#2 · Gemini 3.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.5-flash

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 98.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

88.3%86.7% - 89.8%

#3 · GPT-5.5

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 98.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

88.3%86.7% - 89.8%

#4 · Gemini 3.1 Pro Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.1-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 94.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

88.2%86.7% - 89.7%

#5 · Gemini 3 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-3-flash-preview

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 93.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

87.6%86.1% - 89.2%

#6 · Gemini 3 Pro Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 91.4%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

87.5%86% - 89.1%

#7 · GPT-5.4

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 91.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

87.5%86% - 89.1%

#8 · muse-spark

VALS-AI · Jun 10, 2026

Source label: meta/muse_spark

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 87.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Meta.

87.4%85.8% - 89%

#9 · GPT-5.2

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 86.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

86.7%85.1% - 88.3%

#10 · Claude Opus 4.8

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-8

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 84.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

86.6%85% - 88.2%

#11 · kimi-k2.6

VALS-AI · Jun 10, 2026

Source label: kimi/kimi-k2.6

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 82.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Moonshot AI.

86.3% (84.7% - 87.9%)

#12 · Claude Opus 4.7

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-7

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 81%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

85.5% (83.9% - 87.2%)

#13 · kimi-k2.5-thinking

VALS-AI · Jun 10, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 79.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Moonshot AI.

84.3% (82.6% - 86%)

#14 · qwen3.6-plus

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 77.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

84.2% (82.4% - 85.9%)

#15 · Claude Opus 4.6

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 75.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

83.9% (82.1% - 85.6%)

#16 · Claude Sonnet 4.6

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-6

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 74.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

83.6% (81.8% - 85.3%)

#17 · Grok 4.20

VALS-AI · Jun 10, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 72.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

83.5% (81.7% - 85.2%)

#18 · GPT-5.1

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 70.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

83.2% (81.4% - 84.9%)

#19 · Grok 4.3

VALS-AI · Jun 10, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 69%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

83.1% (81.3% - 84.8%)

#20 · Claude Opus 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 67.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

82.9% (81.2% - 84.7%)

#21 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 10, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 65.5%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

82.5% (80.7% - 84.3%)

#22 · Qwen3.5 Flash

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.5-flash

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 63.8%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

81.9% (80.1% - 83.7%)

#23 · GPT-5

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5-2025-08-07

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 62.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

81.5% (79.7% - 83.3%)

#24 · Gemini 2.5 Pro

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-pro-exp-03-25

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

81.3% (79.5% - 83.2%)

#25 · minimax-m3

VALS-AI · Jun 10, 2026

Source label: minimax/MiniMax-M3

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 58.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: MiniMax.

81.2% (79.3% - 83%)

#26 · Gemini 2.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

80.8% (78.9% - 82.6%)

#27 · o3

VALS-AI · Jun 10, 2026

Source label: openai/o3-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 55.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

80.4% (78.5% - 82.3%)

#28 · mimo-v2.5

VALS-AI · Jun 10, 2026

Source label: xiaomi/mimo-v2.5

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 53.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Xiaomi.

80% (78.1% - 81.9%)

#29 · o4 mini

VALS-AI · Jun 10, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 51.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

79.7% (77.8% - 81.6%)

#30 · Claude Sonnet 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 50%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

79.3% (77.4% - 81.2%)

#31 · GPT-5.4 mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-mini-2026-03-17

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 48.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

79.2% (77.3% - 81.2%)

#32 · Claude Opus 4.1

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 46.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

77.5% (75.5% - 79.5%)

#33 · o1

VALS-AI · Jun 10, 2026

Source label: openai/o1-2024-12-17

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 44.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

77.4% (75.4% - 79.4%)

#34 · Grok 4

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 43.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

76.3% (74.3% - 78.3%)

#35 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

75.4% (73.4% - 77.5%)

#36 · Claude Sonnet 3.7

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-3-7-sonnet-20250219-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 39.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

75.1% (73.1% - 77.1%)

#37 · Claude Sonnet 4

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 37.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

74.9% (72.9% - 77%)

#38 · GPT-5.4 nano

VALS-AI · Jun 10, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 36.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

73.6% (71.5% - 75.7%)

#39 · Claude Opus 4

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-opus-4-20250514

verified runtime · variant direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 34.5%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

73.3% (71.2% - 75.4%)

#40 · Grok 4 Fast

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 32.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

72.8% (70.7% - 74.9%)

#41 · Grok 4.1 Fast

VALS-AI · Jun 10, 2026

Source label: grok/grok-4-1-fast-reasoning

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 31%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

72.7% (70.6% - 74.8%)

#42 · GPT-4.1

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 29.3%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

72.4% (70.3% - 74.5%)

#43 · GPT-4.1 mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-mini-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 27.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

70.5% (68.4% - 72.7%)

#44 · Gemini 2.0 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-2.0-flash-001

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 25.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

69.8% (67.6% - 71.9%)

#45 · Claude Sonnet 3.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-3-5-sonnet-20241022

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 24.1%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

68.8% (66.6% - 71%)

#46 · Mistral Large (Feb '24)

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 22.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

66.2% (64% - 68.4%)

#47 · Gemini 1.5 Pro

VALS-AI · Jun 10, 2026

Source label: google/gemini-1.5-pro-002

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 20.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

65.5% (63.3% - 67.7%)

#48 · Magistral Small 1.2

VALS-AI · Jun 10, 2026

Source label: mistralai/magistral-small-2509

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 19%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

65.2% (63% - 67.4%)

#49 · Magistral Medium 1.2

VALS-AI · Jun 10, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 17.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

64.6% (62.3% - 66.8%)

#50 · GPT-4o

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4o-2024-08-06

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 15.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

64% (61.7% - 66.3%)

#51 · mistral-medium-2505

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-medium-2505

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 13.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

63% (60.7% - 65.2%)

#52 · Mistral Small (Sep '24)

VALS-AI · Jun 10, 2026

Source label: mistralai/mistral-small-2503

verified runtime · variant direct · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 12.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Mistral AI.

60.1% (57.8% - 62.4%)

#53 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 10, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 10.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Together AI.

58.8% (56.4% - 61.1%)

#54 · Grok 2 Vision

VALS-AI · Jun 10, 2026

Source label: grok/grok-2-vision-1212

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 8.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: xAI.

57.3% (54.9% - 59.6%)

#55 · Gemini 1.5 Flash

VALS-AI · Jun 10, 2026

Source label: google/gemini-1.5-flash-002

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 6.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

57.2% (54.9% - 59.5%)

#56 · GPT-4o mini

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4o-mini-2024-07-18

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 5.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

56.6% (54.2% - 58.9%)

#57 · GPT-4.1 nano

VALS-AI · Jun 10, 2026

Source label: openai/gpt-4.1-nano-2025-04-14

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 3.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

55.1% (52.7% - 57.4%)

#58 · Claude Haiku 4.5

VALS-AI · Jun 10, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 1.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Anthropic.

46.1% (43.7% - 48.4%)

#59 · Qwen3.5 Plus

VALS-AI · Jun 10, 2026

Source label: alibaba/qwen3.5-plus-thinking

verified runtime · exact alias · Background only

Raw row drilldown (source, percentile, eligibility)

Source URL: https://www.vals.ai/benchmarks/mmmu
Percentile: 0%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Alibaba.

22.8% (20.8% - 24.8%)
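
The Percentile values in this group appear to follow a simple rule: the share of the other 58 models in the group that score at or below a given model. Under that rule #1 maps to 100% and #59 maps to 0%. The rule reproduces rows such as #8 (51/58 = 87.9%) and #30 (29/58 = 50%). The sketch below assumes this rule, including its treatment of ties (tied models share the higher value). `group_percentiles` is a hypothetical helper, not a Vals AI API.

```python
from typing import Dict


def group_percentiles(scores: Dict[str, float]) -> Dict[str, float]:
    """Percentile of each model within one benchmark/version group.

    Assumed rule (not confirmed by Vals AI): the share of the *other*
    models in the group whose score is less than or equal to this
    model's score. The top score maps to 100, the bottom to 0, and
    tied models receive the same (higher) value.
    """
    n = len(scores)
    if n < 2:
        # A lone model has nothing to be compared against.
        return {name: 100.0 for name in scores}
    values = list(scores.values())
    result: Dict[str, float] = {}
    for name, score in scores.items():
        # Count models at or below this score, minus 1 to exclude the model itself.
        at_or_below = sum(1 for v in values if v <= score) - 1
        result[name] = 100.0 * at_or_below / (n - 1)
    return result


if __name__ == "__main__":
    # Three scores taken from the table above, treated as their own small group.
    sample = {"Claude Opus 4.8": 86.6, "Claude Opus 4.7": 85.5, "Claude Haiku 4.5": 46.1}
    print(group_percentiles(sample))
```

The function only sees the scores you pass in, so it enforces the same-test rule automatically. Mixing in rows from another benchmark version would shift every percentile.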

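The page does not label the bracketed range after each score. One reading consistent with the numbers is a 95% normal-approximation (Wald) binomial interval on per-item accuracy. With an assumed item count of 1,730, that interval reproduces #8 (87.4%, 85.8% - 89.0%) and #59 (22.8%, 20.8% - 24.8%) to one decimal. Both the interval method and `n_items` are assumptions, and `normal_approx_interval` is a hypothetical helper.

```python
import math
from typing import Tuple


def normal_approx_interval(accuracy_pct: float, n_items: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate interval (in percent) for an accuracy measured on n_items questions.

    Uses the Wald approximation p +/- z * sqrt(p * (1 - p) / n), clipped to [0, 100].
    n_items is an assumption: the page does not state how many items were scored.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    p = accuracy_pct / 100.0
    half_width = z * math.sqrt(p * (1.0 - p) / n_items)
    low = max(0.0, p - half_width)
    high = min(1.0, p + half_width)
    return 100.0 * low, 100.0 * high


if __name__ == "__main__":
    # Hypothetical item count; compare against the #59 row (22.8%, 20.8% - 24.8%).
    low, high = normal_approx_interval(22.8, n_items=1730)
    print(f"{low:.1f}% - {high:.1f}%")
```

The Wald interval is the simplest choice. A Wilson interval behaves better near 0% or 100%, so treat this as a rough consistency check rather than as Vals AI's method.
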