Benchmarks · /benchmarks/vals-ai-ioi

IOI

Name: IOI
Creator: Vals AI

IOI result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 45

Test details

Visible tradeoffs: This is an objective signal, so it is mainly about measurable task performance rather than public taste.

source: Vals AI
metric: Accuracy (%)
judge: Objective
direction: higher better
group id: vals_ioi_current
domain: Coding

What it measures vs what it misses

✓ Measures

International Olympiad in Informatics-style coding tasks.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Why this counts: It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Same-test rule: This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses: It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
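The percentile values on this page appear consistent with a simple rank-based formula inside the 45-row group. The sketch below is an assumption inferred from the displayed numbers (for example, rank 2 of 45 shows 97.7%), not a documented Vals AI method; `group_percentile` is a hypothetical helper name. Rows that share a percentile, such as backfilled proxies, would not follow it.

```python
def group_percentile(rank: int, group_size: int) -> float:
    """Percentile of a 1-based rank within one benchmark/version group.

    Assumed formula (inferred, not documented): (N - rank) / (N - 1) * 100,
    so rank 1 maps to 100% and the last rank maps to 0%.
    """
    if group_size < 1 or not 1 <= rank <= group_size:
        raise ValueError("rank must be between 1 and group_size")
    if group_size == 1:
        # A single-row group has nothing to compare against.
        return 100.0
    return round((group_size - rank) / (group_size - 1) * 100, 1)


if __name__ == "__main__":
    for rank in (1, 2, 22, 45):
        print(rank, group_percentile(rank, 45))
```

Because the denominator is the size of this one group, the same model could get a different percentile in another benchmark or snapshot, which is why the value is not a universal score.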

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-fable-5

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

72.3% · 53.9% - 90.6%

#2 · GPT-5.4

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 97.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

67.8% · 48.5% - 87.2%

#3 · GPT-5.2

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 95.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

54.8% · 38.5% - 71.1%

#4 · Claude Opus 4.7

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-7

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 93.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

47.1% · 30% - 64.1%

#5 · Qwen3.7 Max

VALS-AI · Jun 9, 2026

Source label: alibaba/qwen3.7-max

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 90.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Alibaba.

46.8% · 26.6% - 66.9%

#6 · GPT-5.3 Codex

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.3-codex

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 88.6%
Last updated: recent
Eligibility: specialized_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

43.8% · 28.5% - 59.1%

#7 · Gemini 3 Flash

VALS-AI · Jun 9, 2026

Source label: google/gemini-3-flash-preview

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 86.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

39.1% · 18.4% - 59.8%

#8 · Gemini 3 Pro Preview

VALS-AI · Jun 9, 2026

Source label: google/gemini-3-pro-preview

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 84.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

38.8% · 16.3% - 61.4%

#9 · deepseek-v4-pro

VALS-AI · Jun 9, 2026

Source label: deepseek/deepseek-v4-pro

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 81.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: DeepSeek.

35.8% · 21.8% - 49.8%

#10 · Grok 4.20

VALS-AI · Jun 9, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 79.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

30.2% · 15.5% - 44.8%

#11 · Grok 4

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 77.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

26.2% · 12.9% - 39.4%

#12 · Claude Opus 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-5-20251101

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 75%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

23.6% · 8.8% - 38.4%

#13 · glm-5

VALS-AI · Jun 9, 2026

Source label: zai/glm-5-thinking

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 72.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

22% · 6.6% - 37.4%

#14 · GPT-5.1

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 70.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

21.5% · 7.1% - 35.9%

#15 · GPT-5

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5-2025-08-07

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 68.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

20% · 11% - 29%

#16 · Claude Sonnet 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 65.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

18.3% · 6.7% - 29.9%

#17 · kimi-k2.5-thinking

VALS-AI · Jun 9, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 63.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Moonshot AI.

17.7% · 5.7% - 29.6%

#18 · Gemini 2.5 Pro

VALS-AI · Jun 9, 2026

Source label: google/gemini-2.5-pro

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 61.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

17.1% · 5% - 29.2%

#19 · Qwen3 Max

VALS-AI · Jun 9, 2026

Source label: alibaba/qwen3-max

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 59.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Alibaba.

15.7% · 3.2% - 28.1%

#20 · Grok 4.3

VALS-AI · Jun 9, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 56.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

15.3% · 3% - 27.7%

#21 · GPT-5.4 nano

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 54.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

15.3% · 2.6% - 27.9%

#22 · Claude Opus 4.1

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

12.5% · 2.9% - 22.1%

#23 · Claude Opus 4

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.

12.5% · 2.9% - 22.1%

#24 · Claude Opus 4.6

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-6-to-4-1.

12.5% · 2.9% - 22.1%

#25 · Grok 4 Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 45.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

11.5% · 2.9% - 20.1%

#26 · Grok 4.1 Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-1-fast-non-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 43.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

7.7% · 1.4% - 13.9%

#27 · glm-4.7

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 40.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

7.6% · 0.9% - 14.3%

#28 · GPT-5.4 mini

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5-mini-2025-08-07

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

6.8% · 0.3% - 13.2%

#29 · minimax-m2.5

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 36.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

6.7% · 1.5% - 11.8%

#30 · Claude Sonnet 4

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-20250514

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 34.1%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

6.5% · 2.4% - 10.6%

#31 · Claude Sonnet 4.6

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-20250514

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 34.1%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.

6.5% · 2.4% - 10.6%

#32 · Claude Haiku 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 29.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

6.2% · 1.2% - 11.1%

#33 · minimax-m2.7

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 27.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

4.9% · 0.3% - 9.6%

#34 · o4 mini

VALS-AI · Jun 9, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 25%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

4.8% · 0.2% - 9.4%

#35 · glm-4.6

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.6

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 22.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

4.3% · 0% - 9.2%

#36 · Grok Code Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-code-fast-1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 20.5%
Last updated: recent
Eligibility: specialized_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

4.3% · 2.1% - 6.6%

#37 · Mistral Large (Feb '24)

VALS-AI · Jun 9, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 18.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

4% · 1.1% - 6.9%

#38 · glm-4.5

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 15.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

2.9% · 1% - 4.9%

#39 · Gemini 2.5 Flash

VALS-AI · Jun 9, 2026

Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 13.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

2.6% · 0.6% - 4.6%

#40 · MiniMax-M2.1

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 11.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

2.3% · 0% - 4.9%

#41 · deepseek-v3-0324

VALS-AI · Jun 9, 2026

Source label: fireworks/deepseek-v3-0324

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 9.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Fireworks AI.

1.7% · 0% - 3.4%

#42 · kimi-k2-instruct

VALS-AI · Jun 9, 2026

Source label: together/moonshotai/Kimi-K2-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 6.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Together AI.

1.3% · 0% - 2.5%

#43 · Magistral Medium 1.2

VALS-AI · Jun 9, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 4.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

0.7% · 0% - 1.5%

#44 · devstral-2512

VALS-AI · Jun 9, 2026

Source label: mistralai/devstral-2512

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 4.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

0.7% · 0% - 1.9%

#45 · Qwen3 235B A22B

VALS-AI · Jun 9, 2026

Source label: fireworks/qwen3-235b-a22b

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 0%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Fireworks AI.

0% · 0% - 0%
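Each leaderboard row ends with a score line holding three percentages: the headline accuracy followed by a low and high bound. A minimal sketch of reading such a line is shown here, assuming the three values always appear in that order; `parse_score_line` is a hypothetical helper and the page does not state what kind of range the bounds represent.

```python
import re

# Matches an integer or decimal number immediately followed by a percent sign.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")


def parse_score_line(line: str) -> tuple[float, float, float]:
    """Return (score, low, high) from a leaderboard score line.

    Works whether or not the score is separated from the range,
    because it only looks for the three percentage tokens.
    """
    values = [float(v) for v in PERCENT.findall(line)]
    if len(values) != 3:
        raise ValueError(f"expected 3 percentages, found {len(values)}: {line!r}")
    score, low, high = values
    return score, low, high


if __name__ == "__main__":
    print(parse_score_line("72.3%53.9% - 90.6%"))
```

Matching on the percent tokens rather than on fixed separators keeps the sketch tolerant of small layout differences between rows.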

Benchmarks · /benchmarks/vals-ai-ioi

IOI

IOI result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 45

Test details

Visible tradeoffsThis is an objective signal, so it is mainly about measurable task performance rather than public taste.

source

Vals AI

metric

Accuracy (%)

judge

Objective

direction

higher better

group id

vals_ioi_current

domain

Coding

What it measures vs what it misses

✓ Measures

International Olympiad in Informatics-style coding tasks.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Leaderboard · this benchmark version

#1 · Claude Fable 5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-fable-5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

72.3%53.9% - 90.6%

#2 · GPT-5.4

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 97.7%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

67.8%48.5% - 87.2%

#3 · GPT-5.2

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 95.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

54.8%38.5% - 71.1%

#4 · Claude Opus 4.7

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-7

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 93.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

47.1%30% - 64.1%

#5 · Qwen3.7 Max

VALS-AI · Jun 9, 2026

Source label: alibaba/qwen3.7-max

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 90.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Alibaba.

46.8%26.6% - 66.9%

#6 · GPT-5.3 Codex

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.3-codex

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 88.6%
Last updated: recent
Eligibility: specialized_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

43.8%28.5% - 59.1%

#7 · Gemini 3 Flash

VALS-AI · Jun 9, 2026

Source label: google/gemini-3-flash-preview

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 86.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

39.1%18.4% - 59.8%

#8 · Gemini 3 Pro Preview

VALS-AI · Jun 9, 2026

Source label: google/gemini-3-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 84.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

38.8%16.3% - 61.4%

#9 · deepseek-v4-pro

VALS-AI · Jun 9, 2026

Source label: deepseek/deepseek-v4-pro

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 81.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: DeepSeek.

35.8%21.8% - 49.8%

#10 · Grok 4.20

VALS-AI · Jun 9, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 79.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

30.2%15.5% - 44.8%

#11 · Grok 4

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-0709

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 77.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

26.2%12.9% - 39.4%

#12 · Claude Opus 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-5-20251101

verified runtimevariant directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 75%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

23.6%8.8% - 38.4%

#13 · glm-5

VALS-AI · Jun 9, 2026

Source label: zai/glm-5-thinking

verified runtimeexact directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 72.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

22%6.6% - 37.4%

#14 · GPT-5.1

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 70.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

21.5%7.1% - 35.9%

#15 · GPT-5

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5-2025-08-07

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 68.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

20%11% - 29%

#16 · Claude Sonnet 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 65.9%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

18.3%6.7% - 29.9%

#17 · kimi-k2.5-thinking

VALS-AI · Jun 9, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 63.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Moonshot AI.

17.7%5.7% - 29.6%

#18 · Gemini 2.5 Pro

VALS-AI · Jun 9, 2026

Source label: google/gemini-2.5-pro

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 61.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

17.1%5% - 29.2%

#19 · Qwen3 Max

VALS-AI · Jun 9, 2026

Source label: alibaba/qwen3-max

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 59.1%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Alibaba.

15.7%3.2% - 28.1%

#20 · Grok 4.3

VALS-AI · Jun 9, 2026

Source label: grok/grok-4.3

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 56.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

15.3%3% - 27.7%

#21 · GPT-5.4 nano

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 54.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

15.3%2.6% - 27.9%

#22 · Claude Opus 4.1

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

verified runtimevariant directBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

12.5%2.9% - 22.1%

#23 · Claude Opus 4

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

backfilledproxy backfilledBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.

12.5%2.9% - 22.1%

#24 · Claude Opus 4.6

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-opus-4-1-20250805

backfilledproxy backfilledBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 52.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-6-to-4-1.

12.5% (2.9% - 22.1%)

#25 · Grok 4 Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 45.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

11.5% (2.9% - 20.1%)

#26 · Grok 4.1 Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-4-1-fast-non-reasoning

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 43.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

7.7% (1.4% - 13.9%)

#27 · glm-4.7

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 40.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

7.6% (0.9% - 14.3%)

#28 · GPT-5.4 mini

VALS-AI · Jun 9, 2026

Source label: openai/gpt-5-mini-2025-08-07

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

6.8% (0.3% - 13.2%)

#29 · minimax-m2.5

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.5

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 36.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

6.7% (1.5% - 11.8%)

#30 · Claude Sonnet 4

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-20250514

verified runtime · variant direct · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 34.1%
Last updated: recent
Eligibility: historical_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

6.5% (2.4% - 10.6%)

#31 · Claude Sonnet 4.6

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-sonnet-4-20250514

backfilled · proxy backfilled · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 34.1%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.

6.5% (2.4% - 10.6%)

#32 · Claude Haiku 4.5

VALS-AI · Jun 9, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 29.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Anthropic.

6.2% (1.2% - 11.1%)

#33 · minimax-m2.7

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 27.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

4.9% (0.3% - 9.6%)

#34 · o4 mini

VALS-AI · Jun 9, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 25%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

4.8% (0.2% - 9.4%)

#35 · glm-4.6

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.6

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 22.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

4.3% (0% - 9.2%)

#36 · Grok Code Fast

VALS-AI · Jun 9, 2026

Source label: grok/grok-code-fast-1

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 20.5%
Last updated: recent
Eligibility: specialized_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: xAI.

4.3% (2.1% - 6.6%)

#37 · Mistral Large (Feb '24)

VALS-AI · Jun 9, 2026

Source label: mistralai/mistral-large-2512

verified runtime · variant direct · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 18.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: dated variant (0.80)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

4% (1.1% - 6.9%)

#38 · glm-4.5

VALS-AI · Jun 9, 2026

Source label: zai/glm-4.5

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 15.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Zhipu AI.

2.9% (1% - 4.9%)

#39 · Gemini 2.5 Flash

VALS-AI · Jun 9, 2026

Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 13.6%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

2.6% (0.6% - 4.6%)

#40 · MiniMax-M2.1

VALS-AI · Jun 9, 2026

Source label: minimax/MiniMax-M2.1

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 11.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: MiniMax.

2.3% (0% - 4.9%)

#41 · deepseek-v3-0324

VALS-AI · Jun 9, 2026

Source label: fireworks/deepseek-v3-0324

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 9.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Fireworks AI.

1.7% (0% - 3.4%)

#42 · kimi-k2-instruct

VALS-AI · Jun 9, 2026

Source label: together/moonshotai/Kimi-K2-Instruct

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 6.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Together AI.

1.3% (0% - 2.5%)

#43 · Magistral Medium 1.2

VALS-AI · Jun 9, 2026

Source label: mistralai/magistral-medium-2509

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 4.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

0.7% (0% - 1.5%)

#44 · devstral-2512

VALS-AI · Jun 9, 2026

Source label: mistralai/devstral-2512

verified runtime · exact alias · Background only

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 4.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Mistral AI.

0.7% (0% - 1.9%)

#45 · Qwen3 235B A22B

VALS-AI · Jun 9, 2026

Source label: fireworks/qwen3-235b-a22b

verified runtime · exact alias

Raw row drilldown · source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/ioi
Percentile: 0%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Fireworks AI.

0% (0% - 0%)
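
The Percentile values in the rows above appear consistent with a simple rank-based formula over the 45 scores in this benchmark version. The sketch below is an inference from the displayed numbers, not a documented Vals AI or site formula; the function name `percentile_from_rank` is hypothetical. Two exceptions are visible in the rows: backfilled rows repeat the percentile of the row they are mapped from, and row #44 shows 4.5%, the same value as row #43, which has the same 0.7% score.

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Percentile for a 1-indexed leaderboard rank among `total` rows.

    Assumed formula (inferred from the displayed values, not documented):
    100 * (total - rank) / (total - 1), rounded to one decimal place.
    """
    if total < 2:
        raise ValueError("need at least two rows to rank")
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / (total - 1), 1)


# Ranks taken from the leaderboard rows; 45 is the score count for this version.
for rank in (1, 21, 22, 34, 45):
    print(rank, percentile_from_rank(rank, 45))
```

Under this assumption, rank 21 of 45 gives 54.5 and rank 45 gives 0.0, matching the values shown for those rows.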
