Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 45
Test details
Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: Vals AI
metric: Accuracy (%)
judge: Objective
direction: higher better
group id: vals_ioi_current
domain: Coding
What it measures vs what it misses
✓ Measures
International Olympiad in Informatics-style coding tasks.
✗ Misses
Adjacent skills outside the benchmark task mix, latency, and cost.
Why this counts
It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
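To make the same-test rule concrete, here is a minimal sketch of a percentile computed only within one benchmark/version group. The `Score` class, the `percentile_in_group` function, the model names, and the accuracy values are all hypothetical, and the formula (share of other same-group models scoring at or below the target) is an assumption; the site's actual percentile calculation is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str       # hypothetical model name
    group_id: str    # benchmark/version group, e.g. "vals_ioi_current"
    accuracy: float  # Accuracy (%), higher is better


def percentile_in_group(scores, model, group_id):
    """Percent of other models in the same group scoring at or below `model`.

    Assumed formula for illustration only; scores from other groups are ignored.
    """
    group = [s for s in scores if s.group_id == group_id]
    target = next(s for s in group if s.model == model)
    others = [s for s in group if s.model != model]
    if not others:
        return 100.0  # a lone model trivially tops its own group
    at_or_below = sum(1 for s in others if s.accuracy <= target.accuracy)
    return 100.0 * at_or_below / len(others)


# Made-up data: three models in the IOI group, one in an unrelated group.
scores = [
    Score("model-a", "vals_ioi_current", 10.0),
    Score("model-b", "vals_ioi_current", 20.0),
    Score("model-c", "vals_ioi_current", 30.0),
    Score("model-d", "other_group", 99.0),
]

print(percentile_in_group(scores, "model-b", "vals_ioi_current"))  # 50.0
```

Note that `model-d` scores higher than everything else but has no effect on the result, because it belongs to a different group.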
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity
benchmark proxy (0.58)
Parsed from Vals AI Benchmark. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity
benchmark proxy (0.58)
Parsed from Vals AI Benchmark. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-6-to-4-1.
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity
benchmark proxy (0.58)
Parsed from Vals AI Benchmark. Vals slug: ioi; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.
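The note that fallback benchmark identities are shown for context but excluded from default ranking can be sketched as a simple filter. This is an illustration under stated assumptions: the `Entry` class, the `default_ranking` function, the `"exact"` identity label, and all names and values are hypothetical; only the `"benchmark proxy"` label comes from the entries above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str       # hypothetical model name
    accuracy: float  # Accuracy (%), higher is better
    identity: str    # "exact" (assumed label) or "benchmark proxy"


def default_ranking(entries):
    """Rank by accuracy, dropping entries backfilled via a benchmark proxy."""
    ranked = [e for e in entries if e.identity != "benchmark proxy"]
    return sorted(ranked, key=lambda e: e.accuracy, reverse=True)


# Made-up data: the proxy entry stays visible in `entries` but is not ranked.
entries = [
    Entry("model-x", 12.0, "exact"),
    Entry("model-y", 30.0, "benchmark proxy"),
    Entry("model-z", 25.0, "exact"),
]

print([e.model for e in default_ranking(entries)])  # ['model-z', 'model-x']
```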