Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 59
Test details
Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source · Vals AI
metric · Accuracy (%)
judge · Objective
direction · higher better
group id · vals_mmmu_current
domain · Vision understanding
What it measures vs what it misses
✓ Measures
Multimodal reasoning over images and prompts.
✗ Misses
Adjacent skills outside the benchmark task mix, latency, and cost.
Why this counts
It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not tell you whether the model can generate or edit images well.
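As a rough illustration of the same-test rule, the sketch below ranks each model only against scores that share its group id. The page does not state how the percentile is calculated, so the definition used here (share of same-group scores at or below the model's score) is an assumption, and the function name, model names, second group id, and score values are all hypothetical.

```python
from collections import defaultdict


def percentile_within_group(rows):
    """Return {(model, group_id): percentile} for (model, group_id, score) rows.

    Assumed definition: percentage of scores in the same group that are
    less than or equal to the model's own score.
    """
    # Collect scores per benchmark/version group.
    scores_by_group = defaultdict(list)
    for _model, group_id, score in rows:
        scores_by_group[group_id].append(score)

    percentiles = {}
    for model, group_id, score in rows:
        group_scores = scores_by_group[group_id]
        at_or_below = sum(1 for s in group_scores if s <= score)
        percentiles[(model, group_id)] = 100.0 * at_or_below / len(group_scores)
    return percentiles


# Made-up rows for illustration only; these are not real benchmark results.
rows = [
    ("model_a", "vals_mmmu_current", 80.0),
    ("model_b", "vals_mmmu_current", 60.0),
    ("model_c", "other_group", 90.0),
]
result = percentile_within_group(rows)
```

In this sketch, model_c's higher raw score never affects the ranking of model_a or model_b, because it belongs to a different group. That is the sense in which the percentile is not a universal score.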