Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: Artificial Analysis
metric: Score (%)
judge: Objective
direction: higher better
group id: aa_mmmu_pro_current
domain: Vision understanding
What it measures vs what it misses
✓ Measures
Multimodal reasoning over images and prompts.
✗ Misses
Adjacent capabilities, subjective preference, latency, and cost.
Why this counts
It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not tell you whether the model can generate or edit images well.
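The same-test rule can be illustrated with a minimal sketch. It assumes the percentile is the share of other models in the same group that a model scores at or above; the page does not state the exact formula, so that definition, the `Score` record, and the `percentile_in_group` helper are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    group_id: str   # benchmark/version group, e.g. "aa_mmmu_pro_current"
    value: float    # Score (%), higher is better


def percentile_in_group(scores, model, group_id):
    """Percent of *other* models in the same group that `model` matches or beats.

    Assumed definition: the source page does not give its percentile formula.
    """
    # Only rows from the exact benchmark/version group take part.
    group = [s for s in scores if s.group_id == group_id]
    target = next((s for s in group if s.model == model), None)
    if target is None:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    others = [s for s in group if s.model != model]
    if not others:
        return 100.0  # nothing to compare against inside this group
    matched_or_beaten = sum(1 for s in others if s.value <= target.value)
    return 100.0 * matched_or_beaten / len(others)


# Made-up scores for placeholder model names.
scores = [
    Score("model-a", "aa_mmmu_pro_current", 60.0),
    Score("model-b", "aa_mmmu_pro_current", 70.0),
    Score("model-c", "aa_mmmu_pro_current", 80.0),
    Score("model-d", "some_other_group", 99.0),  # different group, ignored
]
print(percentile_in_group(scores, "model-b", "aa_mmmu_pro_current"))
```

Because `model-d` belongs to a different group, its higher score has no effect on the result, which is the point of the rule: the number is only meaningful inside one benchmark/version group.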
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)
Parsed from the Artificial Analysis public leaderboard field `mmmuPro`. Backfilled from Claude Opus 4.1 via the approved benchmark identity mapping `map-claude-opus-4-to-4-1`.
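One way to picture "visible for context but excluded from default ranking" is a filter applied before sorting. The sketch below is an assumption about how such a rule might look, not the site's actual logic: the `Entry` record, its field names, the `"exact"` label, and the `default_ranking` helper are all hypothetical, and the page does not say what the number in parentheses (0.58) represents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    value: float           # Score (%), higher is better
    identity: str          # "exact" is an assumed label; "benchmark proxy" is from the page
    identity_score: float  # number shown next to the identity; meaning not stated in the source


def default_ranking(entries):
    """Rank by score, leaving out entries whose identity is a fallback proxy."""
    ranked = [e for e in entries if e.identity != "benchmark proxy"]
    return sorted(ranked, key=lambda e: e.value, reverse=True)


# Made-up scores for placeholder model names.
entries = [
    Entry("model-a", 70.0, "exact", 1.0),
    Entry("model-b", 90.0, "benchmark proxy", 0.58),  # backfilled, shown for context only
    Entry("model-c", 80.0, "exact", 1.0),
]
print([e.model for e in default_ranking(entries)])
```

The proxy entry stays in `entries`, so it can still be displayed, but it never enters the sorted list that drives the default ranking.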