Visible tradeoffs
This is a rubric-judged signal, so it is more structured than arena taste but still depends on the scoring rubric.
source: Artificial Analysis
metric: Elo (rating)
judge: Rubric
direction: higher better
group id: aa_gdpval_current
domain: Professional reasoning
What it measures vs what it misses
✓ Measures
Agentic performance on economically valuable work tasks.
✗ Misses
Adjacent capabilities, subjective preference, latency, and cost.
Why this counts
Agentic performance on economically valuable work tasks.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
Adjacent capabilities, subjective preference, latency, and cost.
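The same-test rule can be illustrated with a minimal Python sketch. It assumes a percentile defined as the share of models in the same group scoring at or below a given model; the page does not state its exact formula. The function name, the model names, the scores, and the second group id `other_benchmark_v1` are all hypothetical and used only for illustration.

```python
from collections import defaultdict


def percentile_within_group(rows):
    """Compute a percentile per (group_id, model), comparing only within a group.

    rows: iterable of (group_id, model, score) tuples, where higher is better.
    The percentile definition (share of group members scoring at or below
    the model) is an assumption, not the site's documented formula.
    """
    by_group = defaultdict(list)
    for group_id, model, score in rows:
        by_group[group_id].append((model, score))

    result = {}
    for group_id, entries in by_group.items():
        scores = [score for _, score in entries]
        for model, score in entries:
            # Count only models in the same benchmark/version group.
            at_or_below = sum(1 for s in scores if s <= score)
            result[(group_id, model)] = 100.0 * at_or_below / len(scores)
    return result


# Hypothetical data: scores in different groups use unrelated scales.
rows = [
    ("aa_gdpval_current", "model_a", 1500.0),
    ("aa_gdpval_current", "model_b", 1400.0),
    ("aa_gdpval_current", "model_c", 1300.0),
    ("aa_gdpval_current", "model_d", 1200.0),
    ("other_benchmark_v1", "model_a", 10.0),
    ("other_benchmark_v1", "model_b", 90.0),
]
pct = percentile_within_group(rows)
```

In this sketch `model_a` tops one group and sits at the bottom of the other, which is why a percentile from one group should not be read as a universal score.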
Leaderboard · this benchmark version
#1 · Claude Fable 5
AA · Jun 24, 2026
Source label: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)