Visible tradeoffsThis is an objective signal, so it is mainly about measurable task performance rather than public taste.
source
Artificial Analysis
metric
Score (%)
judge
Objective
direction
higher better
group id
aa_apex_agents_current
domain
Professional reasoning
What it measures vs what it misses
✓ Measures
Long-horizon agentic task completion.
✗ Misses
Adjacent capabilities, subjective preference, latency, and cost.
Why this countsLong-horizon agentic task completion.Same-test ruleThis percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.What it missesAdjacent capabilities, subjective preference, latency, and cost.