Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: Artificial Analysis
metric: Score (%)
judge: Objective
direction: higher better
group id: aa_critpt_current
domain: Reasoning / math / science
What it measures vs what it misses
✓ Measures
Physics-oriented reasoning.
✗ Misses
Adjacent capabilities, subjective preference, latency, and cost.
Why this counts
It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It still misses product usability, latency, and whether the model stays correct in messy real workflows.
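The same-test rule can be illustrated with a minimal sketch. It assumes a percentile defined as the share of other models in the same group that score strictly lower, which is one common convention and not necessarily the one this page uses. The function name, the row keys (`model`, `group_id`, `score`), and any sample data passed to it are hypothetical.

```python
from collections import defaultdict


def percentiles_within_group(rows):
    """Compute a percentile for each model, comparing only within its group.

    rows: list of dicts with hypothetical keys 'model', 'group_id', 'score'.
    Assumes higher scores are better. Returns {(group_id, model): percentile},
    where percentile is in the range 0-100, or None if the group has no peers.
    """
    # Bucket rows by benchmark/version group so groups never mix.
    by_group = defaultdict(list)
    for row in rows:
        by_group[row["group_id"]].append(row)

    result = {}
    for group_id, members in by_group.items():
        scores = [m["score"] for m in members]
        for m in members:
            if len(scores) == 1:
                # A percentile is undefined when there is nothing to compare to.
                pct = None
            else:
                below = sum(1 for s in scores if s < m["score"])
                pct = 100.0 * below / (len(scores) - 1)
            result[(group_id, m["model"])] = pct
    return result
```

Because the grouping key is the exact group id, a score from a different benchmark or version never influences the percentile, which is why the number cannot be read as a universal score.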
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)
Parsed from the Artificial Analysis public leaderboard field `critpt`. Backfilled from Claude Opus 4.1 via the approved benchmark identity mapping `map-claude-opus-4-to-4-1`.