Visible tradeoffs
This is an efficiency signal, so it belongs beside quality rather than being mistaken for quality.
source: Artificial Analysis
metric: Tokens per second (tokens/s)
judge: Speed / cost
direction: higher is better
group id: aa_output_speed_current
domain: Chat / text
What it measures vs what it misses
✓ Measures: Streaming output speed after generation begins.
✗ Misses: Answer quality and time spent reasoning before output starts.
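To make the measures/misses distinction concrete, here is a minimal sketch of an output-speed calculation that starts the clock only once the first token arrives. The function name and the timings are hypothetical, and this is not Artificial Analysis's published methodology, only an illustration of why time spent before output starts does not affect the number.

```python
def output_tokens_per_second(first_token_time: float,
                             last_token_time: float,
                             output_tokens: int) -> float:
    """Streaming speed over the generation window only.

    Times are seconds since the request was sent (hypothetical inputs).
    Anything before first_token_time (queueing, reasoning) is ignored.
    """
    window = last_token_time - first_token_time
    if window <= 0:
        raise ValueError("generation window must be positive")
    return output_tokens / window


# Hypothetical request: first token after 2 s, last token at 7 s, 400 tokens.
speed = output_tokens_per_second(first_token_time=2.0,
                                 last_token_time=7.0,
                                 output_tokens=400)

# End-to-end throughput for the same request, for contrast.
end_to_end = 400 / 7.0
```

With these made-up timings the streaming speed is higher than the end-to-end figure, because the two seconds before the first token are excluded from the window.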
Why this counts
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not prove deeper reasoning, tool use, or enterprise workflow reliability.
Fallback benchmark identity is visible for context but excluded from default ranking.
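The same-test rule and the fallback exclusion can be sketched as a percentile computed over a single group. Everything here is an assumption for illustration: the `Score` record, the model names, the values, and the percentile definition (share of the group scoring at or below the target, for a higher-is-better metric). The site's actual formula is not stated in the text.

```python
from typing import NamedTuple, Optional


class Score(NamedTuple):
    model: str
    group_id: str
    value: float
    is_fallback: bool = False


def percentile_in_group(scores: list,
                        model: str,
                        group_id: str,
                        include_fallback: bool = False) -> Optional[float]:
    """Percentile of `model` among scores in the same group only."""
    pool = [s for s in scores
            if s.group_id == group_id
            and (include_fallback or not s.is_fallback)]
    target = next((s for s in pool if s.model == model), None)
    if target is None:
        return None  # not ranked in this group under the current settings
    at_or_below = sum(1 for s in pool if s.value <= target.value)
    return 100.0 * at_or_below / len(pool)


# Hypothetical data; only the group id comes from the card above.
GROUP = "aa_output_speed_current"
scores = [
    Score("model-a", GROUP, 50.0),
    Score("model-b", GROUP, 80.0),
    Score("model-c", GROUP, 120.0),
    Score("model-d", GROUP, 200.0),
    Score("model-e", GROUP, 90.0, is_fallback=True),
    Score("model-c", "some_other_group", 999.0),  # never mixed in
]

default_rank = percentile_in_group(scores, "model-c", GROUP)
fallback_default = percentile_in_group(scores, "model-e", GROUP)
fallback_opt_in = percentile_in_group(scores, "model-e", GROUP,
                                      include_fallback=True)
```

In this sketch a fallback entry gets no percentile by default and only appears when the caller opts in, which mirrors "visible for context but excluded from default ranking".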
Identity: benchmark proxy (0.58)
Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.
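A rough sketch of how such a backfill might be resolved, assuming a leaderboard already parsed into a dict keyed by model name. Only the field name `medianOutputTokensPerSecond` and the mapping id come from the text above. The dict layout, the function name, the speed value, and the reading of the mapping as "Claude Opus 4 borrows from Claude Opus 4.1" are assumptions.

```python
from typing import Optional

FIELD = "medianOutputTokensPerSecond"

# mapping id -> (model missing a value, model the value is borrowed from).
# The direction is an assumption inferred from the mapping name.
IDENTITY_MAP = {
    "map-claude-opus-4-to-4-1": ("Claude Opus 4", "Claude Opus 4.1"),
}


def resolve_output_speed(model: str,
                         leaderboard: dict,
                         identity_map: dict) -> Optional[dict]:
    """Return the model's own value, else a labeled proxy, else None."""
    row = leaderboard.get(model)
    if row is not None and row.get(FIELD) is not None:
        return {"value": float(row[FIELD]), "identity": "direct",
                "mapping": None, "backfilled_from": None}
    for mapping_id, (target, donor) in identity_map.items():
        donor_row = leaderboard.get(donor)
        if (target == model and donor_row is not None
                and donor_row.get(FIELD) is not None):
            return {"value": float(donor_row[FIELD]),
                    "identity": "benchmark proxy",
                    "mapping": mapping_id, "backfilled_from": donor}
    return None


# Hypothetical parsed leaderboard; the speed value is made up.
leaderboard = {"Claude Opus 4.1": {FIELD: 42.0}}

proxy = resolve_output_speed("Claude Opus 4", leaderboard, IDENTITY_MAP)
direct = resolve_output_speed("Claude Opus 4.1", leaderboard, IDENTITY_MAP)
missing = resolve_output_speed("unknown-model", leaderboard, IDENTITY_MAP)
```

Keeping the identity label and mapping id on the result lets a ranking step tell proxies from direct measurements and leave them out by default.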