Verified but aging
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: LiveBench
metric: Score (%)
judge: Objective
direction: higher is better
group id: livebench_overall_2026_01_08
domain: Professional reasoning
What it measures vs what it misses
✓ Measures
Average objective performance across LiveBench's current public category mix.
✗ Misses
Latency, cost, subjective preference, and multimodal behavior outside LiveBench.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
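The same-test rule can be illustrated with a minimal sketch. The page does not state how the percentile is computed, so this assumes one common definition: the share of models in the same group that score at or below the given model. The model names, the scores, and the second group id (`other_group_v1`) are hypothetical and exist only to show that results from another group are ignored.

```python
# Hypothetical records: (model, group_id, score_percent).
# Model names and scores are made up for illustration; they are not LiveBench results.
RESULTS = [
    ("model-a", "livebench_overall_2026_01_08", 70.0),
    ("model-b", "livebench_overall_2026_01_08", 60.0),
    ("model-c", "livebench_overall_2026_01_08", 50.0),
    ("model-d", "livebench_overall_2026_01_08", 40.0),
    ("model-a", "other_group_v1", 10.0),  # different group, never mixed in
]


def percentile_in_group(results, model, group_id):
    """Percent of models in `group_id` scoring at or below `model`.

    Assumed definition; the source does not specify the exact formula.
    """
    # Keep only scores from the exact benchmark/version group.
    scores = {m: s for m, g, s in results if g == group_id}
    if model not in scores:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    target = scores[model]
    at_or_below = sum(1 for s in scores.values() if s <= target)
    return 100.0 * at_or_below / len(scores)


if __name__ == "__main__":
    print(percentile_in_group(RESULTS, "model-b", "livebench_overall_2026_01_08"))
    print(percentile_in_group(RESULTS, "model-a", "other_group_v1"))
```

Filtering by group id before ranking is the whole point: the same model can sit at very different percentiles in different groups, which is why the number is not a universal score.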