Verified but aging
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: LiveBench
metric: Score (%)
judge: Objective
direction: higher better
group id: livebench_coding_2026_01_08
domain: Coding
What it measures vs what it misses
✓ Measures: Objective coding accuracy on recent generation and completion tasks.
✗ Misses: Editing workflow ergonomics, latency, and subjective code style preference.
Why this counts
It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not fully capture repo-scale iteration, IDE ergonomics, or long debugging loops.
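The same-test rule can be illustrated with a minimal sketch. The page does not state how its percentile is calculated, so the definition used here (the share of models in the same group scoring at or below a given model) is an assumption, and the function name, model names, scores, and the second group id are hypothetical. Only the group id livebench_coding_2026_01_08 comes from the card above.

```python
from collections import defaultdict


def percentiles_within_group(rows):
    """Return {(group_id, model): percentile}, ranking each model
    only against other models that share the same group_id."""
    by_group = defaultdict(list)
    for group_id, model, score in rows:
        by_group[group_id].append((model, score))

    result = {}
    for group_id, entries in by_group.items():
        n = len(entries)
        for model, score in entries:
            # Assumed definition: percentage of models in the same
            # group whose score is at or below this model's score.
            at_or_below = sum(1 for _, other in entries if other <= score)
            result[(group_id, model)] = 100.0 * at_or_below / n
    return result


# Hypothetical scores, for illustration only.
rows = [
    ("livebench_coding_2026_01_08", "model_a", 70.0),
    ("livebench_coding_2026_01_08", "model_b", 60.0),
    ("livebench_coding_2026_01_08", "model_c", 50.0),
    ("some_other_group", "model_a", 40.0),
    ("some_other_group", "model_d", 90.0),
]

ranks = percentiles_within_group(rows)
for key in sorted(ranks):
    print(key, round(ranks[key], 1))
```

In this sketch model_a lands at a different percentile in each group, which is why a percentile from one benchmark/version group should not be read as a universal score.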