Unbiased AI Bench (UAB): AI model rankings with source links.
Every score links back to its source.
Disagreement
Live · updated continuously

Where AI rankings disagree.

See where public benchmark sources disagree, which model each source favors, and why one leaderboard can mislead.
Public sources · 9
Open disputes · 7
Goal · show disagreement
Data version

Read this before trusting a headline.

Data version · May 13, 2026
Model list checked · 9 providers · 800 tracked models
Page refreshed · May 18, 2026

If this date looks stale, you may be seeing an older build or cached deploy.


Rankings get less tidy but more honest when disagreement stays visible.

This view shows where public sources refuse to tell the same story. A wide score range is not noise to hide. It is the main fact.
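As a rough illustration of what "score range" can mean here, the sketch below takes the spread between the highest and lowest score a model receives across sources, then orders models by that spread. It assumes all scores are already on one shared scale. The function names, model names, and numbers are hypothetical and are not taken from this site's data or code.

```python
from typing import Dict, List


def score_range(scores_by_source: Dict[str, float]) -> float:
    """Spread between the highest and lowest score one model received."""
    if not scores_by_source:
        raise ValueError("need at least one source score")
    values = list(scores_by_source.values())
    return max(values) - min(values)


def rank_by_disagreement(models: Dict[str, Dict[str, float]]) -> List[str]:
    """Model names ordered from widest to narrowest score range."""
    return sorted(models, key=lambda name: score_range(models[name]), reverse=True)


# Hypothetical, made-up scores on an assumed shared 0-100 scale.
# Keys such as "AR", "LB", "AA" only mirror the source tags listed on this page.
example = {
    "model-a": {"AR": 82.0, "LB": 61.0, "AA": 74.0},
    "model-b": {"AR": 70.0, "LB": 68.0, "AA": 71.0},
}

for name in rank_by_disagreement(example):
    print(name, score_range(example[name]))
```

A wide range for one model means the sources tell different stories about it, which is the signal this view is meant to keep visible.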

Arena (AR)
LiveBench (LB)
Artificial Analysis (AA)
BridgeBench (BB)
Terminal-Bench (TERMINAL-BENCH)
LLMBase (LLMBASE)
Scale Labs (SL)
OpenCompass (OC)
MTEB (MTEB)

Where the rankings split

score range across tests

Source honesty scorecard

Not a moral rating. A quick check on how inspectable each source is when you need to dispute the headline number.

9 of 9 sources in the current registry
Benchmark and eval counts reflect what this app currently tracks for each source, not the source's full external catalog.
Arena · verified · 11 · 793 · no · May 13, 2026 · 0
LiveBench · verified · 6 · 773 · yes · May 13, 2026 · 0
Artificial Analysis · verified · 7 · 638 · yes · May 13, 2026 · 1
BridgeBench · verified · 5 · 122 · no · May 13, 2026 · 0
Scale Labs · verified · 8 · 98 · no · May 13, 2026 · 0
Terminal-Bench · verified · 1 · 31 · no · May 13, 2026 · 0
OpenCompass · verified · 1 · 15 · no · May 13, 2026 · 0
MTEB · verified · 1 · 11 · no · May 13, 2026 · 0
LLMBase · relay · 0 · 0 · no · May 13, 2026 · 0