Unbiased AI Bench (UAB): AI model rankings with source links.
Every score links back to its source.
Live · updated continuously
Operational history

Changelog

Parser changes, mapping fixes, methodology changes, and product releases stay visible because data plumbing changes what the site appears to know.
Entries · 7
Categories · parser / mapping / product / methodology

What changed this week

alert
7 review items still need manual judgment

The product keeps parser and mapping ambiguity visible instead of silently guessing.

models
Artificial Analysis scores changed because the underlying benchmark results moved

0 benchmark rows were added, 0 removed, and 134 existing rows changed value or evaluation date. Window: 2026-05-13T01:05:56Z -> 2026-05-13T01:19:35Z.
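The added/removed/changed counts above can be reproduced by diffing two snapshots keyed on model and benchmark. The sketch below is illustrative only: the `Row` shape, its field names, and `diff_snapshots` are assumptions, since the changelog does not show the site's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical row shape; the real schema is not shown in this changelog.
    model_id: str
    benchmark: str
    value: float
    evaluated_on: str  # ISO date string


def _key(row: Row) -> tuple[str, str]:
    # A row is identified by which model it scores on which benchmark.
    return (row.model_id, row.benchmark)


def diff_snapshots(before: list[Row], after: list[Row]) -> dict[str, int]:
    """Count rows added, removed, and changed between two snapshots."""
    old = {_key(r): r for r in before}
    new = {_key(r): r for r in after}
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    # A shared row counts as changed if its value or evaluation date differs.
    changed = [k for k in old.keys() & new.keys() if old[k] != new[k]]
    return {"added": len(added), "removed": len(removed), "changed": len(changed)}
```

Under these assumptions, a row whose score is unchanged but whose evaluation date moved still counts as changed, which matches the "changed value or evaluation date" wording.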

product
Initial glass-box matrix release

Added matrix homepage, comparable-group normalization, per-cell receipts, source pages, and custom composite preview.

models
Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.
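One way to read "unlike metrics are never averaged by default" is as a guard that refuses to average cells from different comparable groups unless the caller opts in. The following is a minimal sketch under that reading; the cell shape, the `group` labels, and the `allow_unlike` flag are hypothetical and not taken from the published methodology.

```python
from statistics import mean


def average_comparable(cells, allow_unlike=False):
    """Average cell values, refusing to mix comparable groups by default.

    Each cell is a dict with a "group" label and a numeric "value"
    (an assumed shape used only for this illustration).
    """
    groups = {cell["group"] for cell in cells}
    if len(groups) > 1 and not allow_unlike:
        raise ValueError(f"refusing to average unlike metrics: {sorted(groups)}")
    return mean(cell["value"] for cell in cells)
```

For example, two `accuracy_pct` cells of 80.0 and 90.0 average to 85.0, while adding a `latency_ms` cell raises `ValueError` unless `allow_unlike=True` is passed explicitly.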

models
Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.
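The "preferred when available" rule can be pictured as a key-selection function that uses stable IDs if both are present and otherwise falls back to a normalized display name. This is a sketch only: the field names `aa_model_id` and `aa_creator_id`, and the name fallback itself, are assumptions made for illustration.

```python
from __future__ import annotations


def identity_key(record: dict) -> tuple[str, str]:
    """Pick the most stable identity available for a scraped model record."""
    model_id = record.get("aa_model_id")      # hypothetical field name
    creator_id = record.get("aa_creator_id")  # hypothetical field name
    if model_id and creator_id:
        return ("stable_id", f"{creator_id}/{model_id}")
    # Assumed fallback: lowercase the display name and collapse whitespace.
    name = " ".join(record["name"].lower().split())
    return ("name", name)
```

Returning the key kind alongside the key makes it visible downstream which records were matched on stable IDs and which were matched on names only.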

models
BridgeBench parser fallback added

Added alternate selectors for category headers after leaderboard markup drift.
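A selector fallback of this kind usually means trying an ordered list of selectors and using the first one that matches. The sketch below shows the idea with the standard library's `xml.etree.ElementTree`, which only handles well-formed markup; the two paths are invented for illustration and are not BridgeBench's real markup or the site's actual parser.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical selectors, tried in order.
CATEGORY_HEADER_PATHS = [
    ".//th[@class='category']",     # assumed original markup
    ".//div[@class='cat-header']",  # assumed markup after drift
]


def find_category_headers(markup: str) -> tuple[str | None, list[str]]:
    """Return the first selector that matches, plus the header texts found."""
    root = ET.fromstring(markup)
    for path in CATEGORY_HEADER_PATHS:
        headers = [el.text.strip() for el in root.findall(path) if el.text]
        if headers:
            return path, headers
    # Nothing matched: report that explicitly instead of guessing.
    return None, []
```

Returning the matched path alongside the headers lets the caller record which selector was used, so a later switch to the fallback shows up in parser metadata.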

disagreement
Gemini 2.5 Pro is still a split decision

Cross-benchmark spread sits at 100.0 points, which means rankings still depend heavily on which visible benchmark slices you weight most.

disagreement
Gemini 3.1 Pro is still a split decision

Cross-benchmark spread sits at 100.0 points, which means rankings still depend heavily on which visible benchmark slices you weight most.
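The changelog does not define how the spread is computed. A plausible reading, assumed here, is the gap between a model's best and worst normalized score (0 to 100) across benchmarks. The sketch below uses that assumption and adds a simple weighted composite to show why a large spread makes the result depend on the weights; both function names are hypothetical.

```python
from __future__ import annotations


def cross_benchmark_spread(scores: dict[str, float]) -> float:
    """Assumed definition: highest minus lowest normalized score."""
    if len(scores) < 2:
        return 0.0
    return max(scores.values()) - min(scores.values())


def weighted_composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of scores; benchmarks without a weight count as zero."""
    total = sum(weights.get(name, 0.0) for name in scores)
    if total == 0:
        raise ValueError("at least one scored benchmark needs a nonzero weight")
    return sum(s * weights.get(name, 0.0) for name, s in scores.items()) / total
```

With scores of 100.0 on one benchmark and 0.0 on another, the spread is 100.0, and the composite moves from 75.0 to 25.0 when the 3:1 weighting is reversed.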

2026-04-16
product
Initial glass-box matrix release
Added matrix homepage, comparable-group normalization, per-cell receipts, source pages, and custom composite preview.

2026-04-16
methodology
Methodology contract published
Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.

2026-04-15
mapping
Artificial Analysis ID rule adopted
Stable model and creator IDs are now the preferred external identity keys when available.

2026-04-15
parser
BridgeBench parser fallback added
Added alternate selectors for category headers after leaderboard markup drift.

2026-04-16
parser
LiveBench worker now feeds app bundle
LiveBench records are now generated from the official public leaderboard dataset and merged into the catalog as a checked-in fragment with snapshot and parser metadata.

2026-04-16
mapping
Provider model registry added
Current-model coverage now merges a generated registry sourced from official provider docs, with per-model verification receipts and a review queue for newly discovered names.

2026-04-16
methodology
Current models separated from historical benchmark identities
New provider-verified variants such as GPT-5.4 and Claude Sonnet 4.5 now remain distinct from older benchmarked IDs so legacy scores are not silently relabeled as newer models.
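The registry and review-queue behavior described in the entries above can be pictured as exact-match mapping with an explicit queue for everything else. This is a sketch under assumptions: the registry shape, the ID format, and `map_benchmark_name` are hypothetical, and the real matching rules are not given in the changelog.

```python
from __future__ import annotations


def map_benchmark_name(name: str, registry: dict[str, str], review_queue: list[str]) -> str | None:
    """Map a scraped name to a registry ID only on an exact normalized match.

    Unmatched names go to the review queue and are never attached to a
    similar-looking model.
    """
    key = " ".join(name.lower().split())
    model_id = registry.get(key)
    if model_id is None:
        review_queue.append(name)
    return model_id
```

Because there is no fuzzy matching in this sketch, a legacy leaderboard name that is close to a newer variant stays unmapped until someone reviews it.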
