Unbiased AI Bench (UAB) · AI model rankings with source links.
Every score links back to its source.
Methodology
Live · updated continuously

We do not mix unrelated scores.

Raw scores stay raw. We compute percentiles only when the test, metric, judge, and version all match. We verify each source, infer a match only when the evidence for it is explicit, and leave unsupported matches unresolved rather than guessing.
Layer 1 · raw source records
Layer 2 · verified and labeled records
Layer 3 · track-aware recommendations and composites
1. Verify the source

Fetch the published leaderboard or dataset, save the snapshot, and keep the public source link. We verify the source URL, content type, and snapshot hash before we treat anything as a measurement.

Source URL preserved · Snapshot hash logged · Capture time stored
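
As a rough illustration of the verification step, the sketch below records a snapshot from bytes that have already been fetched. The `Snapshot` fields, the `record_snapshot` helper, and the content-type allow-list are assumptions made for this example, not the site's actual schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed allow-list; the real set of accepted content types is not stated.
ALLOWED_CONTENT_TYPES = {"text/csv", "application/json", "text/html"}


@dataclass(frozen=True)
class Snapshot:
    source_url: str    # public link kept alongside the data
    content_type: str  # checked before anything is parsed
    sha256: str        # hash of the exact bytes captured
    captured_at: str   # ISO 8601 capture time in UTC


def record_snapshot(source_url: str, content_type: str, body: bytes) -> Snapshot:
    """Build a snapshot record from already-fetched bytes (no network call)."""
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise ValueError(f"unexpected content type: {content_type}")
    return Snapshot(
        source_url=source_url,
        content_type=content_type,
        sha256=hashlib.sha256(body).hexdigest(),
        captured_at=datetime.now(timezone.utc).isoformat(),
    )
```

Hashing the raw bytes rather than the parsed rows means a later parser change cannot alter what counts as the same snapshot.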
2. Parse carefully

Parse the source into structured records. We infer only when aliases, mappings, and anomalies are explicit enough to support the match. If not, the item stays open instead of being silently guessed.

Parser version attached · Anomalies logged · Manual review opened
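
One way to picture "infer only when explicit" is an alias table that either matches or does not. The sketch below is hypothetical: `ALIASES`, `CANONICAL`, `PARSER_VERSION`, and the record shapes are invented for illustration.

```python
from dataclasses import dataclass, field

PARSER_VERSION = "0.0.0-example"  # hypothetical version tag

# Hypothetical explicit alias table: only names listed here may be mapped.
CANONICAL = {"example-model"}
ALIASES = {"Example Model": "example-model", "example-model-v1": "example-model"}


@dataclass
class ParseResult:
    records: list = field(default_factory=list)
    open_reviews: list = field(default_factory=list)
    parser_version: str = PARSER_VERSION


def resolve(raw_name: str):
    """Return the canonical ID, or None when no explicit mapping exists."""
    name = raw_name.strip()
    if name in CANONICAL:
        return name
    return ALIASES.get(name)


def parse_rows(rows) -> ParseResult:
    """rows: iterable of (raw_name, score) pairs taken from a snapshot."""
    result = ParseResult()
    for raw_name, score in rows:
        model = resolve(raw_name)
        if model is None:
            # No fuzzy matching: the row stays open for manual review.
            result.open_reviews.append(
                {"raw_name": raw_name, "reason": "no explicit alias"}
            )
            continue
        result.records.append({"model": model, "raw_name": raw_name, "score": score})
    return result
```

A misspelled name such as "exampel-model" would land in `open_reviews` rather than being matched to the nearest known model.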
3. Label the source

Attach benchmark metadata: judge type, metric family, direction, benchmark version, modality, fair comparison set, and source type. Copied rows, official-company results, human-checked rows, historical inserts, and demo fixtures all stay labeled.

Judge type kept · Source type kept · Fair comparison set kept
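
The metadata listed above could be carried as a small immutable record. In this sketch the field names, the `SourceType` values, and the `comparison_key` method are assumptions. In particular, deriving the fair comparison set from the other fields is one possible design, not necessarily how the site does it.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    COPIED = "copied"
    OFFICIAL_COMPANY = "official_company"
    HUMAN_CHECKED = "human_checked"
    HISTORICAL_INSERT = "historical_insert"
    DEMO_FIXTURE = "demo_fixture"


@dataclass(frozen=True)
class Label:
    benchmark: str
    benchmark_version: str
    judge_type: str         # e.g. "human", "model", "exact-match"
    metric_family: str      # e.g. "accuracy", "elo"
    higher_is_better: bool  # direction of the metric
    modality: str
    source_type: SourceType

    def comparison_key(self) -> tuple:
        """Hypothetical fair-comparison key: rows compare only if this matches."""
        return (
            self.benchmark,
            self.benchmark_version,
            self.judge_type,
            self.metric_family,
            self.higher_is_better,
            self.modality,
        )
```

Here `source_type` is deliberately left out of the key, so an official-company row and a human-checked row on the same setup remain comparable while keeping their own labels.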
4. Normalize locally

Normalize only inside the same test setup. The product does not flatten unrelated units into one global score.

Within-group only · No universal scalar · Coverage gaps remain visible
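
A minimal sketch of within-group normalization follows. The percentile formula (share of other rows in the same group that score strictly worse) and the record keys are assumptions chosen for the example. The source does not state which formula is used.

```python
from collections import defaultdict


def percentile_ranks(records):
    """Rank each record only against others sharing its 'group' key.

    records: list of dicts with 'group', 'model', 'score', 'higher_is_better'.
    Returns new dicts with an added 'percentile' (0 to 100, or None).
    """
    groups = defaultdict(list)
    for r in records:
        groups[r["group"]].append(r)

    out = []
    for rows in groups.values():
        n = len(rows)
        for r in rows:
            if n < 2:
                # Nothing to compare against: leave the gap visible.
                pct = None
            else:
                sign = 1 if r["higher_is_better"] else -1
                worse = sum(
                    1 for o in rows if sign * o["score"] < sign * r["score"]
                )
                pct = 100.0 * worse / (n - 1)
            out.append({**r, "percentile": pct})
    return out
```

Because ranking never crosses group boundaries, an accuracy score and an error rate are never placed on one scale, and a group with a single row yields `None` instead of a made-up percentile.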
5. Publish the readout

Expose secondary metrics that describe the shape of the visible evidence. Product recommendations can rank a reviewed model family rather than a single raw model ID, but raw scores stay attached to their exact source labels. Preview rows stay preview-labeled, official-company rows stay labeled, and neither is silently rewritten as independent consensus.

Coverage · Spread · Freshness · Open reviews · Track rollups
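
The secondary metrics are named but not defined here, so the sketch below picks plausible definitions purely for illustration: coverage as the share of known benchmarks with at least one row, spread as the range of available percentiles, and freshness as days since the most recent capture. The `readout` function and its row keys are hypothetical.

```python
from datetime import date


def readout(rows, known_benchmarks, today: date) -> dict:
    """Summarize the visible evidence for one model family.

    rows: list of dicts with 'benchmark', 'percentile' (float or None),
          'captured' (date), and 'review_open' (bool).
    """
    known = set(known_benchmarks)
    seen = {r["benchmark"] for r in rows}
    pcts = [r["percentile"] for r in rows if r["percentile"] is not None]
    return {
        # Assumed definition: fraction of known benchmarks with any row.
        "coverage": len(seen & known) / len(known) if known else None,
        # Assumed definition: range of the percentiles that exist.
        "spread": (max(pcts) - min(pcts)) if pcts else None,
        # Assumed definition: age in days of the newest snapshot.
        "freshness_days": min((today - r["captured"]).days for r in rows)
        if rows
        else None,
        "open_reviews": sum(1 for r in rows if r["review_open"]),
    }
```

Each metric falls back to `None` when there is nothing to measure, which keeps missing evidence distinguishable from a genuine zero.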
6. State the limits

We verify public source links, snapshot hashes, parser outputs, and human-check metadata. We infer only when the mapping rule is explicit enough to audit later. We do not guess aliases, source type, modality, or pricing bands without evidence.

Verified · Inferred · Never guessed
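
The three labels above can be read as a small decision rule. The sketch below is an assumption about how such a rule might look: the `Status` enum, the `classify` function, and its parameters are invented for this example.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    VERIFIED = "verified"  # checked directly against the saved source
    INFERRED = "inferred"  # derived through a named, auditable mapping rule
    OPEN = "open"          # left unresolved rather than guessed


def classify(
    value: Optional[str],
    checked_against_source: bool,
    mapping_rule_id: Optional[str],
) -> Status:
    """Decide how a field value may be labeled; never promote a guess."""
    if value is None:
        return Status.OPEN
    if checked_against_source:
        return Status.VERIFIED
    if mapping_rule_id is not None:
        return Status.INFERRED
    # A value with no source check and no rule behind it is treated as a guess.
    return Status.OPEN
```

Requiring a rule identifier for every inferred value is what makes the inference auditable later: a reviewer can look up the rule rather than reconstruct the reasoning.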