Verify the source
Fetch the published leaderboard or dataset, save the snapshot, and keep the public source link. We verify the source URL, content type, and snapshot hash before we treat anything as a measurement.
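A minimal sketch of what that check could look like, assuming the snapshot has already been downloaded as bytes and a SHA-256 digest was recorded when it was saved. The function names, the allowed content types, and the choice of SHA-256 are illustrative assumptions, not the product's actual implementation.

```python
import hashlib
from urllib.parse import urlparse

# Assumption: the set of content types we are willing to parse.
ALLOWED_CONTENT_TYPES = {"text/csv", "application/json", "text/html"}


def snapshot_sha256(data: bytes) -> str:
    """Return the hex SHA-256 digest of the saved snapshot bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_snapshot(source_url: str, content_type: str,
                    data: bytes, expected_sha256: str) -> list:
    """Return a list of problems; an empty list means the snapshot passed."""
    problems = []

    # The source link must be a public http(s) URL with a host.
    parsed = urlparse(source_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("source URL is not a public http(s) link")

    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_CONTENT_TYPES:
        problems.append(f"unexpected content type: {media_type}")

    # The bytes on disk must still match the hash recorded at fetch time.
    if snapshot_sha256(data) != expected_sha256:
        problems.append("snapshot hash does not match the recorded hash")

    return problems
```

Returning a list of problems, rather than raising on the first failure, keeps every failed check visible in one pass. Nothing downstream should treat the snapshot as a measurement unless the list is empty.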
Parse the source into structured records. We infer only when aliases, mappings, and anomalies are explicit enough to support the match. If not, the item stays open instead of being silently guessed.
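The "stay open instead of guessing" rule can be sketched as a lookup against an explicit alias table. The table contents, the `ParsedRow` type, and the `resolve` helper below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical alias table: every mapping is written down, so it can be audited.
ALIASES: Dict[str, str] = {
    "model-a-v1": "model-a",
    "model a": "model-a",
}


@dataclass
class ParsedRow:
    raw_model_id: str            # exactly as it appeared in the source
    canonical_id: Optional[str]  # None while the match is unresolved
    status: str                  # "matched" or "open"


def resolve(raw_model_id: str, aliases: Dict[str, str]) -> ParsedRow:
    """Match only on an explicit alias entry; never fall back to fuzzy guessing."""
    key = raw_model_id.strip().lower()
    canonical = aliases.get(key)
    if canonical is None:
        # No explicit rule supports a match, so the item stays open.
        return ParsedRow(raw_model_id, None, "open")
    return ParsedRow(raw_model_id, canonical, "matched")
```

The raw identifier is kept on the record in both cases, so an open item can be resolved later by adding a mapping without re-parsing the snapshot.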
Attach benchmark metadata: judge type, metric family, direction, benchmark version, modality, fair comparison set, and source type. Copied rows, official-company results, human-checked rows, historical inserts, and demo fixtures all stay labeled.
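One way to picture the attached metadata is as an immutable record that travels with each row. The field names follow the list above; the class names, enum values, and the `label` helper are assumptions made for this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class SourceType(Enum):
    # Labels taken from the row kinds named in the text.
    COPIED = "copied"
    OFFICIAL_COMPANY = "official-company"
    HUMAN_CHECKED = "human-checked"
    HISTORICAL_INSERT = "historical-insert"
    DEMO_FIXTURE = "demo-fixture"


class Direction(Enum):
    HIGHER_IS_BETTER = "higher-is-better"
    LOWER_IS_BETTER = "lower-is-better"


@dataclass(frozen=True)
class BenchmarkMetadata:
    judge_type: str
    metric_family: str
    direction: Direction
    benchmark_version: str
    modality: str
    fair_comparison_set: str
    source_type: SourceType


@dataclass(frozen=True)
class LabeledRow:
    model_id: str
    score: float
    metadata: BenchmarkMetadata

    def label(self) -> str:
        """Render the row with its source label so the label is never dropped."""
        return f"{self.model_id} [{self.metadata.source_type.value}]"
```

Freezing the dataclasses means a later step cannot quietly relabel an official-company row as something else; it would have to build a new record explicitly.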
Normalize only inside the same test setup. The product does not flatten unrelated units into one global score.
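As an illustration, the sketch below groups rows by a `setup` key and rescales scores only within each group. Min-max scaling is an assumption chosen for the example, since the text does not name a normalization method, and the row shape is hypothetical.

```python
from collections import defaultdict


def normalize_within_setup(rows):
    """Min-max scale scores per setup; rows from different setups never mix."""
    groups = defaultdict(list)
    for row in rows:
        # Assumption: "setup" identifies benchmark, version, and metric together.
        groups[row["setup"]].append(row)

    out = []
    for members in groups.values():
        scores = [m["score"] for m in members]
        lo, hi = min(scores), max(scores)
        for m in members:
            # A setup with no spread has no meaningful scale, so leave it unset.
            norm = None if hi == lo else (m["score"] - lo) / (hi - lo)
            out.append({**m, "normalized": norm})
    return out
```

Because the scaled value is only meaningful relative to its own setup, the output keeps the `setup` key on every row, and no cross-setup aggregate is computed.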
Expose secondary metrics that describe the shape of the visible evidence. Product recommendations can rank a reviewed model family rather than a single raw model ID, but raw scores stay attached to their exact source labels. Preview rows stay preview-labeled, official-company rows stay labeled, and neither is silently rewritten as independent consensus.
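A rough sketch of what "shape of the visible evidence" might mean in code: a summary of how many rows exist for a model family and how they split across source labels, computed without touching the raw scores. The metric names and row fields here are hypothetical and are not the product's actual secondary metrics.

```python
from collections import Counter


def evidence_shape(rows):
    """Summarize the evidence behind a model family without altering any row."""
    by_label = Counter(row["source_type"] for row in rows)
    total = len(rows)
    return {
        "row_count": total,
        "distinct_sources": len({row["source_url"] for row in rows}),
        # Counts per label, so preview and official-company rows stay visible.
        "rows_by_label": dict(by_label),
        "preview_share": (by_label.get("preview", 0) / total) if total else 0.0,
    }
```

The summary reports counts by label instead of merging them, so a family backed mostly by preview or official-company rows is shown as exactly that.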
We verify public source links, snapshot hashes, parser outputs, and human-check metadata. We infer only when the mapping rule is explicit enough to audit later. We do not guess aliases, source type, modality, or pricing bands from vibes.