UAB
Live · updated continuously
Confidence simulator

Stress-test the ranking.

See how the leader changes when provider-official, stale, relay, or backfilled evidence is excluded.
Scenarios · 5
API · /api/confidence-simulator?preset=everyday-chatbot
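The endpoint above takes a `preset` query parameter. A minimal sketch of building that request URL follows. The host `example.invalid` is a placeholder, since the page gives only the path, and the response format is not documented here, so the sketch stops at constructing the URL.

```python
from urllib.parse import urlencode, urlunsplit

# Hypothetical host: the source page shows only the path and query string.
BASE_HOST = "example.invalid"
API_PATH = "/api/confidence-simulator"  # path as shown on the page


def simulator_url(preset: str, host: str = BASE_HOST) -> str:
    """Build the simulator URL for a given preset name."""
    query = urlencode({"preset": preset})  # percent-encodes the value if needed
    return urlunsplit(("https", host, API_PATH, query, ""))


if __name__ == "__main__":
    print(simulator_url("everyday-chatbot"))
```

Using `urlencode` rather than string concatenation keeps the query valid if a preset name ever contains spaces or other reserved characters.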
| Scenario | Leader | Score | Confidence | Top model IDs |
|---|---|---|---|---|
| All public evidence | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude provider-official data | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude stale data | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude relay sources | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude backfilled rows | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
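The stress test amounts to asking whether the leader survives each exclusion. The sketch below checks that against the five rows shown above. The data structure and the function names `leader_is_stable` and `changed_scenarios` are illustrative assumptions, not part of the simulator's API.

```python
# Rows transcribed from the scenario table above:
# scenario name -> (leader model ID, score, confidence label).
SCENARIOS = {
    "All public evidence": ("claude-fable-5", 100, "high_confidence"),
    "Exclude provider-official data": ("claude-fable-5", 100, "high_confidence"),
    "Exclude stale data": ("claude-fable-5", 100, "high_confidence"),
    "Exclude relay sources": ("claude-fable-5", 100, "high_confidence"),
    "Exclude backfilled rows": ("claude-fable-5", 100, "high_confidence"),
}

BASELINE = "All public evidence"


def changed_scenarios(scenarios, baseline=BASELINE):
    """Return the scenarios whose leader differs from the baseline leader."""
    base_leader = scenarios[baseline][0]
    return [name for name, row in scenarios.items() if row[0] != base_leader]


def leader_is_stable(scenarios, baseline=BASELINE):
    """True when no exclusion scenario changes the leader."""
    return not changed_scenarios(scenarios, baseline)


if __name__ == "__main__":
    print("Leader stable:", leader_is_stable(SCENARIOS))
    print("Scenarios with a different leader:", changed_scenarios(SCENARIOS))
```

For this snapshot the leader, score, and confidence label are identical in all five scenarios, so no exclusion changes the result.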