Confidence simulator
Stress-test the ranking.
See whether the leader changes when provider-official, stale, relay, or backfilled evidence is excluded.
| Scenario | Leader | Score | Confidence | Top model IDs |
|---|---|---|---|---|
| All public evidence | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude provider-official data | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude stale data | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude relay sources | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
| Exclude backfilled rows | Claude Fable 5 | 100 | high_confidence | claude-fable-5, claude-opus-4-8, claude-opus-4-7, ernie-5-1 |
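
The scenarios above amount to re-ranking after dropping one category of evidence at a time. The following is a minimal sketch of that idea, not the simulator's actual implementation: the `Evidence` record, its boolean flags, the mean-score ranking rule, and the `model-a`/`model-b` rows are all hypothetical and chosen only for illustration. The page does not say how scores or confidence labels are computed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One hypothetical evidence row; the field names are assumptions."""
    model_id: str
    score: float
    provider_official: bool = False
    stale: bool = False
    relay: bool = False
    backfilled: bool = False


# Scenario name -> flag whose rows are excluded (None keeps every row).
SCENARIOS = {
    "All public evidence": None,
    "Exclude provider-official data": "provider_official",
    "Exclude stale data": "stale",
    "Exclude relay sources": "relay",
    "Exclude backfilled rows": "backfilled",
}


def rank(rows, exclude_flag=None):
    """Return model IDs ordered by mean score, best first.

    Rows whose `exclude_flag` attribute is True are dropped first.
    Mean score is an assumed ranking rule; ties break on model ID.
    """
    scores = defaultdict(list)
    for row in rows:
        if exclude_flag is not None and getattr(row, exclude_flag):
            continue
        scores[row.model_id].append(row.score)
    means = {model: sum(vals) / len(vals) for model, vals in scores.items()}
    return sorted(means, key=lambda model: (-means[model], model))


def simulate(rows):
    """Run every scenario and return {scenario name: ranked model IDs}."""
    return {name: rank(rows, flag) for name, flag in SCENARIOS.items()}


def leader_is_stable(results):
    """True if every non-empty scenario ranking has the same first model."""
    leaders = {ranking[0] for ranking in results.values() if ranking}
    return len(leaders) == 1


# Invented toy rows, unrelated to the table above.
demo_rows = [
    Evidence("model-a", 90.0),
    Evidence("model-a", 80.0, stale=True),
    Evidence("model-b", 88.0),
    Evidence("model-b", 70.0, provider_official=True),
]
demo_results = simulate(demo_rows)
```

In the toy rows, dropping the provider-official row changes the leader, so `leader_is_stable(demo_results)` is false. A table like the one above, where every scenario names the same leader, corresponds to the case where it would return true.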