| AA-Omniscience non-hallucination (AA · %; Text · Chat / text) | 48.1% (percentile 87.6%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: Claude Opus 4.7 (Non-reasoning, High Effort) | 8.8% (percentile 22.1%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: GPT-5.5 (Non-reasoning) | |
| BrowseComp (OFF · %; Search · Search / tool use) | 79.3% (percentile 33.3%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: claude-opus-4-7 | 84.4% (percentile 83.3%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: gpt-5-5 | |
| Time to first answer token (AA · s; Text · Chat / text) | 13.81s (percentile 48.6%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 107.59s (percentile 1.9%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: GPT-5.5 (xhigh) | |
| Agentic Index (AA · index; Code · Coding) | 44 (percentile 95.7%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 26 (percentile 52.2%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: GPT-5.5 (Non-reasoning) | |
| GDPval-AA (AA · rating; Text · Professional reasoning) | 1,507 (percentile 93.5%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: Claude Opus 4.7 (Adaptive Reasoning, Max Effort) | 1,123 (percentile 54.3%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: GPT-5.5 (Non-reasoning) | |
| Terminal-Bench 2.0 (OFF · %; Code · Coding) | 69.4% (percentile 66.7%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: claude-opus-4-7 | 82.7% (percentile 100%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: gpt-5-5 | |
| Humanity's Last Exam (OFF · %; Text · Reasoning / math / science) | 46.9% (percentile 100%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: claude-opus-4-7 | 41.4% (percentile 71.4%) · Official · manual verified · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: gpt-5-5 | |
| MedScribe (VALS-AI · %; Text · Professional reasoning) | 83% (percentile 70%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: anthropic/claude-opus-4-7 | 86.9% (percentile 94%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: openai/gpt-5.5 | |
| Document Arena (AR · rating; Document · Document understanding) | 1,497 (percentile 95.8%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: claude-opus-4-7-thinking | 1,477 (percentile 75%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: gpt-5.5-high | |
| GPQA (AA · %; Text · Reasoning / math / science) | 88.5% (percentile 96.8%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: Claude Opus 4.7 (Non-reasoning, High Effort) | 76.8% (percentile 76.7%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.94) · Source label: GPT-5.5 (Non-reasoning) | |
| ProgramBench (VALS-AI · %; Code · Coding) | 0% (percentile 70%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: anthropic/claude-opus-4-7 | 0.5% (percentile 90%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: openai/gpt-5.5 | |
| HiL-Bench (SL · %; Code · Coding) | 27.7% (percentile 80%) · exact direct · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: exact (1.00) · Source label: claude-opus-4-7 | 29.1% (percentile 100%) · exact alias · verified runtime · Last updated: recent · Eligibility: headline eligible · Identity: provider alias (0.92) · Source label: GPT-5.5 | |