GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #5 · Source label: GPT-5.4 (xhigh)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1,401
- Percentile: 91.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
91.3% percentile inside its fair comparison set · Raw benchmark value: 1,401
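The raw value for this card is read from a field named `elo`. As a minimal sketch, assuming the rating follows the conventional Elo convention (logistic curve on a 400-point scale), which the source does not confirm, a rating gap can be translated into an expected head-to-head score. The function name and the 1,301 comparison rating are hypothetical and chosen only for illustration.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo logistic model.

    Assumption: the leaderboard uses the standard 400-point scale; the source
    page does not state this.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# 1,401 is the raw value reported on this card; 1,301 is a made-up opponent rating.
print(round(elo_expected_score(1401, 1301), 3))
```

Equal ratings give an expected score of 0.5, and the value depends only on the rating difference, so the absolute level of 1,401 carries no meaning without the rest of the comparison set.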
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #3 · Source label: GPT-5.4 (xhigh)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 33.3%
- Percentile: 91.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
91.7% percentile inside its fair comparison set · Raw benchmark value: 33.3%
SkillsBench
VALS-AI · Professional reasoning · Objective
Applied professional skills tasks.
Rank #5 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 51.7%
- Percentile: 60%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: OpenAI.
60% percentile inside its fair comparison set · Raw benchmark value: 51.7% · CI: 42.5% - 61%
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #14 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 0%
- Percentile: 15.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: OpenAI.
15.4% percentile inside its fair comparison set · Raw benchmark value: 0% · CI: 0% - 0%
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #6 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 86%
- Percentile: 94.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.
94.4% percentile inside its fair comparison set · Raw benchmark value: 86% · CI: 85.2% - 86.9%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #27 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 74%
- Percentile: 72.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.
72.5% percentile inside its fair comparison set · Raw benchmark value: 74% · CI: 72.3% - 75.7%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #24 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 41.3%
- Percentile: 54.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.
54.9% percentile inside its fair comparison set · Raw benchmark value: 41.3% · CI: 37.1% - 45.5%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #25 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 77.5%
- Percentile: 52%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.
52% percentile inside its fair comparison set · Raw benchmark value: 77.5% · CI: 71% - 84%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #5 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,524
- Percentile: 98.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: expert. Source rank: #7. Votes: 3597. Organization: openai. License: Proprietary.
98.5% percentile inside its fair comparison set · Raw benchmark value: 1,524 · CI: 1,513 - 1,534
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #10 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,483
- Percentile: 97.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #13. Votes: 8167. Organization: openai. License: Proprietary.
97.2% percentile inside its fair comparison set · Raw benchmark value: 1,483 · CI: 1,475 - 1,490
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #14 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #17. Votes: 8353. Organization: openai. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI: 1,440 - 1,455
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #9 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,489
- Percentile: 97.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_legal_and_government. Source rank: #10. Votes: 3184. Organization: openai. License: Proprietary.
97.3% percentile inside its fair comparison set · Raw benchmark value: 1,489 · CI: 1,477 - 1,500
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #19 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,489
- Percentile: 94.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_life_and_physical_and_social_science. Source rank: #22. Votes: 6649. Organization: openai. License: Proprietary.
94.4% percentile inside its fair comparison set · Raw benchmark value: 1,489 · CI: 1,481 - 1,497
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #7 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,503
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_mathematical. Source rank: #8. Votes: 2211. Organization: openai. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,503 · CI: 1,490 - 1,517
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #38 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,473
- Percentile: 87.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_medicine_and_healthcare. Source rank: #48. Votes: 2988. Organization: openai. License: Proprietary.
87.5% percentile inside its fair comparison set · Raw benchmark value: 1,473 · CI: 1,461 - 1,485
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #17 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,506
- Percentile: 95.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_software_and_it_services. Source rank: #21. Votes: 15917. Organization: openai. License: Proprietary.
95.1% percentile inside its fair comparison set · Raw benchmark value: 1,506 · CI: 1,500 - 1,512
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #8 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,469
- Percentile: 97.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_writing_and_literature_and_language. Source rank: #10. Votes: 9946. Organization: openai. License: Proprietary.
97.8% percentile inside its fair comparison set · Raw benchmark value: 1,469 · CI: 1,462 - 1,476
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #5 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,512
- Percentile: 98.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: expert. Source rank: #6. Votes: 3597. Organization: openai. License: Proprietary.
98.5% percentile inside its fair comparison set · Raw benchmark value: 1,512 · CI: 1,501 - 1,523
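The Expert rating appears twice, once in the earlier Text Arena · Expert card (CI 1,513 - 1,534) and once here without style control (CI 1,501 - 1,523). One rough way to read such a pair is to check whether the two intervals overlap. This is only a heuristic sketch, not a formal significance test: the source does not state how the intervals were computed or what confidence level they represent, and the helper name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


expert_default = (1513.0, 1534.0)           # CI from the Text Arena · Expert card
expert_no_style_control = (1501.0, 1523.0)  # CI from the No Style Control card

print(intervals_overlap(expert_default, expert_no_style_control))  # prints True
```

Overlapping intervals suggest the two ratings may not be clearly separated, but they do not by themselves establish that the difference is negligible.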
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #4 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,476
- Percentile: 99.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #6. Votes: 8167. Organization: openai. License: Proprietary.
99.1% percentile inside its fair comparison set · Raw benchmark value: 1,476 · CI: 1,468 - 1,483
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #13 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,443
- Percentile: 96.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #16. Votes: 8353. Organization: openai. License: Proprietary.
96.3% percentile inside its fair comparison set · Raw benchmark value: 1,443 · CI: 1,436 - 1,451
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #7 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,485
- Percentile: 98%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_legal_and_government. Source rank: #8. Votes: 3184. Organization: openai. License: Proprietary.
98% percentile inside its fair comparison set · Raw benchmark value: 1,485 · CI: 1,473 - 1,496
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #16 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 95.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_life_and_physical_and_social_science. Source rank: #20. Votes: 6649. Organization: openai. License: Proprietary.
95.4% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI: 1,470 - 1,486
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #7 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,498
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_mathematical. Source rank: #8. Votes: 2211. Organization: openai. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,498 · CI: 1,484 - 1,511
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #34 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,457
- Percentile: 88.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_medicine_and_healthcare. Source rank: #37. Votes: 2988. Organization: openai. License: Proprietary.
88.8% percentile inside its fair comparison set · Raw benchmark value: 1,457 · CI: 1,446 - 1,469
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #11 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,490
- Percentile: 96.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_software_and_it_services. Source rank: #14. Votes: 15917. Organization: openai. License: Proprietary.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,490 · CI: 1,484 - 1,495
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #12 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,461
- Percentile: 96.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-high`. Category: industry_writing_and_literature_and_language. Source rank: #15. Votes: 9946. Organization: openai. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI: 1,454 - 1,468
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #23 · Source label: openai/gpt-5.4-2026-03-05
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 43.3%
- Percentile: 51.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.
51.1% percentile inside its fair comparison set · Raw benchmark value: 43.3% · CI: 37.2% - 49.4%
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #7
verified runtime · exact direct
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Scale Labs
- Raw value: 44.4%
- Percentile: 50%
- Last updated: recent
- Eligibility: headline eligible
50% percentile inside its fair comparison set · Raw benchmark value: 44.4%
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #12 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 77%
- Percentile: 89.8%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
89.8% percentile inside its fair comparison set · Raw benchmark value: 77%
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #13 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 75.1%
- Percentile: 88.9%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
88.9% percentile inside its fair comparison set · Raw benchmark value: 75.1%
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #18 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 81.4%
- Percentile: 84.3%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
84.3% percentile inside its fair comparison set · Raw benchmark value: 81.4%
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #8 · Source label: gpt-5.4-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 49.8%
- Percentile: 93.5%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
93.5% percentile inside its fair comparison set · Raw benchmark value: 49.8%
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #20 · Source label: gpt-5.4-xhigh
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 100%
- Percentile: 100%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
100% percentile inside its fair comparison set · Raw benchmark value: 100%
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #3 · Source label: openai/gpt-5-2025-08-07
backfilled · proxy backfilled · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 1,103.2 score
- Percentile: 93.8%
- Last updated: archived
- Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-to-gpt-5.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,103.2 score · CI: 1,103.2 score - 1,103.2 score