GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #35 · Source label: GPT-5.4 mini (Non-Reasoning)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 755
- Percentile: 26.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
26.1% percentile inside its fair comparison set · Raw benchmark value: 755
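The page does not say how the percentile is computed. The rank and percentile pairs listed here appear consistent with a simple rank-based formula, percentile = (N - rank) / (N - 1) for a comparison set of N models, but both that formula and the set sizes used below are assumptions rather than anything the source confirms. A minimal Python sketch under those assumptions (`percentile_from_rank` is a hypothetical helper name):

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Share of the comparison set ranked below this model, in percent.

    ASSUMPTION: percentile = (N - rank) / (N - 1). The source page does not
    document its formula; this one merely reproduces the displayed figures.
    """
    if set_size < 2:
        raise ValueError("need at least two models to define a percentile")
    if not 1 <= rank <= set_size:
        raise ValueError("rank must lie between 1 and set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)


# Set sizes below are HYPOTHETICAL; the page does not show them.
print(round(percentile_from_rank(35, 47), 1))  # rank #35 in a set of 47
print(round(percentile_from_rank(13, 13), 1))  # last place in a set of 13
```

Under this reading, a model ranked last always lands at 0% and a model ranked first at 100%, regardless of how far apart the raw scores are.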
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #7 · Source label: GPT-5.4 mini (xhigh)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 28.2%
- Percentile: 75%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
75% percentile inside its fair comparison set · Raw benchmark value: 28.2%
Legal Research Bench
VALS-AI · Professional reasoning · Objective
Applied legal research tasks.
Rank #13 · Source label: openai/gpt-5.4-mini-2026-03-17
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 12.5%
- Percentile: 0%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: OpenAI.
0% percentile inside its fair comparison set · Raw benchmark value: 12.5% · CI: 8% - 17%
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #14 · Source label: openai/gpt-5.4-mini-2026-03-17
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 52
- Percentile: 50%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: OpenAI.
50% percentile inside its fair comparison set · Raw benchmark value: 52 · CI: 48 - 56
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #47 · Source label: openai/gpt-5-mini-2025-08-07
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 81.8%
- Percentile: 48.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.
48.9% percentile inside its fair comparison set · Raw benchmark value: 81.8% · CI: 80.9% - 82.7%
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #9 · Source label: openai/gpt-5.4-mini-2026-03-17
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 45.4%
- Percentile: 68%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: OpenAI.
68% percentile inside its fair comparison set · Raw benchmark value: 45.4% · CI: 44.5% - 46.2%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #11 · Source label: openai/gpt-5-mini-2025-08-07
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 75.2%
- Percentile: 89%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.
89% percentile inside its fair comparison set · Raw benchmark value: 75.2% · CI: 73.6% - 76.9%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #22 · Source label: openai/gpt-5-mini-2025-08-07
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 43%
- Percentile: 58.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.
58.8% percentile inside its fair comparison set · Raw benchmark value: 43% · CI: 39% - 47.1%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #19 · Source label: openai/gpt-5-mini-2025-08-07
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 80.6%
- Percentile: 64%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.
64% percentile inside its fair comparison set · Raw benchmark value: 80.6% · CI: 76.8% - 84.3%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #32 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,482
- Percentile: 88.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: expert. Source rank: #39. Votes: 3590. Organization: openai. License: Proprietary.
88.7% percentile inside its fair comparison set · Raw benchmark value: 1,482 · CI: 1,471 - 1,492
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #22 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,460
- Percentile: 93.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #28. Votes: 7999. Organization: openai. License: Proprietary.
93.4% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI: 1,452 - 1,468
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #47 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,408
- Percentile: 85.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #59. Votes: 8212. Organization: openai. License: Proprietary.
85.8% percentile inside its fair comparison set · Raw benchmark value: 1,408 · CI: 1,400 - 1,415
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #41 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,457
- Percentile: 86.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_legal_and_government. Source rank: #54. Votes: 3015. Organization: openai. License: Proprietary.
86.6% percentile inside its fair comparison set · Raw benchmark value: 1,457 · CI: 1,445 - 1,469
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #41 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,465
- Percentile: 87.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_life_and_physical_and_social_science. Source rank: #52. Votes: 6537. Organization: openai. License: Proprietary.
87.6% percentile inside its fair comparison set · Raw benchmark value: 1,465 · CI: 1,457 - 1,474
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #37 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,456
- Percentile: 88.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_mathematical. Source rank: #45. Votes: 2140. Organization: openai. License: Proprietary.
88.3% percentile inside its fair comparison set · Raw benchmark value: 1,456 · CI: 1,443 - 1,470
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #55 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,455
- Percentile: 81.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_medicine_and_healthcare. Source rank: #71. Votes: 2894. Organization: openai. License: Proprietary.
81.7% percentile inside its fair comparison set · Raw benchmark value: 1,455 · CI: 1,443 - 1,467
Text Arena · Industry Software And It Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #38 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,487
- Percentile: 88.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_software_and_it_services. Source rank: #49. Votes: 15521. Organization: openai. License: Proprietary.
88.6% percentile inside its fair comparison set · Raw benchmark value: 1,487 · CI: 1,481 - 1,493
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #44 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,423
- Percentile: 86.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_writing_and_literature_and_language. Source rank: #57. Votes: 9541. Organization: openai. License: Proprietary.
86.7% percentile inside its fair comparison set · Raw benchmark value: 1,423 · CI: 1,416 - 1,430
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #57 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,436
- Percentile: 79.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: expert. Source rank: #68. Votes: 3590. Organization: openai. License: Proprietary.
79.6% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI: 1,425 - 1,446
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #59 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,417
- Percentile: 81.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #71. Votes: 7999. Organization: openai. License: Proprietary.
81.8% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI: 1,410 - 1,425
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #82 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,372
- Percentile: 74.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #100. Votes: 8212. Organization: openai. License: Proprietary.
74.9% percentile inside its fair comparison set · Raw benchmark value: 1,372 · CI: 1,364 - 1,379
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #83 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,415
- Percentile: 72.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_legal_and_government. Source rank: #101. Votes: 3015. Organization: openai. License: Proprietary.
72.5% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI: 1,403 - 1,427
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #91 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,417
- Percentile: 72.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_life_and_physical_and_social_science. Source rank: #107. Votes: 6537. Organization: openai. License: Proprietary.
72.1% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI: 1,409 - 1,425
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #58 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,433
- Percentile: 81.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_mathematical. Source rank: #68. Votes: 2140. Organization: openai. License: Proprietary.
81.5% percentile inside its fair comparison set · Raw benchmark value: 1,433 · CI: 1,419 - 1,446
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #101 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,397
- Percentile: 66.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_medicine_and_healthcare. Source rank: #122. Votes: 2894. Organization: openai. License: Proprietary.
66.1% percentile inside its fair comparison set · Raw benchmark value: 1,397 · CI: 1,385 - 1,409
Text Arena · Industry Software And It Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #71 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,436
- Percentile: 78.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_software_and_it_services. Source rank: #86. Votes: 15521. Organization: openai. License: Proprietary.
78.5% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI: 1,430 - 1,442
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #71 · Source label: gpt-5.4-mini-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,392
- Percentile: 78.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-mini-high`. Category: industry_writing_and_literature_and_language. Source rank: #86. Votes: 9541. Organization: openai. License: Proprietary.
78.4% percentile inside its fair comparison set · Raw benchmark value: 1,392 · CI: 1,385 - 1,399
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #9 · Source label: openai/gpt-5.4-mini-2026-03-17
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 50.8%
- Percentile: 82.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.
82.2% percentile inside its fair comparison set · Raw benchmark value: 50.8% · CI: 44.1% - 57.5%
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #4 · Source label: gpt-5
backfilled · proxy backfilled · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Scale Labs
- Raw value: 49%
- Percentile: 83.3%
- Last updated: recent
- Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-mini-to-gpt-5.
83.3% percentile inside its fair comparison set · Raw benchmark value: 49%
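The eligibility note on this card implies that backfilled rows stay visible but do not feed the default ranking. A minimal Python sketch of that filter follows; the `BenchmarkRow` type, its field names, and the `default_ranking_rows` helper are hypothetical and only illustrate the rule as stated on the page, not the site's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    # Hypothetical record shape; field names are assumptions for illustration.
    name: str
    percentile: float
    eligibility: str


def default_ranking_rows(rows: list[BenchmarkRow]) -> list[BenchmarkRow]:
    """Keep only rows whose eligibility label is exactly 'headline eligible'."""
    return [row for row in rows if row.eligibility == "headline eligible"]


rows = [
    BenchmarkRow("SAGE", 82.2, "headline eligible"),
    BenchmarkRow(
        "PRBench Legal",
        83.3,
        "Fallback benchmark identity is visible for context "
        "but excluded from default ranking.",
    ),
]

kept = default_ranking_rows(rows)
print([row.name for row in kept])
```

Matching on the exact label keeps the sketch simple; a real pipeline would more likely carry a boolean or enum flag than compare display strings.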
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #101 · Source label: gpt-5-mini-minimal
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 41.5%
- Percentile: 7.4%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
7.4% percentile inside its fair comparison set · Raw benchmark value: 41.5%
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #103 · Source label: gpt-5.4-mini
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 37%
- Percentile: 5.6%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
5.6% percentile inside its fair comparison set · Raw benchmark value: 37%
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #103 · Source label: gpt-5-mini-minimal
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 0.5%
- Percentile: 5.6%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
5.6% percentile inside its fair comparison set · Raw benchmark value: 0.5%
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #99 · Source label: gpt-5-mini-minimal
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 29.9%
- Percentile: 9.3%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
9.3% percentile inside its fair comparison set · Raw benchmark value: 29.9%
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #93 · Source label: gpt-5-mini-minimal
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 94.1%
- Percentile: 14.8%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
14.8% percentile inside its fair comparison set · Raw benchmark value: 94.1%
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #4 · Source label: openai/gpt-5-2025-08-07
backfilled · proxy backfilled · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 1,103.2 score
- Percentile: 93.8%
- Last updated: archived
- Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-mini-to-gpt-5.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,103.2 score · CI: 1,103.2 score - 1,103.2 score