GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #37 · Source label: GPT-5.4 nano (Non-Reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 714
- Percentile: 21.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
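The raw value for this card comes from a field named `elo`, so it can be read as a rating rather than a percentage. As a rough aid to interpretation, the sketch below applies the conventional Elo expected-score formula. The scale of 400 and the opponent rating of 814 are assumptions for illustration only; the page does not state how Artificial Analysis scales this rating.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the conventional Elo logistic.

    `scale` = 400 is the textbook default and is an assumption here,
    not something the leaderboard documents.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


RAW_VALUE = 714.0              # raw value shown on this card
HYPOTHETICAL_OPPONENT = 814.0  # made-up rating, for illustration only

p = elo_expected_score(RAW_VALUE, HYPOTHETICAL_OPPONENT)
print(f"Expected score against a model rated 100 points higher: {p:.3f}")
```

Under these assumptions, equal ratings give an expected score of 0.5, and the two sides' expected scores always sum to 1.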
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #10 · Source label: GPT-5.4 nano (xhigh)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 24.9%
- Percentile: 62.5% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #20 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 47
- CI: 43 - 50
- Percentile: 26.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: OpenAI.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #65 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 77.9%
- CI: 77.1% - 78.8%
- Percentile: 28.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #18 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 38.2%
- CI: 35.9% - 40.5%
- Percentile: 32% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: OpenAI.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #71 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 67.4%
- CI: 65.6% - 69.2%
- Percentile: 23.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #25 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 41%
- CI: 36.6% - 45.5%
- Percentile: 52.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #26 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 77.1%
- CI: 73.4% - 80.8%
- Percentile: 50% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #76 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,437
- CI: 1,427 - 1,448
- Percentile: 72.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #94. Votes: 3595. Organization: openai. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #97 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,402
- CI: 1,394 - 1,410
- Percentile: 69.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #117. Votes: 7669. Organization: openai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #109 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,351
- CI: 1,343 - 1,359
- Percentile: 66.6% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #133. Votes: 7919. Organization: openai. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #114 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,396
- CI: 1,384 - 1,408
- Percentile: 62.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #137. Votes: 2965. Organization: openai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #94 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,419
- CI: 1,410 - 1,427
- Percentile: 71.2% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #116. Votes: 6425. Organization: openai. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #59 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,439
- CI: 1,425 - 1,453
- Percentile: 81.2% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #72. Votes: 2092. Organization: openai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #100 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,420
- CI: 1,408 - 1,432
- Percentile: 66.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #122. Votes: 2883. Organization: openai. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #80 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,450
- CI: 1,444 - 1,456
- Percentile: 75.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #99. Votes: 15267. Organization: openai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #115 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,362
- CI: 1,354 - 1,369
- Percentile: 64.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #140. Votes: 9207. Organization: openai. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #96 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,396
- CI: 1,385 - 1,406
- Percentile: 65.5% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #118. Votes: 3595. Organization: openai. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #112 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,362
- CI: 1,354 - 1,369
- Percentile: 65.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #134. Votes: 7669. Organization: openai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #125 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,324
- CI: 1,316 - 1,331
- Percentile: 61.6% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #151. Votes: 7919. Organization: openai. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #123 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,364
- CI: 1,352 - 1,376
- Percentile: 59.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #149. Votes: 2965. Organization: openai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #121 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,377
- CI: 1,369 - 1,385
- Percentile: 62.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #143. Votes: 6425. Organization: openai. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #78 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,416
- CI: 1,402 - 1,430
- Percentile: 75% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #94. Votes: 2092. Organization: openai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #118 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,376
- CI: 1,364 - 1,388
- Percentile: 60.3% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #142. Votes: 2883. Organization: openai. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #107 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,401
- CI: 1,395 - 1,407
- Percentile: 67.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #129. Votes: 15267. Organization: openai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #125 · Source label: gpt-5.4-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,337
- CI: 1,330 - 1,344
- Percentile: 61.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #152. Votes: 9207. Organization: openai. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #32 · Source label: openai/gpt-5.4-nano-2026-03-17
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 38.1%
- CI: 32% - 44.1%
- Percentile: 31.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #5 · Source label: gpt-5
backfilled · proxy backfilled · Background only
Raw row drilldown:
- Source: Scale Labs
- Raw value: 49%
- Percentile: 83.3% (within its fair comparison set)
- Last updated: recent
- Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #104 · Source label: gpt-5.4-nano
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 39.2%
- Percentile: 4.6% (within its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #109 · Source label: gpt-5.4-nano
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 32.4%
- Percentile: 0% (within its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #102 · Source label: gpt-5-nano
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 1.5%
- Percentile: 6.5% (within its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #108 · Source label: gpt-5.4-nano
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 17.9%
- Percentile: 0.9% (within its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #103 · Source label: gpt-5-nano-high
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 90.2%
- Percentile: 5.6% (within its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
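The page does not say how a rank becomes a percentile. The five LiveBench cards above are, however, consistent with a simple rank-based formula, (N - rank) / (N - 1), if the fair comparison set holds N = 109 models. Both the formula and the set size are inferences from the displayed numbers, not a documented method; the sketch below only checks that the arithmetic lines up.

```python
def rank_percentile(rank: int, set_size: int) -> float:
    """Percentage of the comparison set ranked below `rank` (1 = best).

    Assumed formula: 100 * (set_size - rank) / (set_size - 1).
    """
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("need set_size >= 2 and 1 <= rank <= set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)


# (card name, displayed rank, displayed percentile) as shown on this page
LIVEBENCH_ROWS = [
    ("Data analysis", 104, 4.6),
    ("Overall", 109, 0.0),
    ("Consecutive events", 102, 6.5),
    ("Table join", 108, 0.9),
    ("Table reformat", 103, 5.6),
]
ASSUMED_SET_SIZE = 109  # inferred from the rows above, not stated by the page

for name, rank, shown in LIVEBENCH_ROWS:
    computed = round(rank_percentile(rank, ASSUMED_SET_SIZE), 1)
    print(f"{name}: computed {computed}%, shown {shown}%")
```

If the formula holds, a rank equal to the set size maps to 0%, which would explain the 0% percentile on the Overall card.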
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #5 · Source label: openai/gpt-5-2025-08-07
backfilled · proxy backfilled · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 1,103.2 score
- CI: 1,103.2 score - 1,103.2 score
- Percentile: 93.8% (within its fair comparison set)
- Last updated: archived
- Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.