LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #7 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 86%
- Percentile: 93.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.
93.3% percentile inside its fair comparison set · Raw benchmark value: 86% · CI 85.3% - 86.8%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #31 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 73.4%
- Percentile: 67%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.
67% percentile inside its fair comparison set · Raw benchmark value: 73.4% · CI 71.7% - 75.1%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #12 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 49.6%
- Percentile: 78.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.
78.4% percentile inside its fair comparison set · Raw benchmark value: 49.6% · CI 45.5% - 53.7%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #13 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 83.7%
- Percentile: 76%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.
76% percentile inside its fair comparison set · Raw benchmark value: 83.7% · CI 79.9% - 87.4%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #43 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,470
- Percentile: 84.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: expert. Source rank: #53. Votes: 2762. Organization: openai. License: Proprietary.
84.7% percentile inside its fair comparison set · Raw benchmark value: 1,470 · CI 1,458 - 1,482
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #24 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,460
- Percentile: 92.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_business_and_management_and_financial_operations. Source rank: #31. Votes: 6538. Organization: openai. License: Proprietary.
92.8% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI 1,451 - 1,468
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #40 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,415
- Percentile: 87.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_entertainment_and_sports_and_media. Source rank: #52. Votes: 6476. Organization: openai. License: Proprietary.
87.9% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,406 - 1,423
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #30 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,464
- Percentile: 90.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_legal_and_government. Source rank: #40. Votes: 2559. Organization: openai. License: Proprietary.
90.3% percentile inside its fair comparison set · Raw benchmark value: 1,464 · CI 1,451 - 1,477
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #44 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,459
- Percentile: 86.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_life_and_physical_and_social_science. Source rank: #56. Votes: 5212. Organization: openai. License: Proprietary.
86.7% percentile inside its fair comparison set · Raw benchmark value: 1,459 · CI 1,450 - 1,468
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #53 · Source label: gpt-5-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,442
- Percentile: 83.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: industry_mathematical. Source rank: #65. Votes: 1747. Organization: openai. License: Proprietary.
83.1% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,428 - 1,456
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #26 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,481
- Percentile: 91.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_medicine_and_healthcare. Source rank: #33. Votes: 2468. Organization: openai. License: Proprietary.
91.5% percentile inside its fair comparison set · Raw benchmark value: 1,481 · CI 1,468 - 1,493
Text Arena · Industry Software And It Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #35 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,488
- Percentile: 89.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_software_and_it_services. Source rank: #44. Votes: 12869. Organization: openai. License: Proprietary.
89.5% percentile inside its fair comparison set · Raw benchmark value: 1,488 · CI 1,482 - 1,495
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #40 · Source label: gpt-5.3-chat-latest
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,427
- Percentile: 88%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.3-chat-latest`. Category: industry_writing_and_literature_and_language. Source rank: #52. Votes: 7651. Organization: openai. License: Proprietary.
88% percentile inside its fair comparison set · Raw benchmark value: 1,427 · CI 1,419 - 1,434
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #75 · Source label: gpt-5-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,420
- Percentile: 73.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: expert. Source rank: #91. Votes: 1588. Organization: openai. License: Proprietary.
73.1% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI 1,405 - 1,436
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #83 · Source label: gpt-5-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,399
- Percentile: 74.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: industry_business_and_management_and_financial_operations. Source rank: #99. Votes: 5612. Organization: openai. License: Proprietary.
74.2% percentile inside its fair comparison set · Raw benchmark value: 1,399 · CI 1,391 - 1,407
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #77 · Source label: gpt-5-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,378
- Percentile: 76.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #93. Votes: 5839. Organization: openai. License: Proprietary.
76.5% percentile inside its fair comparison set · Raw benchmark value: 1,378 · CI 1,370 - 1,386
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #62 · Source label: gpt-5-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,430
- Percentile: 79.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: industry_legal_and_government. Source rank: #75. Votes: 1951. Organization: openai. License: Proprietary.
79.5% percentile inside its fair comparison set · Raw benchmark value: 1,430 · CI 1,416 - 1,443
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #92 · Source label: gpt-5-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,417
- Percentile: 71.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: industry_life_and_physical_and_social_science. Source rank: #108. Votes: 5030. Organization: openai. License: Proprietary.
71.8% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI 1,408 - 1,425
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #97 · Source label: gpt-5-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,404
- Percentile: 68.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: industry_mathematical. Source rank: #115. Votes: 1747. Organization: openai. License: Proprietary.
68.8% percentile inside its fair comparison set · Raw benchmark value: 1,404 · CI 1,390 - 1,418
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #89 · Source label: gpt-5-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,413
- Percentile: 70.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: industry_medicine_and_healthcare. Source rank: #107. Votes: 1695. Organization: openai. License: Proprietary.
70.2% percentile inside its fair comparison set · Raw benchmark value: 1,413 · CI 1,399 - 1,428
Text Arena · Industry Software And It Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #91 · Source label: gpt-5-high
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,419
- Percentile: 72.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: industry_software_and_it_services. Source rank: #109. Votes: 10924. Organization: openai. License: Proprietary.
72.3% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,413 - 1,426
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #82 · Source label: gpt-5-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,385
- Percentile: 75%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: industry_writing_and_literature_and_language. Source rank: #100. Votes: 6777. Organization: openai. License: Proprietary.
75% percentile inside its fair comparison set · Raw benchmark value: 1,385 · CI 1,378 - 1,393
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #22 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 43.7%
- Percentile: 53.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.
53.3% percentile inside its fair comparison set · Raw benchmark value: 43.7% · CI 37.1% - 50.2%
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #3
verified runtime · exact direct
Raw row drilldown:
- Source: Scale Labs
- Raw value: 49%
- Percentile: 83.3%
- Last updated: recent
- Eligibility: headline eligible
83.3% percentile inside its fair comparison set · Raw benchmark value: 49%
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #50 · Source label: gpt-5-pro-2025-10-06
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 57%
- Percentile: 54.6%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
54.6% percentile inside its fair comparison set · Raw benchmark value: 57%
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #33 · Source label: gpt-5-pro-2025-10-06
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 70.5%
- Percentile: 70.4%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
70.4% percentile inside its fair comparison set · Raw benchmark value: 70.5%
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #54 · Source label: gpt-5-pro-2025-10-06
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 26.3%
- Percentile: 50.9%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
50.9% percentile inside its fair comparison set · Raw benchmark value: 26.3%
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #43 · Source label: gpt-5-pro-2025-10-06
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 44.8%
- Percentile: 61.1%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
61.1% percentile inside its fair comparison set · Raw benchmark value: 44.8%
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #7 · Source label: gpt-5-pro-2025-10-06
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 100%
- Percentile: 100%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
100% percentile inside its fair comparison set · Raw benchmark value: 100%
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #2 · Source label: openai/gpt-5-2025-08-07
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 1,103.2 score
- Percentile: 93.8%
- Last updated: archived
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,103.2 score · CI 1,103.2 score - 1,103.2 score