GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #22 · Source label: GPT-5.5 (Non-reasoning)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1,123
- Percentile: 54.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
54.3% percentile inside its fair comparison set · Raw benchmark value: 1,123
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #2 · Source label: GPT-5.5 (xhigh)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 37.7%
- Percentile: 95.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
95.8% percentile inside its fair comparison set · Raw benchmark value: 37.7%
Legal Research Bench
VALS-AI · Professional reasoning · Objective
Applied legal research tasks.
Rank #2 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 40.4%
- Percentile: 91.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: OpenAI.
91.7% percentile inside its fair comparison set · Raw benchmark value: 40.4% · CI 33.7% - 47.1%
SkillsBench
VALS-AI · Professional reasoning · Objective
Applied professional skills tasks.
Rank #1 · Source label: openai/gpt-5.5-codex
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 62.6%
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: OpenAI.
100% percentile inside its fair comparison set · Raw benchmark value: 62.6% · CI 53.9% - 71.2%
Public Benefits Bench
VALS-AI · Professional reasoning · Objective
Answering SNAP benefits questions across the public-benefits lifecycle.
Rank #6 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 60.9%
- Percentile: 54.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench; provider: OpenAI.
54.5% percentile inside its fair comparison set · Raw benchmark value: 60.9% · CI 58.4% - 63.4%
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #3 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 68
- Percentile: 92.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: OpenAI.
92.3% percentile inside its fair comparison set · Raw benchmark value: 68 · CI 65 - 71
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #7 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 3.8%
- Percentile: 61.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: OpenAI.
61.5% percentile inside its fair comparison set · Raw benchmark value: 3.8% · CI 1.4% - 6.1%
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #5 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 86.5%
- Percentile: 95.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.
95.6% percentile inside its fair comparison set · Raw benchmark value: 86.5% · CI 85.7% - 87.3%
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #4 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 51.8%
- Percentile: 88%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: OpenAI.
88% percentile inside its fair comparison set · Raw benchmark value: 51.8% · CI 50.7% - 52.8%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #13 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 75%
- Percentile: 86.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.
86.8% percentile inside its fair comparison set · Raw benchmark value: 75% · CI 73.3% - 76.7%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #15 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 49.1%
- Percentile: 72.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.
72.5% percentile inside its fair comparison set · Raw benchmark value: 49.1% · CI 44.8% - 53.4%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #4 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 86.9%
- Percentile: 94%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.
94% percentile inside its fair comparison set · Raw benchmark value: 86.9% · CI 83.1% - 90.7%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #6 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,523
- Percentile: 98.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: expert. Source rank: #8. Votes: 2590. Organization: openai. License: Proprietary.
98.2% percentile inside its fair comparison set · Raw benchmark value: 1,523 · CI 1,510 - 1,535
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #7 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,487
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_business_and_management_and_financial_operations. Source rank: #9. Votes: 5645. Organization: openai. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,487 · CI 1,478 - 1,496
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #16 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 95.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #19. Votes: 6255. Organization: openai. License: Proprietary.
95.4% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,438 - 1,455
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #12 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,485
- Percentile: 96.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_legal_and_government. Source rank: #14. Votes: 2182. Organization: openai. License: Proprietary.
96.3% percentile inside its fair comparison set · Raw benchmark value: 1,485 · CI 1,472 - 1,499
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #12 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,496
- Percentile: 96.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_life_and_physical_and_social_science. Source rank: #14. Votes: 4805. Organization: openai. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,496 · CI 1,487 - 1,505
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #9 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,499
- Percentile: 97.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_mathematical. Source rank: #11. Votes: 1633. Organization: openai. License: Proprietary.
97.4% percentile inside its fair comparison set · Raw benchmark value: 1,499 · CI 1,484 - 1,514
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #31 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,477
- Percentile: 89.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_medicine_and_healthcare. Source rank: #39. Votes: 2187. Organization: openai. License: Proprietary.
89.8% percentile inside its fair comparison set · Raw benchmark value: 1,477 · CI 1,463 - 1,490
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #14 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,509
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_software_and_it_services. Source rank: #17. Votes: 11038. Organization: openai. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,509 · CI 1,502 - 1,516
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #7 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,474
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_writing_and_literature_and_language. Source rank: #9. Votes: 7044. Organization: openai. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,474 · CI 1,466 - 1,482
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #4 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,512
- Percentile: 98.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: expert. Source rank: #5. Votes: 2590. Organization: openai. License: Proprietary.
98.9% percentile inside its fair comparison set · Raw benchmark value: 1,512 · CI 1,500 - 1,525
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #5 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,475
- Percentile: 98.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_business_and_management_and_financial_operations. Source rank: #7. Votes: 5645. Organization: openai. License: Proprietary.
98.7% percentile inside its fair comparison set · Raw benchmark value: 1,475 · CI 1,466 - 1,484
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #12 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,445
- Percentile: 96.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #15. Votes: 6255. Organization: openai. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,445 · CI 1,436 - 1,453
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #11 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,480
- Percentile: 96.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_legal_and_government. Source rank: #13. Votes: 2182. Organization: openai. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,480 · CI 1,466 - 1,494
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #11 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,482
- Percentile: 96.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_life_and_physical_and_social_science. Source rank: #13. Votes: 4805. Organization: openai. License: Proprietary.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,482 · CI 1,473 - 1,491
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #12 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,483
- Percentile: 96.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_mathematical. Source rank: #14. Votes: 1633. Organization: openai. License: Proprietary.
96.4% percentile inside its fair comparison set · Raw benchmark value: 1,483 · CI 1,468 - 1,498
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #42 · Source label: gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,453
- Percentile: 86.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5`. Category: industry_medicine_and_healthcare. Source rank: #46. Votes: 2187. Organization: openai. License: Proprietary.
86.1% percentile inside its fair comparison set · Raw benchmark value: 1,453 · CI 1,439 - 1,466
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #10 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,490
- Percentile: 97.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_software_and_it_services. Source rank: #13. Votes: 11038. Organization: openai. License: Proprietary.
97.2% percentile inside its fair comparison set · Raw benchmark value: 1,490 · CI 1,483 - 1,496
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #6 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,468
- Percentile: 98.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gpt-5.5-high`. Category: industry_writing_and_literature_and_language. Source rank: #8. Votes: 7044. Organization: openai. License: Proprietary.
98.5% percentile inside its fair comparison set · Raw benchmark value: 1,468 · CI 1,460 - 1,476
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #8 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 51.5%
- Percentile: 84.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.
84.4% percentile inside its fair comparison set · Raw benchmark value: 51.5% · CI 43.8% - 59.3%
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #8 · Source label: openai/gpt-5.5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 57.2%
- Percentile: 41.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: OpenAI.
41.7% percentile inside its fair comparison set · Raw benchmark value: 57.2% · CI 54.7% - 59.8%
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #13 · Source label: gpt-5.5-medium
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 77%
- Percentile: 88.9%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
88.9% percentile inside its fair comparison set · Raw benchmark value: 77%
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #42 · Source label: gpt-5.5-medium
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 68.7%
- Percentile: 62%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
62% percentile inside its fair comparison set · Raw benchmark value: 68.7%
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #21 · Source label: gpt-5.5-medium
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 78.2%
- Percentile: 81.5%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
81.5% percentile inside its fair comparison set · Raw benchmark value: 78.2%
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #3 · Source label: gpt-5.5-high
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 52.4%
- Percentile: 98.1%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
98.1% percentile inside its fair comparison set · Raw benchmark value: 52.4%
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #22 · Source label: gpt-5.5-medium
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 100%
- Percentile: 100%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
100% percentile inside its fair comparison set · Raw benchmark value: 100%