GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #2 · Source label: Claude Opus 4.8 (Adaptive Reasoning, Max Effort)
Verified runtime · Exact alias
- Source: Artificial Analysis
- Raw value: 1,605
- Percentile: 97.8% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from the Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
Legal Research Bench
VALS-AI · Professional reasoning · Objective
Applied legal research tasks.
Rank #1 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 43.8% (CI 37% - 50.5%)
- Percentile: 100% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: Anthropic.
SkillsBench
VALS-AI · Professional reasoning · Objective
Applied professional skills tasks.
Rank #2 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 59.2% (CI 50.4% - 68%)
- Percentile: 90% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: Anthropic.
Public Benefits Bench
VALS-AI · Professional reasoning · Objective
Answering SNAP benefits questions across the public-benefits lifecycle.
Rank #1 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 68.1% (CI 65.8% - 70.5%)
- Percentile: 100% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench; provider: Anthropic.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #2 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 70 (CI 69 - 72)
- Percentile: 96.2% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Anthropic.
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #2 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 9.6% (CI 5.4% - 13.8%)
- Percentile: 92.3% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: Anthropic.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #30 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 83.6% (CI 82.6% - 84.5%)
- Percentile: 67.8% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Anthropic.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #3 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 53.9% (CI 53.6% - 54.2%)
- Percentile: 92% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Anthropic.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Quality of answers to tax questions.
Rank #8 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 75.6% (CI 74% - 77.3%)
- Percentile: 92.3% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Anthropic.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #6 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 53.2% (CI 49% - 57.5%)
- Percentile: 90.2% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #7 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 85.8% (CI 82% - 89.5%)
- Percentile: 88% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Anthropic.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #4
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,530 (CI 1,513 - 1,547)
- Percentile: 98.9% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: expert. Source rank: #5. Votes: 1,258. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Business, Management, and Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_business_and_management_and_financial_operations` category.
Rank #3
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,500 (CI 1,488 - 1,512)
- Percentile: 99.4% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_business_and_management_and_financial_operations. Source rank: #4. Votes: 2,601. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Entertainment, Sports, and Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_entertainment_and_sports_and_media` category.
Rank #8
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,457 (CI 1,446 - 1,469)
- Percentile: 97.8% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_entertainment_and_sports_and_media. Source rank: #10. Votes: 2,946. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Legal and Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_legal_and_government` category.
Rank #3
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,510 (CI 1,490 - 1,530)
- Percentile: 99.3% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_legal_and_government. Source rank: #3. Votes: 1,020. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Life, Physical, and Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_life_and_physical_and_social_science` category.
Rank #10
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,497 (CI 1,484 - 1,510)
- Percentile: 97.2% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_life_and_physical_and_social_science. Source rank: #12. Votes: 2,213. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_mathematical` category.
Rank #8
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,502 (CI 1,479 - 1,525)
- Percentile: 97.7% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_mathematical. Source rank: #9. Votes: 668. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Medicine and Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_medicine_and_healthcare` category.
Rank #11
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,499 (CI 1,479 - 1,518)
- Percentile: 96.6% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_medicine_and_healthcare. Source rank: #13. Votes: 982. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Software and IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_software_and_it_services` category.
Rank #5
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,524 (CI 1,515 - 1,533)
- Percentile: 98.8% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_software_and_it_services. Source rank: #7. Votes: 5,156. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Writing, Literature, and Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_writing_and_literature_and_language` category.
Rank #10
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,467 (CI 1,456 - 1,477)
- Percentile: 97.2% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_writing_and_literature_and_language. Source rank: #13. Votes: 3,391. Organization: anthropic. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard, without style control.
Rank #9
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,503 (CI 1,486 - 1,520)
- Percentile: 97.1% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: expert. Source rank: #11. Votes: 1,258. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Business, Management, and Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_business_and_management_and_financial_operations` category, without style control.
Rank #9
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,463 (CI 1,450 - 1,475)
- Percentile: 97.5% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_business_and_management_and_financial_operations. Source rank: #12. Votes: 2,601. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Entertainment, Sports, and Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_entertainment_and_sports_and_media` category, without style control.
Rank #18
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,434 (CI 1,422 - 1,445)
- Percentile: 94.7% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_entertainment_and_sports_and_media. Source rank: #22. Votes: 2,946. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Legal and Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_legal_and_government` category, without style control.
Rank #9
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,482 (CI 1,463 - 1,502)
- Percentile: 97.3% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_legal_and_government. Source rank: #10. Votes: 1,020. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Life, Physical, and Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_life_and_physical_and_social_science` category, without style control.
Rank #23
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,465 (CI 1,452 - 1,478)
- Percentile: 93.2% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_life_and_physical_and_social_science. Source rank: #28. Votes: 2,213. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_mathematical` category, without style control.
Rank #9
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,492 (CI 1,469 - 1,515)
- Percentile: 97.4% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_mathematical. Source rank: #11. Votes: 668. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Medicine and Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_medicine_and_healthcare` category, without style control.
Rank #37
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,455 (CI 1,436 - 1,474)
- Percentile: 87.8% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_medicine_and_healthcare. Source rank: #40. Votes: 982. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Software and IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_software_and_it_services` category, without style control.
Rank #14
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,487 (CI 1,478 - 1,496)
- Percentile: 96% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_software_and_it_services. Source rank: #17. Votes: 5,156. Organization: anthropic. License: Proprietary.
Text Arena · Industry · Writing, Literature, and Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the `industry_writing_and_literature_and_language` category, without style control.
Rank #17
Verified runtime · Exact alias
- Source: Arena
- Raw value: 1,450 (CI 1,439 - 1,460)
- Percentile: 95.1% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-8`. Category: industry_writing_and_literature_and_language. Source rank: #21. Votes: 3,391. Organization: anthropic. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #3 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 54.8% (CI 48.2% - 61.3%)
- Percentile: 95.6% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Anthropic.
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #2 · Source label: anthropic/claude-opus-4-8
Verified runtime · Exact alias
- Source: Vals AI
- Raw value: 62.1% (CI 59.6% - 64.6%)
- Percentile: 91.7% within its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: Anthropic.
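Note on percentiles (assumption): the page does not say how a percentile is derived from a rank. Each rank/percentile pair above can be reproduced by percentile = 100 × (N − rank) / (N − 1) for some integer size N of the fair comparison set. For example, LegalBench at rank #30 and 67.8% fits N = 91. Because N is never shown, this is only a consistency check and not a documented formula. A minimal Python sketch of that assumed mapping, using the hypothetical helpers `rank_percentile` and `implied_set_size`:

```python
def rank_percentile(rank: int, n: int) -> float:
    """Assumed percentile for a 1-based rank among n entries.

    Rank 1 maps to 100% and rank n maps to 0%. This formula is an
    inference from the rank/percentile pairs on the page, not a
    formula published by the leaderboard.
    """
    if n < 2:
        raise ValueError("need at least two entries to compare")
    if not 1 <= rank <= n:
        raise ValueError("rank must be between 1 and n")
    return 100.0 * (n - rank) / (n - 1)


def implied_set_size(rank: int, percentile: float,
                     tol: float = 0.05, max_n: int = 1000) -> list[int]:
    """Candidate set sizes n whose assumed percentile is within tol of
    the displayed value. tol=0.05 models rounding to one decimal place."""
    return [
        n
        for n in range(max(rank, 2), max_n + 1)
        if abs(rank_percentile(rank, n) - percentile) <= tol
    ]


if __name__ == "__main__":
    # LegalBench: rank #30, displayed percentile 67.8%
    print(implied_set_size(30, 67.8))
    # GDPval-AA: rank #2, displayed percentile 97.8%
    print(implied_set_size(2, 97.8))
```

Under this assumption, `implied_set_size(2, 97.8)` returns `[46, 47]` for GDPval-AA, so a rounded percentile often cannot pin N down exactly. For a rank #1 entry every N ≥ 2 yields 100%, so no set size can be inferred at all.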