GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #4 · Source label: Claude Opus 4.7 (Adaptive Reasoning, Max Effort)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 1,507
- Percentile: 93.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
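The page does not say how the percentile inside a fair comparison set is derived from the rank. One reading that is consistent with the listed figures is a linear rank percentile, where rank 1 maps to 100% and the last rank maps to 0%. The sketch below is an assumption, not the site's documented method: the function name `percentile_from_rank` and the set sizes (47 and 26) are hypothetical, chosen only because they reproduce the rank #4 / 93.5% and rank #5 / 84% pairs shown on this page.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Linear rank percentile: rank 1 -> 100.0, rank == set_size -> 0.0.

    Assumed formula, not confirmed by the source page.
    """
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    if not 1 <= rank <= set_size:
        raise ValueError("rank must be between 1 and set_size")
    return 100.0 * (1 - (rank - 1) / (set_size - 1))


# Hypothetical comparison-set sizes; only the ranks and percentiles are from the page.
print(round(percentile_from_rank(4, 47), 1))   # rank #4 among an assumed 47 models
print(round(percentile_from_rank(5, 26), 1))   # rank #5 among an assumed 26 models
print(round(percentile_from_rank(1, 10), 1))   # rank #1 is 100.0 for any set size
```

Under this reading, two cards with the same rank can show different percentiles simply because their comparison sets differ in size.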
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #4 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 66 (CI 64 - 69)
- Percentile: 88.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Anthropic.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #11 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 85.3% (CI 84.4% - 86.1%)
- Percentile: 88.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Anthropic.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #5 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 51.5% (CI 50.5% - 52.5%)
- Percentile: 84% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Anthropic.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #10 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 75.3% (CI 73.6% - 76.9%)
- Percentile: 90.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Anthropic.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #5 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 54.9% (CI 50.5% - 59.2%)
- Percentile: 92.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #16 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 83% (CI 79.1% - 86.8%)
- Percentile: 70% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Anthropic.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,533 (CI 1,521 - 1,544)
- Percentile: 99.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7`. Category: expert. Source rank: #4. Votes: 3228. Organization: anthropic. License: Proprietary.
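The Arena raw values are ratings rather than percentages, so a gap between two ratings is easier to read as an expected preference rate. The sketch below assumes the conventional Elo logistic scale (base 10, divisor 400); the page itself does not state which scale Arena uses, and both the function name `expected_preference` and the opponent rating of 1,500 are hypothetical, introduced only for illustration.

```python
def expected_preference(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the standard Elo logistic model.

    The base-10 / 400-point scale is an assumption, not taken from the source page.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


listed_rating = 1533.0        # raw value shown on the Text Arena · Expert card
hypothetical_rival = 1500.0   # assumed opponent rating, not from the source

p = expected_preference(listed_rating, hypothetical_rival)
print(f"Expected preference rate: {p:.3f}")  # roughly 0.547 for a 33-point gap
```

On this scale a gap of a few dozen points corresponds to a modest edge in head-to-head votes, which is why the confidence intervals on these cards matter when comparing closely ranked models.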
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #4 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,500 (CI 1,492 - 1,508)
- Percentile: 99.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #5. Votes: 6482. Organization: anthropic. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,480 (CI 1,472 - 1,489)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #3. Votes: 7042. Organization: anthropic. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #4 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,505 (CI 1,492 - 1,518)
- Percentile: 99% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_legal_and_government. Source rank: #5. Votes: 2601. Organization: anthropic. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #1 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,527 (CI 1,518 - 1,536)
- Percentile: 100% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #1. Votes: 5573. Organization: anthropic. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,515 (CI 1,500 - 1,530)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_mathematical. Source rank: #4. Votes: 1889. Organization: anthropic. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #1 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,521 (CI 1,508 - 1,534)
- Percentile: 100% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_medicine_and_healthcare. Source rank: #1. Votes: 2450. Organization: anthropic. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #2 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,543 (CI 1,536 - 1,549)
- Percentile: 99.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_software_and_it_services. Source rank: #2. Votes: 13041. Organization: anthropic. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,495 (CI 1,487 - 1,502)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #3. Votes: 7935. Organization: anthropic. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,516 (CI 1,505 - 1,528)
- Percentile: 99.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7`. Category: expert. Source rank: #4. Votes: 3228. Organization: anthropic. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #2 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,481 (CI 1,473 - 1,490)
- Percentile: 99.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_business_and_management_and_financial_operations. Source rank: #3. Votes: 6482. Organization: anthropic. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,472 (CI 1,464 - 1,480)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_entertainment_and_sports_and_media. Source rank: #4. Votes: 7042. Organization: anthropic. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,497 (CI 1,484 - 1,509)
- Percentile: 99.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_legal_and_government. Source rank: #4. Votes: 2601. Organization: anthropic. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #2 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,507 (CI 1,499 - 1,516)
- Percentile: 99.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_life_and_physical_and_social_science. Source rank: #3. Votes: 5573. Organization: anthropic. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #4 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,506 (CI 1,492 - 1,521)
- Percentile: 99% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_mathematical. Source rank: #5. Votes: 1889. Organization: anthropic. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,496 (CI 1,483 - 1,508)
- Percentile: 99.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_medicine_and_healthcare. Source rank: #4. Votes: 2450. Organization: anthropic. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,516 (CI 1,510 - 1,523)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_software_and_it_services. Source rank: #4. Votes: 13041. Organization: anthropic. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #3 · Source label: claude-opus-4-7-thinking
verified runtime · exact alias
- Source: Arena
- Raw value: 1,488 (CI 1,480 - 1,496)
- Percentile: 99.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-opus-4-7-thinking`. Category: industry_writing_and_literature_and_language. Source rank: #4. Votes: 7935. Organization: anthropic. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #1 · Source label: anthropic/claude-opus-4-7
verified runtime · exact alias
- Source: Vals AI
- Raw value: 56.1% (CI 49.5% - 62.7%)
- Percentile: 100% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Anthropic.