GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #1 · Source label: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1,771
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
100% percentile inside its fair comparison set · Raw benchmark value: 1,771
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #1 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 75
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Anthropic.
100% percentile inside its fair comparison set · Raw benchmark value: 75 · CI 74 - 76
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #1 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 11.3%
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: Anthropic.
100% percentile inside its fair comparison set · Raw benchmark value: 11.3% · CI 7% - 15.5%
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #1 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 88.6%
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Anthropic.
100% percentile inside its fair comparison set · Raw benchmark value: 88.6% · CI 87.9% - 89.2%
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #2 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 56.3%
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Anthropic.
96% percentile inside its fair comparison set · Raw benchmark value: 56.3% · CI 54.7% - 58%
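The page does not state how the percentile is derived from the rank. As a rough illustration only, the sketch below assumes one common convention in which rank 1 maps to 100% and the last rank maps to 0%. The function name `percentile_from_rank` and the set size of 26 are hypothetical; 26 is chosen purely because, under this assumed formula, it reproduces the 96% shown for Rank #2.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Assumed convention (not confirmed by the source):
    rank 1 -> 100%, last rank -> 0%, linear in between."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie within a set of at least two entries")
    return round(100 * (set_size - rank) / (set_size - 1), 1)


# Hypothetical comparison set of 26 models.
print(percentile_from_rank(1, 26))  # 100.0
print(percentile_from_rank(2, 26))  # 96.0
```

Other conventions (for example, dividing by the set size rather than the set size minus one) would give different set sizes for the same percentile, so the numbers here should not be read as the site's actual comparison-set size.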
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #3 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 76.9%
- Percentile: 97.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Anthropic.
97.8% percentile inside its fair comparison set · Raw benchmark value: 76.9% · CI 75.3% - 78.6%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #2 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 56.1%
- Percentile: 98%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.
98% percentile inside its fair comparison set · Raw benchmark value: 56.1% · CI 51.8% - 60.4%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #1 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 88.5%
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Anthropic.
100% percentile inside its fair comparison set · Raw benchmark value: 88.5% · CI 84.7% - 92.3%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,536
- Percentile: 99.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: expert. Source rank: #3. Votes: 434. Organization: anthropic. License: Proprietary.
99.6% percentile inside its fair comparison set · Raw benchmark value: 1,536 · CI 1,507 - 1,564
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,502
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_business_and_management_and_financial_operations. Source rank: #2. Votes: 906. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,502 · CI 1,482 - 1,522
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,490
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_entertainment_and_sports_and_media. Source rank: #2. Votes: 1014. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,490 · CI 1,471 - 1,509
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,523
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_legal_and_government. Source rank: #1. Votes: 329. Organization: anthropic. License: Proprietary.
100% percentile inside its fair comparison set · Raw benchmark value: 1,523 · CI 1,491 - 1,555
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #3
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,518
- Percentile: 99.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_life_and_physical_and_social_science. Source rank: #4. Votes: 727. Organization: anthropic. License: Proprietary.
99.4% percentile inside its fair comparison set · Raw benchmark value: 1,518 · CI 1,495 - 1,540
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,517
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_mathematical. Source rank: #2. Votes: 242. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,517 · CI 1,479 - 1,554
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #10
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,499
- Percentile: 96.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_medicine_and_healthcare. Source rank: #12. Votes: 302. Organization: anthropic. License: Proprietary.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,499 · CI 1,464 - 1,533
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,544
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_software_and_it_services. Source rank: #1. Votes: 1587. Organization: anthropic. License: Proprietary.
100% percentile inside its fair comparison set · Raw benchmark value: 1,544 · CI 1,528 - 1,559
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,515
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_writing_and_literature_and_language. Source rank: #1. Votes: 1146. Organization: anthropic. License: Proprietary.
100% percentile inside its fair comparison set · Raw benchmark value: 1,515 · CI 1,497 - 1,533
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,520
- Percentile: 99.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: expert. Source rank: #3. Votes: 434. Organization: anthropic. License: Proprietary.
99.6% percentile inside its fair comparison set · Raw benchmark value: 1,520 · CI 1,492 - 1,548
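One way to read the two Expert rows together (1,536 with CI 1,507 - 1,564 under style control, 1,520 with CI 1,492 - 1,548 without) is to compute the score gap and check whether the reported intervals overlap. The sketch below does only that arithmetic; the `ScoreInterval` type and `intervals_overlap` helper are hypothetical names, and interval overlap is a rough heuristic rather than a significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreInterval:
    """A reported score with its confidence interval bounds (hypothetical type)."""
    label: str
    value: float
    ci_low: float
    ci_high: float


def intervals_overlap(a: ScoreInterval, b: ScoreInterval) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the two Text Arena Expert rows on this page.
expert = ScoreInterval("Expert", 1536, 1507, 1564)
expert_no_style = ScoreInterval("Expert, No Style Control", 1520, 1492, 1548)

gap = expert.value - expert_no_style.value
overlap = intervals_overlap(expert, expert_no_style)

print(f"gap = {gap}, intervals overlap = {overlap}")
# gap = 16, intervals overlap = True
```

Because the intervals overlap, the 16-point gap between the two rows is within the uncertainty the source reports for each score.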
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #3
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,481
- Percentile: 99.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_business_and_management_and_financial_operations. Source rank: #4. Votes: 906. Organization: anthropic. License: Proprietary.
99.4% percentile inside its fair comparison set · Raw benchmark value: 1,481 · CI 1,461 - 1,501
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,476
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_entertainment_and_sports_and_media. Source rank: #3. Votes: 1014. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,476 · CI 1,457 - 1,495
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,509
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_legal_and_government. Source rank: #3. Votes: 329. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,509 · CI 1,477 - 1,542
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #7
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,495
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_life_and_physical_and_social_science. Source rank: #8. Votes: 727. Organization: anthropic. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,495 · CI 1,473 - 1,517
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #3
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,510
- Percentile: 99.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_mathematical. Source rank: #4. Votes: 242. Organization: anthropic. License: Proprietary.
99.4% percentile inside its fair comparison set · Raw benchmark value: 1,510 · CI 1,472 - 1,548
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #16
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,469
- Percentile: 94.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_medicine_and_healthcare. Source rank: #19. Votes: 302. Organization: anthropic. License: Proprietary.
94.9% percentile inside its fair comparison set · Raw benchmark value: 1,469 · CI 1,435 - 1,503
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #2
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,517
- Percentile: 99.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_software_and_it_services. Source rank: #3. Votes: 1587. Organization: anthropic. License: Proprietary.
99.7% percentile inside its fair comparison set · Raw benchmark value: 1,517 · CI 1,502 - 1,532
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,506
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-fable-5`. Category: industry_writing_and_literature_and_language. Source rank: #1. Votes: 1146. Organization: anthropic. License: Proprietary.
100% percentile inside its fair comparison set · Raw benchmark value: 1,506 · CI 1,487 - 1,524
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #5 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 51.9%
- Percentile: 91.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Anthropic.
91.1% percentile inside its fair comparison set · Raw benchmark value: 51.9% · CI 45.2% - 58.6%
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #1 · Source label: anthropic/claude-fable-5
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 71.7%
- Percentile: 100%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: Anthropic.
100% percentile inside its fair comparison set · Raw benchmark value: 71.7% · CI 69.4% - 73.9%