Public Benefits Bench
VALS-AI · Professional reasoning · Objective
Answering SNAP benefits questions across the public-benefits lifecycle.
Rank #9 · Source label: anthropic/claude-haiku-4-5-20251001
verified runtime · variant direct
- Source: Vals AI
- Raw value: 54.3% (CI 51.7% - 56.8%)
- Percentile: 27.3% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench; provider: Anthropic.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #24 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 41 (CI 39 - 43)
- Percentile: 11.5% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Anthropic.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #49 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 81.2% (CI 80.3% - 82.2%)
- Percentile: 46.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Anthropic.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #23 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 31% (CI 29.9% - 32.1%)
- Percentile: 12% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Anthropic.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #70 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 67.5% (CI 65.7% - 69.3%)
- Percentile: 24.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Anthropic.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #46 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 32.7% (CI 28.8% - 36.6%)
- Percentile: 11.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #9 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 85.2% (CI 81.5% - 89%)
- Percentile: 84% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Anthropic.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #53 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,455 (CI 1,447 - 1,464)
- Percentile: 81.1% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: expert. Source rank: #69. Votes: 6618. Organization: anthropic. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #71 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,416 (CI 1,411 - 1,422)
- Percentile: 78% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_business_and_management_and_financial_operations. Source rank: #88. Votes: 17395. Organization: anthropic. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #78 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,384 (CI 1,379 - 1,390)
- Percentile: 76.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_entertainment_and_sports_and_media. Source rank: #97. Votes: 17599. Organization: anthropic. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #89 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,414 (CI 1,406 - 1,422)
- Percentile: 70.5% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_legal_and_government. Source rank: #111. Votes: 6567. Organization: anthropic. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #82 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,429 (CI 1,424 - 1,435)
- Percentile: 74.9% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_life_and_physical_and_social_science. Source rank: #104. Votes: 14574. Organization: anthropic. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #70 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,429 (CI 1,420 - 1,439)
- Percentile: 77.6% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_mathematical. Source rank: #85. Votes: 4472. Organization: anthropic. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #101 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,420 (CI 1,411 - 1,429)
- Percentile: 66.1% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_medicine_and_healthcare. Source rank: #123. Votes: 5734. Organization: anthropic. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #62 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,462 (CI 1,457 - 1,466)
- Percentile: 81.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_software_and_it_services. Source rank: #78. Votes: 33588. Organization: anthropic. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #73 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,398 (CI 1,393 - 1,403)
- Percentile: 77.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_writing_and_literature_and_language. Source rank: #91. Votes: 20720. Organization: anthropic. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #46 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,443 (CI 1,435 - 1,451)
- Percentile: 83.6% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: expert. Source rank: #54. Votes: 6618. Organization: anthropic. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #87 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,395 (CI 1,390 - 1,401)
- Percentile: 73% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_business_and_management_and_financial_operations. Source rank: #105. Votes: 17395. Organization: anthropic. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #90 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,363 (CI 1,358 - 1,369)
- Percentile: 72.4% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_entertainment_and_sports_and_media. Source rank: #110. Votes: 17599. Organization: anthropic. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #100 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,396 (CI 1,388 - 1,404)
- Percentile: 66.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_legal_and_government. Source rank: #121. Votes: 6567. Organization: anthropic. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #104 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,401 (CI 1,396 - 1,407)
- Percentile: 68.1% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_life_and_physical_and_social_science. Source rank: #123. Votes: 14574. Organization: anthropic. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #71 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,422 (CI 1,413 - 1,432)
- Percentile: 77.3% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_mathematical. Source rank: #84. Votes: 4472. Organization: anthropic. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #111 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,386 (CI 1,377 - 1,394)
- Percentile: 62.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_medicine_and_healthcare. Source rank: #134. Votes: 5734. Organization: anthropic. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #75 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,435 (CI 1,431 - 1,439)
- Percentile: 77.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_software_and_it_services. Source rank: #90. Votes: 33588. Organization: anthropic. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #85 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: Arena
- Raw value: 1,384 (CI 1,379 - 1,389)
- Percentile: 74.1% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `claude-haiku-4-5-20251001`. Category: industry_writing_and_literature_and_language. Source rank: #103. Votes: 20720. Organization: anthropic. License: Proprietary.
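Each Arena category appears twice above: once in the default listing and once with the "No Style Control" suffix. To see how far the two listings differ per category, the sketch below subtracts the No Style Control score from the default score. The scores are copied from the cards. The names `arena_scores` and `score_gaps` are illustrative, and the page does not state what adjustment the default listing applies, so the gap is only a descriptive difference.

```python
# (default score, "No Style Control" score) per Arena category,
# copied from the raw values on the cards above.
arena_scores = {
    "expert": (1455, 1443),
    "industry_business_and_management_and_financial_operations": (1416, 1395),
    "industry_entertainment_and_sports_and_media": (1384, 1363),
    "industry_legal_and_government": (1414, 1396),
    "industry_life_and_physical_and_social_science": (1429, 1401),
    "industry_mathematical": (1429, 1422),
    "industry_medicine_and_healthcare": (1420, 1386),
    "industry_software_and_it_services": (1462, 1435),
    "industry_writing_and_literature_and_language": (1398, 1384),
}


def score_gaps(scores):
    """Return {category: default - no_style_control}, largest gap first."""
    gaps = {cat: default - plain for cat, (default, plain) in scores.items()}
    return dict(sorted(gaps.items(), key=lambda kv: kv[1], reverse=True))


gaps = score_gaps(arena_scores)
for category, gap in gaps.items():
    print(f"{category}: {gap:+d}")
```

On these rows the No Style Control score is lower in every category. The gap is largest for industry_medicine_and_healthcare (34 points) and smallest for industry_mathematical (7 points).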
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #37 · Source label: anthropic/claude-haiku-4-5-20251001-thinking
verified runtime · exact alias
- Source: Vals AI
- Raw value: 31.8% (CI 25.7% - 38%)
- Percentile: 20% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Anthropic.
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #12 · Source label: anthropic/claude-haiku-4-5-20251001
verified runtime · variant direct
- Source: Vals AI
- Raw value: 49.5% (CI 47% - 52.1%)
- Percentile: 8.3% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: Anthropic.
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #88 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: LiveBench
- Raw value: 45.1%
- Percentile: 19.4% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #93 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: LiveBench
- Raw value: 45.3%
- Percentile: 14.8% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #90 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: LiveBench
- Raw value: 3%
- Percentile: 17.6% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #75 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: LiveBench
- Raw value: 38.3%
- Percentile: 31.5% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #96 · Source label: claude-haiku-4-5-20251001
verified runtime · exact alias
- Source: LiveBench
- Raw value: 94.1%
- Percentile: 14.8% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
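The Data analysis card reports 45.1% with "Tasks scored: 3", and the three task cards labelled Category: Data Analysis (Consecutive events, Table join, Table reformat) give the per-task scores. As a quick consistency check, the sketch below assumes the category score is the unweighted mean of those three task scores and compares it with the reported value. That aggregation rule is an assumption, since the page does not state it, and the names `task_scores` and `category_mean` are illustrative.

```python
# Task scores (percent) copied from the three LiveBench Data Analysis task cards.
task_scores = {
    "consecutive_events": 3.0,
    "tablejoin": 38.3,
    "tablereformat": 94.1,
}


def category_mean(scores):
    """Unweighted mean of the task scores (assumed aggregation rule)."""
    return sum(scores.values()) / len(scores)


reported = 45.1  # raw value shown on the Data analysis card
computed = category_mean(task_scores)
print(f"computed {computed:.1f}% vs reported {reported}%")
```

Under that assumption the mean comes to about 45.13%, which rounds to the reported 45.1%.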