GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #21 · Source label: Muse Spark
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1,152
- Percentile: 56.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
56.5% percentile inside its fair comparison set · Raw benchmark value: 1,152
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #21 · Source label: meta/muse_spark
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 84.2%
- Percentile: 77.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Meta.
77.8% percentile inside its fair comparison set · Raw benchmark value: 84.2% · CI 83.4% - 85%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #1 · Source label: meta/muse_spark
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 77.7%
- Percentile: 100%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Meta.
100% percentile inside its fair comparison set · Raw benchmark value: 77.7% · CI 76.1% - 79.3%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #9 · Source label: meta/muse_spark
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 51.3%
- Percentile: 84.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Meta.
84.3% percentile inside its fair comparison set · Raw benchmark value: 51.3% · CI 46.9% - 55.7%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #6 · Source label: meta/muse_spark
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 85.9%
- Percentile: 90%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Meta.
90% percentile inside its fair comparison set · Raw benchmark value: 85.9% · CI 82.3% - 89.5%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #24
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,487
- Percentile: 91.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: expert. Source rank: #30. Votes: 1271. Organization: meta. License: Proprietary.
91.6% percentile inside its fair comparison set · Raw benchmark value: 1,487 · CI 1,470 - 1,504
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #6
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,488
- Percentile: 98.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_business_and_management_and_financial_operations. Source rank: #8. Votes: 2736. Organization: meta. License: Proprietary.
98.4% percentile inside its fair comparison set · Raw benchmark value: 1,488 · CI 1,476 - 1,500
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #6
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,461
- Percentile: 98.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_entertainment_and_sports_and_media. Source rank: #8. Votes: 2518. Organization: meta. License: Proprietary.
98.5% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI 1,448 - 1,473
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #5
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,504
- Percentile: 98.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_legal_and_government. Source rank: #6. Votes: 963. Organization: meta. License: Proprietary.
98.7% percentile inside its fair comparison set · Raw benchmark value: 1,504 · CI 1,484 - 1,524
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #9
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,498
- Percentile: 97.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_life_and_physical_and_social_science. Source rank: #11. Votes: 2268. Organization: meta. License: Proprietary.
97.5% percentile inside its fair comparison set · Raw benchmark value: 1,498 · CI 1,485 - 1,511
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #27
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,470
- Percentile: 91.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_mathematical. Source rank: #33. Votes: 812. Organization: meta. License: Proprietary.
91.6% percentile inside its fair comparison set · Raw benchmark value: 1,470 · CI 1,449 - 1,491
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #5
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,506
- Percentile: 98.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_medicine_and_healthcare. Source rank: #7. Votes: 967. Organization: meta. License: Proprietary.
98.6% percentile inside its fair comparison set · Raw benchmark value: 1,506 · CI 1,486 - 1,526
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #6
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,518
- Percentile: 98.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_software_and_it_services. Source rank: #8. Votes: 5323. Organization: meta. License: Proprietary.
98.5% percentile inside its fair comparison set · Raw benchmark value: 1,518 · CI 1,510 - 1,527
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #16
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,458
- Percentile: 95.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_writing_and_literature_and_language. Source rank: #21. Votes: 3138. Organization: meta. License: Proprietary.
95.4% percentile inside its fair comparison set · Raw benchmark value: 1,458 · CI 1,447 - 1,469
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #35
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,454
- Percentile: 87.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: expert. Source rank: #43. Votes: 1271. Organization: meta. License: Proprietary.
87.6% percentile inside its fair comparison set · Raw benchmark value: 1,454 · CI 1,437 - 1,471
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #11
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,461
- Percentile: 96.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_business_and_management_and_financial_operations. Source rank: #14. Votes: 2736. Organization: meta. License: Proprietary.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI 1,449 - 1,473
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #10
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,448
- Percentile: 97.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_entertainment_and_sports_and_media. Source rank: #12. Votes: 2518. Organization: meta. License: Proprietary.
97.2% percentile inside its fair comparison set · Raw benchmark value: 1,448 · CI 1,436 - 1,460
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #5
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,490
- Percentile: 98.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_legal_and_government. Source rank: #6. Votes: 963. Organization: meta. License: Proprietary.
98.7% percentile inside its fair comparison set · Raw benchmark value: 1,490 · CI 1,470 - 1,509
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #14
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 96%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_life_and_physical_and_social_science. Source rank: #18. Votes: 2268. Organization: meta. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,465 - 1,491
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #31
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,453
- Percentile: 90.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_mathematical. Source rank: #38. Votes: 812. Organization: meta. License: Proprietary.
90.3% percentile inside its fair comparison set · Raw benchmark value: 1,453 · CI 1,432 - 1,474
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #8
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,484
- Percentile: 97.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_medicine_and_healthcare. Source rank: #10. Votes: 967. Organization: meta. License: Proprietary.
97.6% percentile inside its fair comparison set · Raw benchmark value: 1,484 · CI 1,464 - 1,503
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #17
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,483
- Percentile: 95.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_software_and_it_services. Source rank: #20. Votes: 5323. Organization: meta. License: Proprietary.
95.1% percentile inside its fair comparison set · Raw benchmark value: 1,483 · CI 1,475 - 1,492
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #16
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,450
- Percentile: 95.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `muse-spark`. Category: industry_writing_and_literature_and_language. Source rank: #20. Votes: 3138. Organization: meta. License: Proprietary.
95.4% percentile inside its fair comparison set · Raw benchmark value: 1,450 · CI 1,439 - 1,461
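The Arena cards above each carry a provenance line of the form "Parsed from Arena leaderboard dataset row ... Category: ... Source rank: ... Votes: ...". For readers who want those fields in structured form, the following is a minimal sketch of one way to split such a line into key/value pairs. The function name `parse_provenance` and the output key names are hypothetical, chosen for illustration; the sketch assumes values contain no periods, which holds for the Arena lines shown here but not for the Vals AI lines, which use a different layout.

```python
import re


def parse_provenance(line: str) -> dict[str, str]:
    """Split an Arena provenance line into a dict of string fields.

    Hypothetical helper: key names are derived from the labels in the line
    (lowercased, spaces replaced by underscores).
    """
    fields: dict[str, str] = {}

    # The dataset row name is wrapped in backticks in the source line.
    row = re.search(r"dataset row `([^`]+)`", line)
    if row:
        fields["row"] = row.group(1)

    # Each remaining field looks like "Label: value." with a capitalised label.
    for label, value in re.findall(r"([A-Z][A-Za-z ]*): ([^.]+)\.", line):
        fields[label.strip().lower().replace(" ", "_")] = value.strip()

    return fields


line = (
    "Parsed from Arena leaderboard dataset row `muse-spark`. Category: expert. "
    "Source rank: #30. Votes: 1271. Organization: meta. License: Proprietary."
)
record = parse_provenance(line)
print(record)
```

All values are kept as strings, so a caller that needs the vote count or source rank as a number would convert them separately, for example with `int(record["votes"])`.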