LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #51 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 80.6%
- Percentile: 44.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: xAI.
44.4% percentile inside its fair comparison set · Raw benchmark value: 80.6% · CI 79.7% - 81.5%
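Each card reports a raw value together with a confidence interval (CI). As a minimal sketch of how one such row might be held in code and sanity-checked, the snippet below uses a hypothetical `BenchmarkRow` class (the name and its methods are illustrative assumptions, not part of Vals AI or this page) populated with the LegalBench figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical container for one card; field names are assumptions."""
    name: str
    raw_value: float
    ci_low: float
    ci_high: float

    def ci_half_width(self) -> float:
        # Half the distance between the CI bounds, rounded to 2 decimals.
        return round((self.ci_high - self.ci_low) / 2, 2)

    def value_inside_ci(self) -> bool:
        # True when the reported raw value lies within the reported CI.
        return self.ci_low <= self.raw_value <= self.ci_high


# Figures copied from the LegalBench card.
legalbench = BenchmarkRow("LegalBench", 80.6, 79.7, 81.5)
print(legalbench.ci_half_width(), legalbench.value_inside_ci())
```

The same check can be run against any other card by substituting its raw value and CI bounds.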
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #7 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 75.7%
- Percentile: 93.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: xAI.
93.4% percentile inside its fair comparison set · Raw benchmark value: 75.7% · CI 74.2% - 77.2%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #35 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 37.4%
- Percentile: 33.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.
33.3% percentile inside its fair comparison set · Raw benchmark value: 37.4% · CI 33.6% - 41.2%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #17 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 81.6%
- Percentile: 68%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: xAI.
68% percentile inside its fair comparison set · Raw benchmark value: 81.6% · CI 77.4% - 85.8%
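The page does not say how the percentile is derived from the rank, nor how many models are in each "fair comparison set". Purely as an illustration, the sketch below uses one common rank-to-percentile formula, `1 - (rank - 1) / (n - 1)`. The set sizes `n` are assumptions chosen here only because they reproduce the listed percentiles for the four Vals AI cards above; they are not stated anywhere in the source.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Percentile (0-100) under the ASSUMED formula 1 - (rank - 1) / (n - 1)."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must be in 1..set_size and set_size >= 2")
    return round(100 * (1 - (rank - 1) / (set_size - 1)), 1)


# (rank from the card, ASSUMED comparison-set size)
assumed_rows = {
    "LegalBench": (51, 91),
    "TaxEval v2": (7, 92),
    "MedCode": (35, 52),
    "MedScribe": (17, 51),
}

for name, (rank, set_size) in assumed_rows.items():
    print(name, percentile_from_rank(rank, set_size))
```

Under these assumed sizes the function returns 44.4, 93.4, 33.3 and 68.0, matching the cards. That agreement is suggestive rather than conclusive, since other formulas or set sizes could fit as well.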
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #89 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,422
- Percentile: 68%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: expert. Source rank: #109. Votes: 298. Organization: xai. License: Proprietary.
68% percentile inside its fair comparison set · Raw benchmark value: 1,422 · CI 1,388 - 1,456
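The "Parsed from Arena leaderboard dataset row ..." line packs several fields into one sentence. A minimal sketch of pulling them apart with a regular expression follows; the function name `parse_provenance` and the returned key names are hypothetical, and the pattern only assumes the `Key: value.` layout visible on this page.

```python
import re


def parse_provenance(line: str) -> dict[str, str]:
    """Split an Arena provenance sentence into its labelled fields."""
    # Each field looks like "Key: value." with no period inside the value.
    fields = dict(
        re.findall(
            r"(Category|Source rank|Votes|Organization|License): ([^.]+)\.",
            line,
        )
    )
    # The dataset row name is wrapped in backticks.
    row = re.search(r"row `([^`]+)`", line)
    if row:
        fields["Row"] = row.group(1)
    return fields


example = (
    "Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. "
    "Category: expert. Source rank: #109. Votes: 298. "
    "Organization: xai. License: Proprietary."
)
parsed = parse_provenance(example)
print(parsed)
```

Values are left as strings, so `"#109"` and `"298"` would still need converting before any numeric comparison.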
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #96 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,402
- Percentile: 70.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_business_and_management_and_financial_operations. Source rank: #116. Votes: 1201. Organization: xai. License: Proprietary.
70.1% percentile inside its fair comparison set · Raw benchmark value: 1,402 · CI 1,385 - 1,419
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #76 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,385
- Percentile: 76.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_entertainment_and_sports_and_media. Source rank: #95. Votes: 1277. Organization: xai. License: Proprietary.
76.8% percentile inside its fair comparison set · Raw benchmark value: 1,385 · CI 1,369 - 1,402
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #72 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,425
- Percentile: 76.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: industry_legal_and_government. Source rank: #92. Votes: 1187. Organization: xai. License: Proprietary.
76.2% percentile inside its fair comparison set · Raw benchmark value: 1,425 · CI 1,408 - 1,442
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #79 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,433
- Percentile: 75.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_life_and_physical_and_social_science. Source rank: #97. Votes: 1004. Organization: xai. License: Proprietary.
75.9% percentile inside its fair comparison set · Raw benchmark value: 1,433 · CI 1,415 - 1,452
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #50 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,443
- Percentile: 84.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_mathematical. Source rank: #62. Votes: 319. Organization: xai. License: Proprietary.
84.1% percentile inside its fair comparison set · Raw benchmark value: 1,443 · CI 1,411 - 1,475
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #69 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,443
- Percentile: 76.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_medicine_and_healthcare. Source rank: #86. Votes: 392. Organization: xai. License: Proprietary.
76.9% percentile inside its fair comparison set · Raw benchmark value: 1,443 · CI 1,413 - 1,473
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #64 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,459
- Percentile: 80.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_software_and_it_services. Source rank: #80. Votes: 2251. Organization: xai. License: Proprietary.
80.6% percentile inside its fair comparison set · Raw benchmark value: 1,459 · CI 1,447 - 1,472
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #81 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,394
- Percentile: 75.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_writing_and_literature_and_language. Source rank: #101. Votes: 1527. Organization: xai. License: Proprietary.
75.3% percentile inside its fair comparison set · Raw benchmark value: 1,394 · CI 1,379 - 1,409
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #84 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,407
- Percentile: 69.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: expert. Source rank: #102. Votes: 862. Organization: xai. License: Proprietary.
69.8% percentile inside its fair comparison set · Raw benchmark value: 1,407 · CI 1,387 - 1,427
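The expert category appears twice on this page: once with the default scoring (1,422, CI 1,388 - 1,456, row `grok-4-fast-chat`) and once under No Style Control (1,407, CI 1,387 - 1,427, row `grok-4-fast-reasoning`). A rough way to compare two such entries is to check whether their intervals overlap. The helper below is a hypothetical sketch; overlap is only a coarse signal, not a formal significance test, and the two entries come from different dataset rows.

```python
def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when closed intervals a=(low, high) and b=(low, high) share a point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# CI bounds copied from the two expert cards.
expert_default = (1388, 1456)           # score 1,422, row grok-4-fast-chat
expert_no_style_control = (1387, 1427)  # score 1,407, row grok-4-fast-reasoning

overlap = intervals_overlap(expert_default, expert_no_style_control)
print(overlap)
```

Here the intervals overlap, so the 15-point gap between the two scores sits within the reported uncertainty of both entries.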
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #92 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,393
- Percentile: 71.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: industry_business_and_management_and_financial_operations. Source rank: #110. Votes: 3651. Organization: xai. License: Proprietary.
71.4% percentile inside its fair comparison set · Raw benchmark value: 1,393 · CI 1,383 - 1,402
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #72 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,383
- Percentile: 78%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_entertainment_and_sports_and_media. Source rank: #87. Votes: 1277. Organization: xai. License: Proprietary.
78% percentile inside its fair comparison set · Raw benchmark value: 1,383 · CI 1,366 - 1,399
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #67 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,427
- Percentile: 77.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: industry_legal_and_government. Source rank: #81. Votes: 1187. Organization: xai. License: Proprietary.
77.9% percentile inside its fair comparison set · Raw benchmark value: 1,427 · CI 1,411 - 1,444
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #83 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,424
- Percentile: 74.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: industry_life_and_physical_and_social_science. Source rank: #97. Votes: 2985. Organization: xai. License: Proprietary.
74.6% percentile inside its fair comparison set · Raw benchmark value: 1,424 · CI 1,413 - 1,435
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #59 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,432
- Percentile: 81.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_mathematical. Source rank: #69. Votes: 319. Organization: xai. License: Proprietary.
81.2% percentile inside its fair comparison set · Raw benchmark value: 1,432 · CI 1,401 - 1,464
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #80 · Source label: grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,420
- Percentile: 73.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-reasoning`. Category: industry_medicine_and_healthcare. Source rank: #96. Votes: 876. Organization: xai. License: Proprietary.
73.2% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI 1,400 - 1,440
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #76 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,434
- Percentile: 76.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_software_and_it_services. Source rank: #91. Votes: 2251. Organization: xai. License: Proprietary.
76.9% percentile inside its fair comparison set · Raw benchmark value: 1,434 · CI 1,422 - 1,447
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #73 · Source label: grok-4-fast-chat
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,392
- Percentile: 77.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-fast-chat`. Category: industry_writing_and_literature_and_language. Source rank: #88. Votes: 1527. Organization: xai. License: Proprietary.
77.8% percentile inside its fair comparison set · Raw benchmark value: 1,392 · CI 1,376 - 1,407
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #43 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 29.8%
- Percentile: 6.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: xAI.
6.7% percentile inside its fair comparison set · Raw benchmark value: 29.8% · CI 23.8% - 35.7%
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #11 · Source label: grok/grok-4-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 1,034.3 score
- Percentile: 37.5%
- Last updated: archived
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.
37.5% percentile inside its fair comparison set · Raw benchmark value: 1,034.3 score · CI 1,034.3 score - 1,034.3 score
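Raw values on this page come in three textual shapes (`80.6%`, `1,422`, and `1,034.3 score`), and the Poker Agent card reports a CI whose lower and upper bounds are identical. The sketch below shows one way to normalise those strings and flag a zero-width interval. The helper names are hypothetical, and the parsing rules cover only the formats seen on this page.

```python
def parse_value(text: str) -> float:
    """Convert '80.6%', '1,422' or '1,034.3 score' to a float."""
    cleaned = text.replace(",", "").replace("%", "").replace("score", "")
    return float(cleaned.strip())


def parse_ci(text: str) -> tuple[float, float]:
    """Split a 'low - high' range string into two floats."""
    low, high = text.split(" - ")
    return parse_value(low), parse_value(high)


def is_zero_width(ci: tuple[float, float]) -> bool:
    # A zero-width CI carries no uncertainty information.
    return ci[0] == ci[1]


poker_ci = parse_ci("1,034.3 score - 1,034.3 score")
legalbench_ci = parse_ci("79.7% - 81.5%")
print(is_zero_width(poker_ci), is_zero_width(legalbench_ci))
```

A zero-width interval most likely means no interval was available for that row, so the Poker Agent CI should be read as missing rather than as a perfectly precise estimate.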