Public Benefits Bench
VALS-AI · Professional reasoning · Objective
Answering SNAP benefits questions across the public-benefits lifecycle.
Rank #12 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 44.8% (CI 42.3% - 47.3%)
- Percentile: 0% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench; provider: xAI.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #41 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 82.5% (CI 81.6% - 83.3%)
- Percentile: 55.6% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: xAI.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #34 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 73.1% (CI 71.4% - 74.8%)
- Percentile: 63.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: xAI.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #51 · Source label: grok/grok-4-1-fast-non-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 28.3% (CI 24.6% - 32.1%)
- Percentile: 2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #22 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 78.7% (CI 75.1% - 82.4%)
- Percentile: 58% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: xAI.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #72 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,441 (CI 1,431 - 1,450)
- Percentile: 74.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: expert. Source rank: #90. Votes: 4069. Organization: xai. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #69 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,418 (CI 1,411 - 1,424)
- Percentile: 78.6% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_business_and_management_and_financial_operations. Source rank: #86. Votes: 10853. Organization: xai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #56 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,403 (CI 1,396 - 1,409)
- Percentile: 83% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_entertainment_and_sports_and_media. Source rank: #69. Votes: 10906. Organization: xai. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #71 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,427 (CI 1,417 - 1,437)
- Percentile: 76.5% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_legal_and_government. Source rank: #91. Votes: 4089. Organization: xai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #63 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,447 (CI 1,440 - 1,454)
- Percentile: 80.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_life_and_physical_and_social_science. Source rank: #79. Votes: 9143. Organization: xai. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #75 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,424 (CI 1,412 - 1,436)
- Percentile: 76% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_mathematical. Source rank: #92. Votes: 2774. Organization: xai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #61 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,447 (CI 1,436 - 1,457)
- Percentile: 79.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_medicine_and_healthcare. Source rank: #78. Votes: 3687. Organization: xai. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #67 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,458 (CI 1,453 - 1,463)
- Percentile: 79.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_software_and_it_services. Source rank: #83. Votes: 20577. Organization: xai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #60 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,406 (CI 1,401 - 1,412)
- Percentile: 81.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_writing_and_literature_and_language. Source rank: #77. Votes: 13001. Organization: xai. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #89 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,402 (CI 1,392 - 1,411)
- Percentile: 68% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: expert. Source rank: #109. Votes: 4069. Organization: xai. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #102 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,385 (CI 1,379 - 1,392)
- Percentile: 68.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_business_and_management_and_financial_operations. Source rank: #121. Votes: 10853. Organization: xai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #73 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,382 (CI 1,375 - 1,388)
- Percentile: 77.7% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_entertainment_and_sports_and_media. Source rank: #88. Votes: 10906. Organization: xai. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #95 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,402 (CI 1,392 - 1,412)
- Percentile: 68.5% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_legal_and_government. Source rank: #116. Votes: 4089. Organization: xai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #88 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,420 (CI 1,413 - 1,427)
- Percentile: 73.1% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_life_and_physical_and_social_science. Source rank: #104. Votes: 9143. Organization: xai. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #96 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,404 (CI 1,393 - 1,416)
- Percentile: 69.2% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_mathematical. Source rank: #114. Votes: 2774. Organization: xai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #87 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,414 (CI 1,404 - 1,424)
- Percentile: 70.8% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_medicine_and_healthcare. Source rank: #104. Votes: 3687. Organization: xai. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #92 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,419 (CI 1,414 - 1,424)
- Percentile: 72% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_software_and_it_services. Source rank: #110. Votes: 20577. Organization: xai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #81 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,386 (CI 1,380 - 1,392)
- Percentile: 75.3% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4-1-fast-reasoning`. Category: industry_writing_and_literature_and_language. Source rank: #99. Votes: 13001. Organization: xai. License: Proprietary.
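The Text Arena cards above come in pairs: a default card and a "No Style Control" card for the same category. A minimal sketch of how one might tabulate the percentile difference within each pair is given here. The percentiles are copied from the cards; the short category keys and the helper name `percentile_drop` are illustrative assumptions, and the source does not state that the two comparison sets are the same size, so the differences are only a rough indication.

```python
# Percentiles copied from the Text Arena cards: (default, no style control).
# The shortened category keys are illustrative, not Arena identifiers.
PERCENTILES = {
    "expert": (74.2, 68.0),
    "business": (78.6, 68.2),
    "entertainment": (83.0, 77.7),
    "legal": (76.5, 68.5),
    "life_science": (80.8, 73.1),
    "mathematical": (76.0, 69.2),
    "medicine": (79.7, 70.8),
    "software": (79.7, 72.0),
    "writing": (81.8, 75.3),
}


def percentile_drop(pair):
    """Return default percentile minus no-style-control percentile, in points."""
    default, no_style = pair
    return round(default - no_style, 1)


drops = {category: percentile_drop(pair) for category, pair in PERCENTILES.items()}
largest = max(drops, key=drops.get)

# List categories from the largest drop to the smallest.
for category, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {drop:.1f} points")
```

In this data every category ranks lower in percentile terms on the No Style Control card than on its default counterpart.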
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #40 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 31.3% (CI 25.4% - 37.3%)
- Percentile: 13.3% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: xAI.
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #13 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 44.3% (CI 41.8% - 46.8%)
- Percentile: 0% inside its fair comparison set
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: xAI.
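Public Benefits Bench (44.8%, CI 42.3% - 47.3%) and Public Benefits Bench v1 (44.3%, CI 41.8% - 46.8%) report nearly identical scores. One rough way to read such pairs is to check whether the reported intervals overlap, as sketched below. The `Interval` type and `intervals_overlap` helper are hypothetical names introduced for illustration. The source does not say how the intervals were computed or at what confidence level, so overlap is a heuristic here, not a formal significance test.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A reported confidence interval, in percentage points."""
    low: float
    high: float


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True if the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Bounds copied from the cards in this listing.
public_benefits = Interval(42.3, 47.3)
public_benefits_v1 = Interval(41.8, 46.8)
legal_bench = Interval(81.6, 83.3)
tax_eval_v2 = Interval(71.4, 74.8)

print(intervals_overlap(public_benefits, public_benefits_v1))
print(intervals_overlap(legal_bench, tax_eval_v2))
```

The first pair overlaps, so the 0.5-point gap between the two Public Benefits scores sits inside the reported uncertainty. The LegalBench and TaxEval v2 intervals do not overlap, although those are different benchmarks and their scores are not directly comparable.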
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #102 · Source label: grok-4-1-fast-non-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 40.6%
- Percentile: 6.5% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #106 · Source label: grok-4-1-fast-non-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 33.4%
- Percentile: 2.8% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #81 · Source label: grok-4-1-fast-non-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 5.3%
- Percentile: 25.9% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #104 · Source label: grok-4-1-fast-non-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 26.3%
- Percentile: 4.6% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #104 · Source label: grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 88.2%
- Percentile: 4.6% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #7 · Source label: grok/grok-4-1-fast-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 1,079.2 score (CI 1,079.2 score - 1,079.2 score)
- Percentile: 62.5% inside its fair comparison set
- Last updated: archived
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.