GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #23 · Source label: Grok 4.3 (high)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 1,091 (Elo)
- Percentile: 52.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #13 · Source label: Grok 4.3 (high)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 17%
- Percentile: 50%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
Legal Research Bench
VALS-AI · Professional reasoning · Objective
Applied legal research tasks.
Rank #12 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 15.4%
- Percentile: 8.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: xAI.
Confidence interval: 10.5% to 20.3%
SkillsBench
VALS-AI · Professional reasoning · Objective
Applied professional skills tasks.
Rank #11 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 40.6%
- Percentile: 0%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: xAI.
Confidence interval: 31.6% to 49.7%
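Each row reports both a rank and a percentile within a comparison set, but the page does not say how the percentile is derived or how many models each set contains. The pairs shown (for example, rank #13 at 50% for APEX-Agents-AA, or rank #11 at 0% for SkillsBench) appear consistent with a rank-based percentile of the form 100 · (N - rank) / (N - 1), where N is the comparison-set size. The sketch below assumes that formula; the function name `rank_percentile` and every set size used are hypothetical, not taken from Artificial Analysis, Vals AI, Arena, or LiveBench.

```python
def rank_percentile(rank: int, set_size: int) -> float:
    """Hypothetical rank-based percentile (assumption, not a documented formula).

    Returns the share of the *other* models in the comparison set that rank
    below this one, as a percentage: rank 1 -> 100%, last place -> 0%.
    """
    if set_size < 2:
        raise ValueError("comparison set needs at least two models")
    if not 1 <= rank <= set_size:
        raise ValueError("rank must be between 1 and set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)


# Hypothetical set sizes; this page does not publish them.
print(round(rank_percentile(13, 25), 1))  # 50.0
print(round(rank_percentile(11, 11), 1))  # 0.0
```

Under this assumption, a 0% percentile simply means the model is last in its comparison set, even when its leaderboard rank looks mid-table.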
Public Benefits Bench
VALS-AI · Professional reasoning · Objective
Answering SNAP benefits questions across the public-benefits lifecycle.
Rank #11 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 51.7%
- Percentile: 9.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench; provider: xAI.
Confidence interval: 49.1% to 54.2%
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #21 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 46
- Percentile: 23.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: xAI.
Confidence interval: 44 to 49
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #11 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 0.4%
- Percentile: 23.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: xAI.
Confidence interval: 0.4% to 0.4%
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #16 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 84.5%
- Percentile: 83.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: xAI.
Confidence interval: 83.6% to 85.3%
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #19 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 37.7%
- Percentile: 28%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: xAI.
Confidence interval: 37% to 38.5%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions.
Rank #57 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 70.8%
- Percentile: 38.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: xAI.
Confidence interval: 69.1% to 72.6%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #34 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 38.1%
- Percentile: 35.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.
Confidence interval: 34% to 42.1%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #34 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 74.4%
- Percentile: 34%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: xAI.
Confidence interval: 70.4% to 78.4%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #65 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,446
- Percentile: 76.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: expert. Source rank: #83. Votes: 2601. Organization: xai. License: Proprietary.
Confidence interval: 1,433 to 1,458
Text Arena · Industry: Business, Management and Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the business, management and financial operations industry category.
Rank #38 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,448
- Percentile: 88.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_business_and_management_and_financial_operations. Source rank: #49. Votes: 5628. Organization: xai. License: Proprietary.
Confidence interval: 1,439 to 1,457
Text Arena · Industry: Entertainment, Sports and Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the entertainment, sports and media industry category.
Rank #36 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,424
- Percentile: 89.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_entertainment_and_sports_and_media. Source rank: #48. Votes: 6313. Organization: xai. License: Proprietary.
Confidence interval: 1,416 to 1,433
Text Arena · Industry: Legal and Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the legal and government industry category.
Rank #54 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,441
- Percentile: 82.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_legal_and_government. Source rank: #72. Votes: 2232. Organization: xai. License: Proprietary.
Confidence interval: 1,428 to 1,455
Text Arena · Industry: Life, Physical and Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the life, physical and social science industry category.
Rank #43 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,460
- Percentile: 87%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_life_and_physical_and_social_science. Source rank: #55. Votes: 4709. Organization: xai. License: Proprietary.
Confidence interval: 1,450 to 1,469
Text Arena · Industry: Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the mathematical industry category.
Rank #64 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,437
- Percentile: 79.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_mathematical. Source rank: #78. Votes: 1505. Organization: xai. License: Proprietary.
Confidence interval: 1,421 to 1,453
Text Arena · Industry: Medicine and Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the medicine and healthcare industry category.
Rank #48 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,463
- Percentile: 84.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_medicine_and_healthcare. Source rank: #60. Votes: 2096. Organization: xai. License: Proprietary.
Confidence interval: 1,449 to 1,476
Text Arena · Industry: Software and IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the software and IT services industry category.
Rank #44 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,480
- Percentile: 86.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_software_and_it_services. Source rank: #56. Votes: 11278. Organization: xai. License: Proprietary.
Confidence interval: 1,473 to 1,486
Text Arena · Industry: Writing, Literature and Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the writing, literature and language industry category.
Rank #36 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,430
- Percentile: 89.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_writing_and_literature_and_language. Source rank: #47. Votes: 6917. Organization: xai. License: Proprietary.
Confidence interval: 1,422 to 1,438
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard, without style control.
Rank #102 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,384
- Percentile: 63.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: expert. Source rank: #124. Votes: 2601. Organization: xai. License: Proprietary.
Confidence interval: 1,372 to 1,397
Text Arena · Industry: Business, Management and Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the business, management and financial operations industry category, without style control.
Rank #91 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,394
- Percentile: 71.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_business_and_management_and_financial_operations. Source rank: #109. Votes: 5628. Organization: xai. License: Proprietary.
Confidence interval: 1,385 to 1,402
Text Arena · Industry: Entertainment, Sports and Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the entertainment, sports and media industry category, without style control.
Rank #71 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,383
- Percentile: 78.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_entertainment_and_sports_and_media. Source rank: #86. Votes: 6313. Organization: xai. License: Proprietary.
Confidence interval: 1,375 to 1,392
Text Arena · Industry: Legal and Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the legal and government industry category, without style control.
Rank #102 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,394
- Percentile: 66.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_legal_and_government. Source rank: #123. Votes: 2232. Organization: xai. License: Proprietary.
Confidence interval: 1,381 to 1,408
Text Arena · Industry: Life, Physical and Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the life, physical and social science industry category, without style control.
Rank #103 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,405
- Percentile: 68.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_life_and_physical_and_social_science. Source rank: #122. Votes: 4709. Organization: xai. License: Proprietary.
Confidence interval: 1,396 to 1,414
Text Arena · Industry: Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the mathematical industry category, without style control.
Rank #95 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,405
- Percentile: 69.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_mathematical. Source rank: #113. Votes: 1505. Organization: xai. License: Proprietary.
Confidence interval: 1,389 to 1,421
Text Arena · Industry: Medicine and Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the medicine and healthcare industry category, without style control.
Rank #100 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,402
- Percentile: 66.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_medicine_and_healthcare. Source rank: #121. Votes: 2096. Organization: xai. License: Proprietary.
Confidence interval: 1,388 to 1,415
Text Arena · Industry: Software and IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the software and IT services industry category, without style control.
Rank #90 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,421
- Percentile: 72.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_software_and_it_services. Source rank: #107. Votes: 11278. Organization: xai. License: Proprietary.
Confidence interval: 1,414 to 1,427
Text Arena · Industry: Writing, Literature and Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena leaderboard for the writing, literature and language industry category, without style control.
Rank #68 · Source label: grok-4.3
verified runtime · exact alias
- Source: Arena
- Raw value: 1,395
- Percentile: 79.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.3`. Category: industry_writing_and_literature_and_language. Source rank: #81. Votes: 6917. Organization: xai. License: Proprietary.
Confidence interval: 1,387 to 1,403
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #46 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 19.7%
- Percentile: 0%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: xAI.
Confidence interval: 14.5% to 24.9%
Public Benefits Bench v1
VALS-AI · Professional reasoning · Objective
Answering public-benefits questions across the benefits lifecycle.
Rank #11 · Source label: grok/grok-4.3
verified runtime · exact alias
- Source: Vals AI
- Raw value: 50.1%
- Percentile: 16.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: public-benefits-bench-v1; provider: xAI.
Confidence interval: 47.5% to 52.6%
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #53 · Source label: grok-4.3
verified runtime · exact alias
- Source: LiveBench
- Raw value: 55.8%
- Percentile: 51.9%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #46 · Source label: grok-4.3
verified runtime · exact alias
- Source: LiveBench
- Raw value: 66.7%
- Percentile: 58.3%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #51 · Source label: grok-4.3
verified runtime · exact alias
- Source: LiveBench
- Raw value: 28%
- Percentile: 53.7%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #64 · Source label: grok-4.3
verified runtime · exact alias
- Source: LiveBench
- Raw value: 41.3%
- Percentile: 41.7%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #67 · Source label: grok-4.3
verified runtime · exact alias
- Source: LiveBench
- Raw value: 98%
- Percentile: 74.1%
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
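As a consistency check, the LiveBench Data analysis score (55.8%, with "Tasks scored: 3") matches the unweighted mean of the three Data Analysis task scores on this page (consecutive_events 28%, tablejoin 41.3%, tablereformat 98%) once rounded to one decimal. The sketch below assumes that plain averaging; LiveBench's own aggregation code is not shown here, so treat this as an illustration rather than its implementation.

```python
from statistics import mean

# Task scores (%) copied from the three LiveBench Data Analysis rows on this page.
task_scores = {
    "consecutive_events": 28.0,
    "tablejoin": 41.3,
    "tablereformat": 98.0,
}

# Assumption: the category score is the unweighted mean of its task scores.
category_score = mean(list(task_scores.values()))
print(f"{category_score:.1f}%")  # 55.8%
```

The unrounded mean is about 55.77%, which is why the published figure reads 55.8%.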