APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #18 · Source label: Grok 4.20 0309 (Reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 14.2%
- Percentile: 29.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #25 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 39 (CI 38 - 41)
- Percentile: 7.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: xAI.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #66 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 77.7% (CI 76.8% - 78.7%)
- Percentile: 27.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: xAI.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #25 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 28.5% (CI 27.9% - 29.1%)
- Percentile: 4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: xAI.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #24 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 74.1% (CI 72.4% - 75.8%)
- Percentile: 74.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: xAI.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #48 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 32.2% (CI 28% - 36.3%)
- Percentile: 7.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #50 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 63.4% (CI 59.3% - 67.5%)
- Percentile: 2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: xAI.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #27 · Source label: grok-4.20-multi-agent-beta-0309
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,484 (CI 1,474 - 1,495)
- Percentile: 90.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-multi-agent-beta-0309`. Category: expert. Source rank: #34. Votes: 3775. Organization: xai. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #16 · Source label: grok-4.20-beta1
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,470 (CI 1,461 - 1,479)
- Percentile: 95.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta1`. Category: industry_business_and_management_and_financial_operations. Source rank: #20. Votes: 5150. Organization: xai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #17 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,447 (CI 1,439 - 1,454)
- Percentile: 95% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_entertainment_and_sports_and_media. Source rank: #20. Votes: 8927. Organization: xai. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #16 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,478 (CI 1,467 - 1,489)
- Percentile: 95% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_legal_and_government. Source rank: #22. Votes: 3282. Organization: xai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #24 · Source label: grok-4.20-multi-agent-beta-0309
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,483 (CI 1,475 - 1,491)
- Percentile: 92.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-multi-agent-beta-0309`. Category: industry_life_and_physical_and_social_science. Source rank: #29. Votes: 6746. Organization: xai. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #28 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,464 (CI 1,450 - 1,477)
- Percentile: 91.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_mathematical. Source rank: #34. Votes: 2321. Organization: xai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #12 · Source label: grok-4.20-beta1
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,495 (CI 1,481 - 1,509)
- Percentile: 96.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta1`. Category: industry_medicine_and_healthcare. Source rank: #14. Votes: 1953. Organization: xai. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #18 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,506 (CI 1,500 - 1,512)
- Percentile: 94.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_software_and_it_services. Source rank: #22. Votes: 16466. Organization: xai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #24 · Source label: grok-4.20-beta1
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,451 (CI 1,443 - 1,459)
- Percentile: 92.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta1`. Category: industry_writing_and_literature_and_language. Source rank: #30. Votes: 6118. Organization: xai. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #39 · Source label: grok-4.20-multi-agent-beta-0309
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,449 (CI 1,439 - 1,460)
- Percentile: 86.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-multi-agent-beta-0309`. Category: expert. Source rank: #47. Votes: 3775. Organization: xai. License: Proprietary.
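The default Text Arena · Expert card reports 1,484 (CI 1,474 - 1,495), while this No Style Control card reports 1,449 (CI 1,439 - 1,460). As a rough illustration, the Python sketch below parses those two CI strings and checks whether the intervals overlap. The helper names `parse_ci` and `intervals_overlap` are hypothetical, and reading non-overlapping intervals as a meaningful gap is an assumption of this sketch, not a method the source attributes to Arena.

```python
def parse_ci(text: str) -> tuple[float, float]:
    """Parse a CI string such as 'CI 1,474 - 1,495' into (low, high).

    Hypothetical helper: strips the 'CI' label, thousands separators,
    and percent signs, then splits on the ' - ' range separator.
    """
    cleaned = text.replace("CI", "").replace(",", "").replace("%", "")
    low, high = (float(part) for part in cleaned.split(" - "))
    return low, high


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """Return True when the closed intervals a and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CI strings copied from the two Expert cards.
expert = parse_ci("CI 1,474 - 1,495")
expert_no_style = parse_ci("CI 1,439 - 1,460")

print(intervals_overlap(expert, expert_no_style))  # False
```

The same helpers accept the percentage-style ranges on the Vals AI cards, for example `parse_ci("CI 76.8% - 78.7%")`.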
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #33 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,434 (CI 1,426 - 1,441)
- Percentile: 89.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_business_and_management_and_financial_operations. Source rank: #38. Votes: 8332. Organization: xai. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #21 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,428 (CI 1,420 - 1,435)
- Percentile: 93.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_entertainment_and_sports_and_media. Source rank: #25. Votes: 8927. Organization: xai. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #29 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,454 (CI 1,443 - 1,465)
- Percentile: 90.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_legal_and_government. Source rank: #35. Votes: 3282. Organization: xai. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #31 · Source label: grok-4.20-multi-agent-beta-0309
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,457 (CI 1,449 - 1,465)
- Percentile: 90.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-multi-agent-beta-0309`. Category: industry_life_and_physical_and_social_science. Source rank: #37. Votes: 6746. Organization: xai. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #34 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,449 (CI 1,436 - 1,462)
- Percentile: 89.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_mathematical. Source rank: #41. Votes: 2321. Organization: xai. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #35 · Source label: grok-4.20-multi-agent-beta-0309
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,457 (CI 1,445 - 1,469)
- Percentile: 88.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-multi-agent-beta-0309`. Category: industry_medicine_and_healthcare. Source rank: #38. Votes: 2979. Organization: xai. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #31 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,464 (CI 1,458 - 1,470)
- Percentile: 90.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_software_and_it_services. Source rank: #38. Votes: 16466. Organization: xai. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #25 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,436 (CI 1,429 - 1,443)
- Percentile: 92.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `grok-4.20-beta-0309-reasoning`. Category: industry_writing_and_literature_and_language. Source rank: #32. Votes: 10281. Organization: xai. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #31 · Source label: grok/grok-4.20-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 38.2% (CI 31.5% - 45%)
- Percentile: 33.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: xAI.
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #42 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 62.9%
- Percentile: 62% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #44 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 68%
- Percentile: 60.2% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #40 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 50.8%
- Percentile: 63.9% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #78 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 37.8%
- Percentile: 28.7% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #25 · Source label: grok-4.20-beta-0309-reasoning
verified runtime · exact alias
Raw row drilldown:
- Source: LiveBench
- Raw value: 100%
- Percentile: 100% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
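The Data analysis card reports 62.9% with "Tasks scored: 3", and the three Data Analysis task cards (consecutive events, table join, table reformat) report 50.8%, 37.8%, and 100%. The Python sketch below checks that an unweighted mean of those three values reproduces the category score after rounding to one decimal. That the category score is computed this way is an assumption; the source does not state the aggregation formula, and `category_mean` is a hypothetical helper.

```python
from statistics import fmean

# Raw values copied from the three Data Analysis task cards (in percent).
task_scores = {
    "consecutive_events": 50.8,
    "tablejoin": 37.8,
    "tablereformat": 100.0,
}


def category_mean(scores: dict[str, float]) -> float:
    """Unweighted mean of task scores (assumed aggregation, not confirmed)."""
    return fmean(scores.values())


mean = category_mean(task_scores)
print(round(mean, 1))  # 62.9, matching the Data analysis raw value
```

The agreement is consistent with a simple average but does not prove it; a weighted scheme with equal weights would give the same result.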