APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #9 · Source label: Gemini 3 Flash Preview (Reasoning)
verified runtime · exact alias
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 27.7%
- Percentile: 66.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #18 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 50 (CI 47 - 53)
- Percentile: 34.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Google.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #4 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 86.9% (CI 86.2% - 87.6%)
- Percentile: 96.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #14 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 42.6% (CI 41.7% - 43.4%)
- Percentile: 48% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Google.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #28 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 73.9% (CI 72.2% - 75.6%)
- Percentile: 70.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #3 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 55.9% (CI 51.8% - 60.1%)
- Percentile: 96.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #45 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 69.9% (CI 66.2% - 73.6%)
- Percentile: 12% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #19
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,497 (CI 1,483 - 1,510)
- Percentile: 93.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: expert. Source rank: #25. Votes: 1932. Organization: google. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #17
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,470 (CI 1,462 - 1,479)
- Percentile: 95% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_business_and_management_and_financial_operations. Source rank: #21. Votes: 5623. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #11
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,454 (CI 1,445 - 1,462)
- Percentile: 96.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #13. Votes: 5585. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #10
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,487 (CI 1,475 - 1,500)
- Percentile: 97% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_legal_and_government. Source rank: #12. Votes: 2310. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #21
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,486 (CI 1,477 - 1,495)
- Percentile: 93.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_life_and_physical_and_social_science. Source rank: #25. Votes: 4888. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #24
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,472 (CI 1,456 - 1,488)
- Percentile: 92.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_mathematical. Source rank: #28. Votes: 1414. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #23
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,484 (CI 1,471 - 1,498)
- Percentile: 92.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_medicine_and_healthcare. Source rank: #29. Votes: 2069. Organization: google. License: Proprietary.
Text Arena · Industry Software And It Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #21
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,501 (CI 1,494 - 1,508)
- Percentile: 93.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_software_and_it_services. Source rank: #27. Votes: 10515. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #14
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,461 (CI 1,453 - 1,469)
- Percentile: 96% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_writing_and_literature_and_language. Source rank: #18. Votes: 6818. Organization: google. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #28
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,463 (CI 1,450 - 1,477)
- Percentile: 90.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: expert. Source rank: #36. Votes: 1932. Organization: google. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #17
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,450 (CI 1,442 - 1,458)
- Percentile: 95% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_business_and_management_and_financial_operations. Source rank: #21. Votes: 5623. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #9
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,449 (CI 1,440 - 1,457)
- Percentile: 97.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #11. Votes: 5585. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #13
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,477 (CI 1,465 - 1,490)
- Percentile: 96% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_legal_and_government. Source rank: #15. Votes: 2310. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #18
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,472 (CI 1,464 - 1,481)
- Percentile: 94.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_life_and_physical_and_social_science. Source rank: #22. Votes: 4888. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #22
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,465 (CI 1,450 - 1,481)
- Percentile: 93.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_mathematical. Source rank: #26. Votes: 1414. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #22
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,465 (CI 1,452 - 1,478)
- Percentile: 92.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_medicine_and_healthcare. Source rank: #25. Votes: 2069. Organization: google. License: Proprietary.
Text Arena · Industry Software And It Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #24
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,471 (CI 1,465 - 1,478)
- Percentile: 92.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_software_and_it_services. Source rank: #30. Votes: 10515. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #13
verified runtime · exact alias
Raw row drilldown:
- Source: Arena
- Raw value: 1,459 (CI 1,451 - 1,467)
- Percentile: 96.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-3-flash`. Category: industry_writing_and_literature_and_language. Source rank: #16. Votes: 6818. Organization: google. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #6 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 51.8% (CI 45.2% - 58.5%)
- Percentile: 88.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #6 · Source label: google/gemini-3-flash-preview
verified runtime · exact alias
Raw row drilldown:
- Source: Vals AI
- Raw value: 1,100.2 score (CI 1,100.2 score - 1,100.2 score)
- Percentile: 68.8% (inside its fair comparison set)
- Last updated: archived
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.