LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #3 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 87%
- Percentile: 97.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.
97.8% percentile inside its fair comparison set · Raw benchmark value: 87% · CI: 86.3% - 87.7%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #42 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 72.6%
- Percentile: 54.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.
54.9% percentile inside its fair comparison set · Raw benchmark value: 72.6% · CI: 70.9% - 74.3%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #8 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 52.2%
- Percentile: 86.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.
86.3% percentile inside its fair comparison set · Raw benchmark value: 52.2% · CI: 48.1% - 56.3%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #43 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 72%
- Percentile: 16%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.
16% percentile inside its fair comparison set · Raw benchmark value: 72% · CI: 68.3% - 75.8%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #14 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,503
- Percentile: 95.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: expert. Source rank: #19. Votes: 2532. Organization: google. License: Proprietary.
95.3% percentile inside its fair comparison set · Raw benchmark value: 1,503 · CI: 1,491 - 1,515
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #12 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,475
- Percentile: 96.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_business_and_management_and_financial_operations. Source rank: #16. Votes: 7578. Organization: google. License: Proprietary.
96.5% percentile inside its fair comparison set · Raw benchmark value: 1,475 · CI: 1,468 - 1,483
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #4 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,468
- Percentile: 99.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_entertainment_and_sports_and_media. Source rank: #6. Votes: 7675. Organization: google. License: Proprietary.
99.1% percentile inside its fair comparison set · Raw benchmark value: 1,468 · CI: 1,461 - 1,476
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #6 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,502
- Percentile: 98.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_legal_and_government. Source rank: #7. Votes: 2904. Organization: google. License: Proprietary.
98.3% percentile inside its fair comparison set · Raw benchmark value: 1,502 · CI: 1,491 - 1,513
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #8 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,499
- Percentile: 97.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_life_and_physical_and_social_science. Source rank: #10. Votes: 6719. Organization: google. License: Proprietary.
97.8% percentile inside its fair comparison set · Raw benchmark value: 1,499 · CI: 1,491 - 1,506
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #16 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,482
- Percentile: 95.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_mathematical. Source rank: #19. Votes: 1941. Organization: google. License: Proprietary.
95.1% percentile inside its fair comparison set · Raw benchmark value: 1,482 · CI: 1,468 - 1,495
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #4 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,508
- Percentile: 99%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_medicine_and_healthcare. Source rank: #6. Votes: 2532. Organization: google. License: Proprietary.
99% percentile inside its fair comparison set · Raw benchmark value: 1,508 · CI: 1,496 - 1,521
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #12 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,511
- Percentile: 96.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_software_and_it_services. Source rank: #15. Votes: 14254. Organization: google. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,511 · CI: 1,505 - 1,517
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #4 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,481
- Percentile: 99.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_writing_and_literature_and_language. Source rank: #6. Votes: 9248. Organization: google. License: Proprietary.
99.1% percentile inside its fair comparison set · Raw benchmark value: 1,481 · CI: 1,474 - 1,488
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #18 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,480
- Percentile: 93.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: expert. Source rank: #24. Votes: 2532. Organization: google. License: Proprietary.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,480 · CI: 1,468 - 1,492
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #13 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,460
- Percentile: 96.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_business_and_management_and_financial_operations. Source rank: #16. Votes: 7578. Organization: google. License: Proprietary.
96.2% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI: 1,453 - 1,468
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #5 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,464
- Percentile: 98.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_entertainment_and_sports_and_media. Source rank: #7. Votes: 7675. Organization: google. License: Proprietary.
98.8% percentile inside its fair comparison set · Raw benchmark value: 1,464 · CI: 1,456 - 1,471
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #4 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,493
- Percentile: 99%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_legal_and_government. Source rank: #5. Votes: 2904. Organization: google. License: Proprietary.
99% percentile inside its fair comparison set · Raw benchmark value: 1,493 · CI: 1,482 - 1,504
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #9 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,486
- Percentile: 97.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_life_and_physical_and_social_science. Source rank: #11. Votes: 6719. Organization: google. License: Proprietary.
97.5% percentile inside its fair comparison set · Raw benchmark value: 1,486 · CI: 1,478 - 1,494
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #16 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,475
- Percentile: 95.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_mathematical. Source rank: #19. Votes: 1941. Organization: google. License: Proprietary.
95.1% percentile inside its fair comparison set · Raw benchmark value: 1,475 · CI: 1,462 - 1,488
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #4 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,492
- Percentile: 99%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_medicine_and_healthcare. Source rank: #5. Votes: 2532. Organization: google. License: Proprietary.
99% percentile inside its fair comparison set · Raw benchmark value: 1,492 · CI: 1,480 - 1,504
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #15 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,487
- Percentile: 95.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_software_and_it_services. Source rank: #18. Votes: 14254. Organization: google. License: Proprietary.
95.7% percentile inside its fair comparison set · Raw benchmark value: 1,487 · CI: 1,481 - 1,493
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #5 · Source label: gemini-3-pro
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,478
- Percentile: 98.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3-pro`. Category: industry_writing_and_literature_and_language. Source rank: #7. Votes: 9248. Organization: google. License: Proprietary.
98.8% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI: 1,471 - 1,485
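Each Text Arena category above appears twice, once under its plain title and once as "No Style Control", each with its own score and CI. The following is a minimal sketch of one way to compare such a pair: take the score difference and check whether the two CIs overlap. The names `ArenaRow`, `PAIRS`, and `compare_pairs` are hypothetical, the numbers are copied from three of the cards above, and interval overlap is only a rough heuristic here, not a formal significance test (the page does not state the confidence level of its CIs).

```python
from typing import NamedTuple


class ArenaRow(NamedTuple):
    """One Arena card: raw score plus the CI bounds shown on the card."""
    score: float
    ci_low: float
    ci_high: float


def intervals_overlap(a: ArenaRow, b: ArenaRow) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# (plain-title row, "No Style Control" row); values copied from the cards.
PAIRS = {
    "expert": (ArenaRow(1503, 1491, 1515), ArenaRow(1480, 1468, 1492)),
    "industry_legal_and_government": (
        ArenaRow(1502, 1491, 1513),
        ArenaRow(1493, 1482, 1504),
    ),
    "industry_software_and_it_services": (
        ArenaRow(1511, 1505, 1517),
        ArenaRow(1487, 1481, 1493),
    ),
}


def compare_pairs(pairs):
    """Map each category to (score difference, whether the CIs overlap)."""
    return {
        category: (plain.score - no_style.score, intervals_overlap(plain, no_style))
        for category, (plain, no_style) in pairs.items()
    }


if __name__ == "__main__":
    for category, (delta, overlap) in compare_pairs(PAIRS).items():
        print(f"{category}: delta={delta:+.0f}, CIs overlap={overlap}")
```

With these inputs, the expert and legal pairs have overlapping intervals, while the software and IT services pair does not.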
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #17 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 47.6%
- Percentile: 64.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.
64.4% percentile inside its fair comparison set · Raw benchmark value: 47.6% · CI: 41% - 54.2%
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #11
verified runtime · exact direct · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Scale Labs
- Raw value: 40.6%
- Percentile: 16.7%
- Last updated: recent
- Eligibility: preview_model
16.7% percentile inside its fair comparison set · Raw benchmark value: 40.6%
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #8 · Source label: google/gemini-3-pro-preview
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Vals AI
- Raw value: 1,078.9 score
- Percentile: 56.3%
- Last updated: archived
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.
56.3% percentile inside its fair comparison set · Raw benchmark value: 1,078.9 score · CI: 1,078.9 score - 1,078.9 score
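The Vals AI cards above each pair a raw value with a CI. As a rough sanity check, the sketch below confirms that each raw value lies inside its interval, reports the interval's half-width, and flags zero-width intervals such as the Poker Agent row. The names `VALS_ROWS` and `ci_summary` are hypothetical, the numbers are copied from the cards, and the Poker Agent row is in score units rather than percent.

```python
def ci_summary(value: float, ci_low: float, ci_high: float):
    """Return (value inside CI, half-width rounded to 2 places, zero-width flag)."""
    inside = ci_low <= value <= ci_high
    half_width = round((ci_high - ci_low) / 2, 2)
    degenerate = ci_high == ci_low
    return inside, half_width, degenerate


# Vals slug -> (raw value, CI low, CI high), copied from the cards.
# All rows are percentages except poker_agent, which is a score.
VALS_ROWS = {
    "legal_bench": (87.0, 86.3, 87.7),
    "tax_eval_v2": (72.6, 70.9, 74.3),
    "medcode": (52.2, 48.1, 56.3),
    "medscribe": (72.0, 68.3, 75.8),
    "sage": (47.6, 41.0, 54.2),
    "poker_agent": (1078.9, 1078.9, 1078.9),
}

SUMMARY = {slug: ci_summary(*row) for slug, row in VALS_ROWS.items()}

if __name__ == "__main__":
    for slug, (inside, half_width, degenerate) in SUMMARY.items():
        note = " (zero-width CI)" if degenerate else ""
        print(f"{slug}: inside={inside}, half-width={half_width}{note}")
```

A wide half-width, as on the SAGE row, suggests the raw value is a less precise estimate than a narrow one, as on the LegalBench row. A zero-width interval more likely reflects a missing CI in the source row than perfect certainty.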