LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #18 · Source label: google/gemini-2.5-pro-exp-03-25
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 84.3% (CI 83.5% - 85.1%)
- Percentile within its fair comparison set: 81.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.
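Each entry reports a percentile "inside its fair comparison set". The page does not state how that percentile is computed, so the sketch below assumes one common percentile-rank definition: the share of other models in the set with a strictly lower score, with ties counted as half. The function name `percentile_rank` and the sample scores are hypothetical and are not taken from the leaderboard.

```python
def percentile_rank(score: float, others: list[float]) -> float:
    """Percent of other entries beaten by `score`, counting ties as half.

    Hypothetical definition: the leaderboard's own formula is not stated.
    """
    if not others:
        raise ValueError("comparison set must contain at least one other entry")
    below = sum(1 for s in others if s < score)   # entries strictly beaten
    ties = sum(1 for s in others if s == score)   # entries with an equal score
    return 100.0 * (below + 0.5 * ties) / len(others)


# Hypothetical comparison set (not real leaderboard data).
print(percentile_rank(84.3, [80.0, 90.0, 70.0, 85.0]))  # 50.0
```

Under this assumption a percentile depends on which models are in the comparison set, so the same raw score can land at different percentiles on different benchmarks.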
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #38 · Source label: google/gemini-2.5-pro-exp-03-25
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 72.9% (CI 71.2% - 74.6%)
- Percentile within its fair comparison set: 59.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #10 · Source label: google/gemini-2.5-pro
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 50.6% (CI 46.4% - 54.7%)
- Percentile within its fair comparison set: 82.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #37 · Source label: google/gemini-2.5-pro
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 73.6% (CI 69.8% - 77.3%)
- Percentile within its fair comparison set: 28%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #50 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,461 (CI 1,454 - 1,469)
- Percentile within its fair comparison set: 82.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: expert. Source rank: #64. Votes: 7689. Organization: google. License: Proprietary.
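Arena raw values such as 1,461 are ratings on an Elo-like scale fitted from pairwise human votes. Assuming the conventional Elo parameterization (base 10, scale 400), a rating gap maps to an expected head-to-head preference rate as sketched below. The function name `expected_preference` is hypothetical, and the 1,361 opponent rating is an illustrative value, not a leaderboard entry.

```python
def expected_preference(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model (base 10).

    Assumes the standard Elo parameterization; Arena's exact fitting details may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# 1,461 is the expert-category rating for this row; 1,361 is a hypothetical opponent.
p = expected_preference(1461, 1361)
print(f"{p:.3f}")  # 0.640
```

Under this model only the rating difference matters, so a 100-point gap implies roughly a 64% expected preference regardless of the absolute ratings.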
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #51 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,436 (CI 1,431 - 1,440)
- Percentile within its fair comparison set: 84.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_business_and_management_and_financial_operations. Source rank: #67. Votes: 22479. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #30 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,430 (CI 1,426 - 1,435)
- Percentile within its fair comparison set: 91%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_entertainment_and_sports_and_media. Source rank: #41. Votes: 22939. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #25 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,468 (CI 1,461 - 1,475)
- Percentile within its fair comparison set: 91.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_legal_and_government. Source rank: #34. Votes: 8797. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #37 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,470 (CI 1,465 - 1,475)
- Percentile within its fair comparison set: 88.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_life_and_physical_and_social_science. Source rank: #47. Votes: 20439. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #42 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,449 (CI 1,441 - 1,457)
- Percentile within its fair comparison set: 86.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_mathematical. Source rank: #53. Votes: 6619. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #44 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,467 (CI 1,460 - 1,475)
- Percentile within its fair comparison set: 85.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_medicine_and_healthcare. Source rank: #55. Votes: 7809. Organization: google. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #63 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,460 (CI 1,456 - 1,463)
- Percentile within its fair comparison set: 80.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_software_and_it_services. Source rank: #79. Votes: 43931. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #27 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,445 (CI 1,441 - 1,450)
- Percentile within its fair comparison set: 92%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_writing_and_literature_and_language. Source rank: #37. Votes: 27763. Organization: google. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #33 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,455 (CI 1,447 - 1,462)
- Percentile within its fair comparison set: 88.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: expert. Source rank: #41. Votes: 7689. Organization: google. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #25 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,439 (CI 1,435 - 1,444)
- Percentile within its fair comparison set: 92.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_business_and_management_and_financial_operations. Source rank: #30. Votes: 22479. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #16 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,441 (CI 1,436 - 1,445)
- Percentile within its fair comparison set: 95.4%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_entertainment_and_sports_and_media. Source rank: #20. Votes: 22939. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #12 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,479 (CI 1,472 - 1,485)
- Percentile within its fair comparison set: 96.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_legal_and_government. Source rank: #14. Votes: 8797. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #13 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,479 (CI 1,474 - 1,484)
- Percentile within its fair comparison set: 96.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_life_and_physical_and_social_science. Source rank: #17. Votes: 20439. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #29 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,454 (CI 1,446 - 1,462)
- Percentile within its fair comparison set: 90.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_mathematical. Source rank: #36. Votes: 6619. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #10 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,475 (CI 1,468 - 1,483)
- Percentile within its fair comparison set: 96.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_medicine_and_healthcare. Source rank: #12. Votes: 7809. Organization: google. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #37 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,459 (CI 1,456 - 1,463)
- Percentile within its fair comparison set: 88.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_software_and_it_services. Source rank: #46. Votes: 43931. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #14 · Source label: gemini-2.5-pro
Verified runtime · exact alias
- Source: Arena
- Raw value: 1,454 (CI 1,449 - 1,458)
- Percentile within its fair comparison set: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-pro`. Category: industry_writing_and_literature_and_language. Source rank: #18. Votes: 27763. Organization: google. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #27 · Source label: google/gemini-2.5-pro
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 41.9% (CI 35.2% - 48.6%)
- Percentile within its fair comparison set: 42.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.
PRBench Legal
SL · Professional reasoning · Rubric
Applied legal reasoning on professional-domain tasks.
Rank #9
Verified runtime · exact direct
- Source: Scale Labs
- Raw value: 41.4%
- Percentile within its fair comparison set: 33.3%
- Last updated: recent
- Eligibility: headline eligible
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #13 · Source label: google/gemini-2.5-pro
Verified runtime · exact alias
- Source: Vals AI
- Raw value: 1,032.6 score (CI 1,032.6 - 1,032.6)
- Percentile within its fair comparison set: 25%
- Last updated: archived
- Eligibility: headline eligible
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.