GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #41 · Source label: Gemini 3.1 Flash-Lite
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 603
- Percentile: 13% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
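The `gdpvalBreakdown.elo` field named above is a dotted path into a leaderboard row. A minimal sketch of how such a path might be resolved, assuming the row is available as nested JSON. The row shape, the `get_path` helper, and the sample dict are hypothetical and are not Artificial Analysis's actual API; only the value 603 comes from this page.

```python
from typing import Any

def get_path(row: dict, dotted: str, default: Any = None) -> Any:
    """Walk a dotted key path such as 'gdpvalBreakdown.elo' through nested dicts."""
    current: Any = row
    for key in dotted.split("."):
        # Stop early if the path leaves the nested-dict structure.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical row shape; only the value 603 is taken from the page.
sample_row = {"gdpvalBreakdown": {"elo": 603}}
elo = get_path(sample_row, "gdpvalBreakdown.elo")
```

Returning a default instead of raising keeps a missing benchmark field from aborting a whole import run, which may or may not match how the real parser behaves.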
APEX-Agents-AA
AA · Professional reasoning · Objective
Long-horizon agentic task completion.
Rank #19 · Source label: Gemini 3.1 Flash-Lite
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Artificial Analysis
- Raw value: 12.2%
- Percentile: 25% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `apexAgents`.
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #27 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 36 (CI 35 - 38)
- Percentile: 0% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Google.
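The page does not state how the percentile is derived from the rank. One reading that is consistent with this entry (rank #27 at 0%) is the share of the comparison set ranked below the model, `(N - rank) / (N - 1)`. The sketch below assumes that formula; the set size `N` is an inferred, hypothetical quantity that the page never shows.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Percentage of the comparison set ranked below `rank` (rank 1 is best).

    Assumed formula, not documented by the source page.
    """
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("need set_size >= 2 and 1 <= rank <= set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)

# Rank #27 at 0% would correspond to last place in a set of 27 models
# (the set size is inferred, not stated on the page).
vals_index_pct = percentile_from_rank(27, 27)
```

Under this reading the top-ranked model in any set sits at 100% and the last-ranked at 0%, regardless of how close the raw scores are.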
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #27 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 83.8% (CI 83% - 84.6%)
- Percentile: 71.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #24 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 30% (CI 28.7% - 31.2%)
- Percentile: 8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Google.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #51 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 71.8% (CI 70.1% - 73.5%)
- Percentile: 45.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #16 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 47.6% (CI 43.5% - 51.7%)
- Percentile: 70.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #49 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 63.9% (CI 60.3% - 67.5%)
- Percentile: 4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #66 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,446 (CI 1,436 - 1,455)
- Percentile: 76.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: expert. Source rank: #84. Votes: 4298. Organization: google. License: Proprietary.
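Displayed values on this page mix thousands separators (`1,446`), percent signs (`83.8%`), and plain numbers (`603`), with intervals written as `CI 1,436 - 1,455`. A minimal sketch of normalizing these strings for comparison follows; the helper names `parse_value` and `parse_ci` are hypothetical and not part of any source's tooling.

```python
import re

def parse_value(text: str) -> float:
    """Convert a displayed value such as '1,446', '83.8%', or '603' to a float.

    Percentages stay in percent units (83.8, not 0.838).
    """
    return float(text.strip().rstrip("%").replace(",", ""))

def parse_ci(text: str) -> tuple[float, float]:
    """Parse a displayed interval such as 'CI 1,436 - 1,455' into (low, high)."""
    match = re.fullmatch(r"\s*CI\s+(\S+)\s+-\s+(\S+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized interval: {text!r}")
    return parse_value(match.group(1)), parse_value(match.group(2))

score = parse_value("1,446")
low, high = parse_ci("CI 1,436 - 1,455")
inside = low <= score <= high  # sanity check: the point value lies within its interval
```

Checking that each raw value falls inside its own interval is a cheap way to catch rows where the value and CI were scraped from different columns.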
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #67 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,420 (CI 1,413 - 1,426)
- Percentile: 79.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_business_and_management_and_financial_operations. Source rank: #84. Votes: 9392. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #55 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,403 (CI 1,396 - 1,410)
- Percentile: 83.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_entertainment_and_sports_and_media. Source rank: #68. Votes: 9892. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #58 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,438 (CI 1,428 - 1,449)
- Percentile: 80.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_legal_and_government. Source rank: #76. Votes: 3735. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #55 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,452 (CI 1,444 - 1,459)
- Percentile: 83.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_life_and_physical_and_social_science. Source rank: #68. Votes: 7903. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #58 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,439 (CI 1,427 - 1,451)
- Percentile: 81.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_mathematical. Source rank: #71. Votes: 2673. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #54 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,455 (CI 1,445 - 1,466)
- Percentile: 82% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_medicine_and_healthcare. Source rank: #70. Votes: 3518. Organization: google. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #71 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,454 (CI 1,449 - 1,460)
- Percentile: 78.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_software_and_it_services. Source rank: #88. Votes: 19057. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #47 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,419 (CI 1,413 - 1,426)
- Percentile: 85.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_writing_and_literature_and_language. Source rank: #60. Votes: 11366. Organization: google. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #90 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,401 (CI 1,392 - 1,411)
- Percentile: 67.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: expert. Source rank: #110. Votes: 4298. Organization: google. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #97 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,389 (CI 1,382 - 1,396)
- Percentile: 69.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_business_and_management_and_financial_operations. Source rank: #115. Votes: 9392. Organization: google. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #70 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,385 (CI 1,378 - 1,392)
- Percentile: 78.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_entertainment_and_sports_and_media. Source rank: #84. Votes: 9892. Organization: google. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #80 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,416 (CI 1,406 - 1,426)
- Percentile: 73.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_legal_and_government. Source rank: #98. Votes: 3735. Organization: google. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #78 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,427 (CI 1,420 - 1,435)
- Percentile: 76.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_life_and_physical_and_social_science. Source rank: #92. Votes: 7903. Organization: google. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #70 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,424 (CI 1,411 - 1,436)
- Percentile: 77.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_mathematical. Source rank: #83. Votes: 2673. Organization: google. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #71 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,428 (CI 1,418 - 1,439)
- Percentile: 76.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_medicine_and_healthcare. Source rank: #83. Votes: 3518. Organization: google. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #98 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,413 (CI 1,408 - 1,419)
- Percentile: 70.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_software_and_it_services. Source rank: #116. Votes: 19057. Organization: google. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #58 · Source label: gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,406 (CI 1,400 - 1,413)
- Percentile: 82.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `gemini-3.1-flash-lite-preview`. Category: industry_writing_and_literature_and_language. Source rank: #70. Votes: 11366. Organization: google. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #14 · Source label: google/gemini-3.1-flash-lite-preview
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Vals AI
- Raw value: 49.5% (CI 42.7% - 56.4%)
- Percentile: 71.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.