Model profile · Google

Gemini 2.5 Flash

Closed weights · premium · registry tag 2026 fast

Thin verified coverage

This model currently has thin verified coverage across the resolved source data.

Visible coverage: 24%
Verified coverage: 24%
Spread: 84.3%
Last verified: Jun 20, 2026

41% bench fit

text · code · vision · document · audio · 10 aliases · 41 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.
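A minimal sketch of how the two stamps could be compared, assuming staleness is judged by the gap between the data version and the page refresh date. The function names and the 30-day threshold are hypothetical and not part of the registry.

```python
from datetime import date


def snapshot_age_days(data_version: date, page_refreshed: date) -> int:
    """Days between the registry data version and the page refresh stamp."""
    return (page_refreshed - data_version).days


def is_stale(data_version: date, page_refreshed: date, max_age_days: int = 30) -> bool:
    """Flag the page when the gap exceeds a (hypothetical) threshold."""
    return snapshot_age_days(data_version, page_refreshed) > max_age_days


# Stamps taken from this page: data version Jun 20, 2026; page refreshed Jul 5, 2026.
age = snapshot_age_days(date(2026, 6, 20), date(2026, 7, 5))
print(age)  # 15
print(is_stale(date(2026, 6, 20), date(2026, 7, 5)))  # False under the assumed 30-day threshold
```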

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
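As an illustration only, the four fields named above could be carried in a small record type. The class and attribute names below are hypothetical; the example values are copied from the Intelligence Index row on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    benchmark: str     # row title, e.g. "Intelligence Index"
    source: str        # benchmark source, e.g. "Artificial Analysis"
    source_type: str   # e.g. "Combined", "Objective", "Human", "Speed / cost"
    raw_value: str     # kept as text because units differ per benchmark
    percentile: float  # percent, inside the row's fair comparison set

    def summary(self) -> str:
        return (
            f"{self.benchmark} ({self.source}, {self.source_type}): "
            f"raw {self.raw_value}, percentile {self.percentile}%"
        )


row = BenchmarkRow("Intelligence Index", "Artificial Analysis", "Combined", "14", 56.7)
print(row.summary())
```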

Chat / text · 29 benchmarks · 63.5%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #172 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 14
Percentile: 56.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 25.1%
Percentile: 81.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #256 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.7%
Percentile: 14.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #180 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 39%
Percentile: 43.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #164 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.9 /1M tokens
Percentile: 44.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.3 /1M input tokens
Percentile: 56.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #177 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $2.5 /1M output tokens
Percentile: 39.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
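The blended, input, and output prices reported in these rows can be related with a weighted average. The sketch below assumes the `0To3To1` suffix in `price1mBlended0To3To1` denotes a 3:1 input-to-output token weighting; that reading is an assumption, and the helper name `blended_price` is hypothetical.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed 3 parts input tokens
    output_weight: float = 1.0,  # assumed 1 part output tokens
) -> float:
    """Weighted average price per 1M tokens under the assumed token mix."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices from this page: $0.3 per 1M input tokens, $2.5 per 1M output tokens.
blended = blended_price(0.3, 2.5)
print(f"${blended:.2f} per 1M tokens")  # $0.85 per 1M tokens
```

Under that assumption the result is about $0.85, which is consistent with the $0.9 shown for the blended price if the page rounds to one decimal place.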

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #20 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 228.7 tokens/s
Percentile: 91%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #194 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 23.26s
Percentile: 8.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #139 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 23.26s
Percentile: 34.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #161 · Source label: Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 11
Percentile: 15.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,410
Percentile: 73.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #108. Votes: 124544. Organization: google. License: Proprietary.

CI: 1,408 - 1,413

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #62 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 81.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #76. Votes: 17364. Organization: google. License: Proprietary.

CI: 1,392 - 1,402

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #98 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,412
Percentile: 70.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #119. Votes: 59489. Organization: google. License: Proprietary.

CI: 1,409 - 1,415

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 73.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #108. Votes: 89454. Organization: google. License: Proprietary.

CI: 1,392 - 1,399

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,420
Percentile: 70.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts. Source rank: #117. Votes: 17499. Organization: google. License: Proprietary.

CI: 1,415 - 1,425

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,423
Percentile: 67.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts_english. Source rank: #130. Votes: 8802. Organization: google. License: Proprietary.

CI: 1,416 - 1,430

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 74.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: instruction_following. Source rank: #103. Votes: 9152. Organization: google. License: Proprietary.

CI: 1,395 - 1,408

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #83 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 73%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #103. Votes: 31943. Organization: google. License: Proprietary.

CI: 1,415 - 1,423

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 70.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #118. Votes: 21617. Organization: google. License: Proprietary.

CI: 1,399 - 1,408

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #75 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 77.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #89. Votes: 124544. Organization: google. License: Proprietary.

CI: 1,415 - 1,420

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #50 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 84.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #63. Votes: 17364. Organization: google. License: Proprietary.

CI: 1,397 - 1,407

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,418
Percentile: 74.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #100. Votes: 59489. Organization: google. License: Proprietary.

CI: 1,415 - 1,421

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,405
Percentile: 77.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #86. Votes: 89454. Organization: google. License: Proprietary.

CI: 1,401 - 1,408

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #71 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,422
Percentile: 78.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts. Source rank: #88. Votes: 63888. Organization: google. License: Proprietary.

CI: 1,419 - 1,426

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #85 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,420
Percentile: 74.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts_english. Source rank: #102. Votes: 32205. Organization: google. License: Proprietary.

CI: 1,416 - 1,424

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,406
Percentile: 81.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: instruction_following. Source rank: #75. Votes: 33961. Organization: google. License: Proprietary.

CI: 1,402 - 1,409

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,420
Percentile: 80.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #74. Votes: 31943. Organization: google. License: Proprietary.

CI: 1,416 - 1,424

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #82 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,408
Percentile: 74.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #99. Votes: 21617. Organization: google. License: Proprietary.

CI: 1,403 - 1,413

Coding · 7 benchmarks · 44%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #137 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 12.1%
Percentile: 55.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #184 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 29.1%
Percentile: 50.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #51 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 76.2%
Percentile: 44.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Google.

CI: 74% - 78.4%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #119 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 63.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: coding. Source rank: #144. Votes: 6843. Organization: google. License: Proprietary.

CI: 1,421 - 1,436

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #84 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 74.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: coding. Source rank: #101. Votes: 25914. Organization: google. License: Proprietary.

CI: 1,419 - 1,428

IOI

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #39 · Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 2.6%
Percentile: 13.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

CI: 0.6% - 4.6%

Terminal-Bench 2.0

TERMINAL-BENCH · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #29 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Terminal-Bench
Raw value: 17.1%
Percentile: 6.7%
Last updated: archived
Eligibility: headline eligible

Parsed from the public Terminal-Bench 2.0 verified leaderboard. Collapse policy: highest verified score per canonical model. Selected agent: Mini-SWE-Agent (unknown). Display model: Gemini 2.5 Flash. Integration method: API. Agent URL: https://github.com/SWE-agent/mini-swe-agent. Reported stderr: 1.271 percentage points.
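The collapse policy described above ("highest verified score per canonical model") can be sketched as a simple reduction. This is an illustration under stated assumptions, not the registry's actual code: the `Submission` type, the function name, and every row other than the Mini-SWE-Agent score of 17.1 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Submission:
    canonical_model: str
    agent: str
    score: float    # percent
    verified: bool


def collapse_highest_verified(subs: Iterable[Submission]) -> Dict[str, Submission]:
    """Keep, per canonical model, the verified submission with the highest score."""
    best: Dict[str, Submission] = {}
    for sub in subs:
        if not sub.verified:
            continue  # unverified runs never become the headline row
        current = best.get(sub.canonical_model)
        if current is None or sub.score > current.score:
            best[sub.canonical_model] = sub
    return best


submissions = [
    Submission("gemini-2.5-flash", "Mini-SWE-Agent", 17.1, True),        # row shown on this page
    Submission("gemini-2.5-flash", "hypothetical-agent-a", 15.0, True),  # made-up lower verified run
    Submission("gemini-2.5-flash", "hypothetical-agent-b", 20.0, False), # made-up unverified run
]
best = collapse_highest_verified(submissions)
print(best["gemini-2.5-flash"].agent, best["gemini-2.5-flash"].score)  # Mini-SWE-Agent 17.1
```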

Reasoning / math / science · 7 benchmarks · 60.9%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #214 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 5.1%
Percentile: 42.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #151 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 68.3%
Percentile: 59.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #121 · Source label: Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #41 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 81.6%
Percentile: 55.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: Google.

CI: 77.7% - 85.5%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #44 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.7%
Percentile: 51.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

CI: 83% - 84.4%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #79 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,412
Percentile: 75.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: math. Source rank: #98. Votes: 1944. Organization: google. License: Proprietary.

CI: 1,399 - 1,425

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #74 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,416
Percentile: 76.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: math. Source rank: #90. Votes: 1944. Organization: google. License: Proprietary.

CI: 1,403 - 1,429

Professional reasoning · 24 benchmarks · 70.9%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #26 · Source label: google/gemini-2.5-flash-preview-04-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.8%
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.

CI: 83% - 84.6%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #40 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 72.7%
Percentile: 57.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.

CI: 71% - 74.4%

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #27 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 40.5%
Percentile: 49%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

CI: 36.8% - 44.3%

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #15 · Source label: google/gemini-2.5-flash-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83%
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.

CI: 79.2% - 86.7%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #73 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,440
Percentile: 73.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: expert. Source rank: #91. Votes: 1629. Organization: google. License: Proprietary.

CI: 1,425 - 1,454

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #93 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 71.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_business_and_management_and_financial_operations. Source rank: #113. Votes: 6094. Organization: google. License: Proprietary.

71.1% percentile inside its fair comparison set

Raw benchmark value: 1,403 (CI 1,395 - 1,411)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #74 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 77.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #93. Votes: 22885. Organization: google. License: Proprietary.

77.4% percentile inside its fair comparison set

Raw benchmark value: 1,388 (CI 1,383 - 1,392)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #70 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,428
Percentile: 76.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_legal_and_government. Source rank: #89. Votes: 2161. Organization: google. License: Proprietary.

76.8% percentile inside its fair comparison set

Raw benchmark value: 1,428 (CI 1,415 - 1,441)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #73 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 77.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_life_and_physical_and_social_science. Source rank: #91. Votes: 5149. Organization: google. License: Proprietary.

77.7% percentile inside its fair comparison set

Raw benchmark value: 1,436 (CI 1,428 - 1,445)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #79 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 74.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_mathematical. Source rank: #96. Votes: 1630. Organization: google. License: Proprietary.

74.7% percentile inside its fair comparison set

Raw benchmark value: 1,421 (CI 1,406 - 1,435)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #96 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,427
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_medicine_and_healthcare. Source rank: #116. Votes: 1642. Organization: google. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,427 (CI 1,412 - 1,442)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #112 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,423
Percentile: 65.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_software_and_it_services. Source rank: #135. Votes: 11752. Organization: google. License: Proprietary.

65.8% percentile inside its fair comparison set

Raw benchmark value: 1,423 (CI 1,417 - 1,429)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #64 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,404
Percentile: 80.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #81. Votes: 28038. Organization: google. License: Proprietary.

80.6% percentile inside its fair comparison set

Raw benchmark value: 1,404 (CI 1,400 - 1,408)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #67 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,425
Percentile: 76%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: expert. Source rank: #81. Votes: 7786. Organization: google. License: Proprietary.

76% percentile inside its fair comparison set

Raw benchmark value: 1,425 (CI 1,418 - 1,432)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #78 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 75.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_business_and_management_and_financial_operations. Source rank: #94. Votes: 22352. Organization: google. License: Proprietary.

75.8% percentile inside its fair comparison set

Raw benchmark value: 1,403 (CI 1,399 - 1,408)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #59 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 82%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #72. Votes: 22885. Organization: google. License: Proprietary.

82% percentile inside its fair comparison set

Raw benchmark value: 1,395 (CI 1,390 - 1,399)

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #57 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,433
Percentile: 81.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_legal_and_government. Source rank: #70. Votes: 8733. Organization: google. License: Proprietary.

81.2% percentile inside its fair comparison set

Raw benchmark value: 1,433 (CI 1,426 - 1,440)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #63 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 80.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_life_and_physical_and_social_science. Source rank: #76. Votes: 5149. Organization: google. License: Proprietary.

80.8% percentile inside its fair comparison set

Raw benchmark value: 1,436 (CI 1,427 - 1,444)

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,422
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_mathematical. Source rank: #86. Votes: 6798. Organization: google. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,422 (CI 1,414 - 1,429)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #70 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_medicine_and_healthcare. Source rank: #82. Votes: 7717. Organization: google. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,429 (CI 1,422 - 1,436)

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #87 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,426
Percentile: 73.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_software_and_it_services. Source rank: #104. Votes: 43644. Organization: google. License: Proprietary.

73.5% percentile inside its fair comparison set

Raw benchmark value: 1,426 (CI 1,422 - 1,429)

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #55 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,407
Percentile: 83.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #67. Votes: 28038. Organization: google. License: Proprietary.

83.3% percentile inside its fair comparison set

Raw benchmark value: 1,407 (CI 1,403 - 1,411)

SAGE

VALS-AI · Professional reasoning · Objective

Student Assessment with Generative Evaluation.

Rank #21 · Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 44.8%
Percentile: 55.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.

55.6% percentile inside its fair comparison set

Raw benchmark value: 44.8% (CI 38.1% - 51.5%)

PRBench Legal

SL · Professional reasoning · Rubric

Applied legal reasoning on professional-domain tasks.

Rank #10 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 41%
Percentile: 25%
Last updated: recent
Eligibility: headline eligible

25% percentile inside its fair comparison set

Raw benchmark value: 41%
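
The page does not say how a rank becomes a percentile. The rows above are consistent with a simple linear mapping, percentile = 100 × (1 − (rank − 1) / (N − 1)), where N is the size of the fair comparison set. This is an inference from the displayed numbers, not a documented rule. The function names and the back-solved set sizes in this Python sketch are illustrative assumptions; N is never shown on the page.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Percentile under the assumed linear mapping: rank 1 -> 100, rank N -> 0."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("need set_size >= 2 and 1 <= rank <= set_size")
    return 100.0 * (1.0 - (rank - 1) / (set_size - 1))


def implied_set_size(rank: int, percentile: float) -> float:
    """Back-solve N from a displayed rank and percentile (assumed mapping)."""
    if percentile >= 100.0:
        raise ValueError("a 100% percentile carries no information about set size")
    return 1.0 + (rank - 1) / (1.0 - percentile / 100.0)


# (rank, percentile) pairs copied from the rows above.
rows = {
    "MedScribe": (15, 72.0),
    "SAGE": (21, 55.6),
    "PRBench Legal": (10, 25.0),
}

for name, (rank, pct) in rows.items():
    n = round(implied_set_size(rank, pct))  # inferred, not stated by the page
    recomputed = percentile_from_rank(rank, n)
    print(f"{name}: implied N = {n}, recomputed percentile = {recomputed:.1f}")
```

If the recomputed percentile matches the displayed one for a near-integer N, the linear reading is plausible for that row. Ties or filtered rows in the source leaderboard could break the match.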

Search / tool use · 1 benchmark · 15.2%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #263 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 14.9%
Percentile: 15.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

15.2% percentile inside its fair comparison set

Raw benchmark value: 14.9%

Long context · 2 benchmarks · 55%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #102 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 45.9%
Percentile: 67.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

67.9% percentile inside its fair comparison set

Raw benchmark value: 45.9%

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #52 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 59.8%
Percentile: 42%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Google.

42% percentile inside its fair comparison set

Raw benchmark value: 59.8% (CI 57.9% - 61.6%)

Vision understanding · 22 benchmarks · 64.8%

MMMU-Pro

AA · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #48 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 65.5%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `mmmuPro`.

65.2% percentile inside its fair comparison set

Raw benchmark value: 65.5%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,226
Percentile: 70.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: overall. Source rank: #43. Votes: 4726. Organization: google. License: Proprietary.

70.6% percentile inside its fair comparison set

Raw benchmark value: 1,226 (CI 1,216 - 1,236)

Vision Arena · Captioning

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,214
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: captioning. Source rank: #8. Votes: 595. Organization: google. License: Proprietary.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,214 (CI 1,189 - 1,239)

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #24 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,239
Percentile: 58.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing_vision. Source rank: #31. Votes: 294. Organization: google. License: Proprietary.

58.2% percentile inside its fair comparison set

Raw benchmark value: 1,239 (CI 1,206 - 1,273)

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #38 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,243
Percentile: 47.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: diagram. Source rank: #49. Votes: 972. Organization: google. License: Proprietary.

47.1% percentile inside its fair comparison set

Raw benchmark value: 1,243 (CI 1,224 - 1,261)

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #30 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,231
Percentile: 73.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: english. Source rank: #39. Votes: 2057. Organization: google. License: Proprietary.

73.4% percentile inside its fair comparison set

Raw benchmark value: 1,231 (CI 1,217 - 1,245)

Vision Arena · Entity Recognition

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #17 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,218
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: entity_recognition. Source rank: #16. Votes: 590. Organization: google. License: Proprietary.

50% percentile inside its fair comparison set

Raw benchmark value: 1,218 (CI 1,194 - 1,242)

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #28 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,266
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: homework. Source rank: #36. Votes: 624. Organization: google. License: Proprietary.

60.3% percentile inside its fair comparison set

Raw benchmark value: 1,266 (CI 1,244 - 1,289)

Vision Arena · Humor

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #28 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,227
Percentile: 44.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: humor. Source rank: #35. Votes: 216. Organization: google. License: Proprietary.

44.9% percentile inside its fair comparison set

Raw benchmark value: 1,227 (CI 1,190 - 1,264)

Vision Arena · OCR

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,241
Percentile: 54.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: ocr. Source rank: #42. Votes: 2784. Organization: google. License: Proprietary.

54.3% percentile inside its fair comparison set

Raw benchmark value: 1,241 (CI 1,229 - 1,252)

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #27 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,250
Percentile: 76.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: overall. Source rank: #34. Votes: 4726. Organization: google. License: Proprietary.

76.1% percentile inside its fair comparison set

Raw benchmark value: 1,250 (CI 1,240 - 1,260)

Vision Arena · Captioning · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,278
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: captioning. Source rank: #8. Votes: 595. Organization: google. License: Proprietary.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,278 (CI 1,254 - 1,301)

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #18 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,265
Percentile: 69.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing_vision. Source rank: #23. Votes: 294. Organization: google. License: Proprietary.

69.1% percentile inside its fair comparison set

Raw benchmark value: 1,265 (CI 1,232 - 1,298)

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #35 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,251
Percentile: 51.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: diagram. Source rank: #43. Votes: 972. Organization: google. License: Proprietary.

51.4% percentile inside its fair comparison set

Raw benchmark value: 1,251 (CI 1,233 - 1,270)

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,255
Percentile: 78%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: english. Source rank: #30. Votes: 2057. Organization: google. License: Proprietary.

78% percentile inside its fair comparison set

Raw benchmark value: 1,255 (CI 1,242 - 1,269)

Vision Arena · Entity Recognition · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #15 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,247
Percentile: 56.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: entity_recognition. Source rank: #14. Votes: 590. Organization: google. License: Proprietary.

56.3% percentile inside its fair comparison set

Raw benchmark value: 1,247 (CI 1,223 - 1,270)

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,279
Percentile: 64.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: homework. Source rank: #31. Votes: 624. Organization: google. License: Proprietary.

64.7% percentile inside its fair comparison set

Raw benchmark value: 1,279 (CI 1,256 - 1,301)

Vision Arena · Humor · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #19 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,262
Percentile: 63.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: humor. Source rank: #24. Votes: 216. Organization: google. License: Proprietary.

63.3% percentile inside its fair comparison set

Raw benchmark value: 1,262 (CI 1,226 - 1,299)

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #26 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,257
Percentile: 64.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: ocr. Source rank: #32. Votes: 2784. Organization: google. License: Proprietary.

64.3% percentile inside its fair comparison set

Raw benchmark value: 1,257 (CI 1,246 - 1,269)

MMMU Pro

VALS-AI · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #26 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 80.8%
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

56.9% percentile inside its fair comparison set

Raw benchmark value: 80.8% (CI 78.9% - 82.6%)

Vision Arena · Creative Writing

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,263
Percentile: 87.5%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing. Source rank: #6. Votes: 555. Organization: google. License: Proprietary.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,263 (CI 1,238 - 1,289)

Vision Arena · Creative Writing · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,289
Percentile: 87.5%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing. Source rank: #5. Votes: 555. Organization: google. License: Proprietary.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,289 (CI 1,264 - 1,313)

Document understanding · 1 benchmark · 48.3%

MortgageTax

VALS-AI · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #32 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 62.6%
Percentile: 48.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mortgage_tax; provider: Google.

48.3% percentile inside its fair comparison set

Raw benchmark value: 62.6% (CI 60.7% - 64.5%)
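
Each category heading carries a benchmark count and a summary percentage. The page does not state how that percentage is derived, but for the four categories listed in full above it matches the unweighted mean of the row percentiles. The Python sketch below checks that reading; the helper name and the 0.1-point tolerance are assumptions made for illustration.

```python
from statistics import fmean

# Row percentiles and headline figures copied from the category sections above.
categories = {
    "Search / tool use": ([15.2], 15.2),
    "Long context": ([67.9, 42.0], 55.0),
    "Vision understanding": ([
        65.2, 70.6, 73.1, 58.2, 47.1, 73.4, 50.0, 60.3, 44.9, 54.3, 76.1,
        73.1, 69.1, 51.4, 78.0, 56.3, 64.7, 63.3, 64.3, 56.9, 87.5, 87.5,
    ], 64.8),
    "Document understanding": ([48.3], 48.3),
}


def matches_headline(percentiles, headline, tol=0.1):
    """True if the unweighted mean is within `tol` points of the headline."""
    return abs(fmean(percentiles) - headline) <= tol


for name, (percentiles, headline) in categories.items():
    mean = fmean(percentiles)
    ok = matches_headline(percentiles, headline)
    print(f"{name}: n = {len(percentiles)}, mean = {mean:.2f}, "
          f"headline = {headline}, match = {ok}")
```

A category with a single benchmark simply repeats that row's percentile, so one weak row, such as the Search / tool use entry, sets the whole category figure.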

Multilingual · 16 benchmarks · 74.6%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #78 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,449
Percentile: 73.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #95. Votes: 1413. Organization: google. License: Proprietary.

73.9% percentile inside its fair comparison set

Raw benchmark value: 1,449 (CI 1,433 - 1,465)

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #69 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,441
Percentile: 68.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: french. Source rank: #86. Votes: 521. Organization: google. License: Proprietary.

68.5% percentile inside its fair comparison set

Raw benchmark value: 1,441 (CI 1,414 - 1,467)

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #58 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,419 (CI 1,394 - 1,443)
Percentile: 75.9% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: german. Source rank: #76. Votes: 602. Organization: google. License: Proprietary.

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #39 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,400 (CI 1,365 - 1,434)
Percentile: 81.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: japanese. Source rank: #55. Votes: 303. Organization: google. License: Proprietary.

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #57 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,378 (CI 1,355 - 1,402)
Percentile: 73.1% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: korean. Source rank: #72. Votes: 656. Organization: google. License: Proprietary.

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #66 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,413 (CI 1,407 - 1,419)
Percentile: 77.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: russian. Source rank: #85. Votes: 10168. Organization: google. License: Proprietary.

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,405 (CI 1,393 - 1,418)
Percentile: 66.4% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: spanish. Source rank: #91. Votes: 3013. Organization: google. License: Proprietary.
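
The seven Text Arena language rows above each list a vote count and a confidence interval. The sketch below, which uses only numbers copied from those rows, computes each interval's half-width and orders the rows by votes. The tuple layout and the `half_width` helper are illustrative assumptions, not part of the registry or of Arena's data format.

```python
# (category, votes, ci_low, ci_high) copied from the Text Arena language rows above.
ROWS = [
    ("chinese", 1413, 1433, 1465),
    ("french", 521, 1414, 1467),
    ("german", 602, 1394, 1443),
    ("japanese", 303, 1365, 1434),
    ("korean", 656, 1355, 1402),
    ("russian", 10168, 1407, 1419),
    ("spanish", 3013, 1393, 1418),
]


def half_width(ci_low, ci_high):
    """Half the width of a confidence interval, in rating points."""
    return (ci_high - ci_low) / 2


# Order rows from fewest to most votes and compute each half-width.
by_votes = sorted(ROWS, key=lambda row: row[1])
widths = [half_width(low, high) for _, _, low, high in by_votes]

for (category, votes, _, _), width in zip(by_votes, widths):
    print(f"{category:<9} votes={votes:>6}  half-width={width:.1f}")
```

In these seven rows the half-width shrinks as the vote count grows, so the low-vote categories (Japanese, French) carry the widest intervals and should be read with the most caution.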

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena chinese leaderboard.

Rank #74 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,452 (CI 1,437 - 1,468)
Percentile: 75.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #88. Votes: 1413. Organization: google. License: Proprietary.

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena french leaderboard.

Rank #56 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,439 (CI 1,412 - 1,465)
Percentile: 74.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: french. Source rank: #68. Votes: 521. Organization: google. License: Proprietary.

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena german leaderboard.

Rank #46 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,424 (CI 1,400 - 1,448)
Percentile: 81% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: german. Source rank: #57. Votes: 602. Organization: google. License: Proprietary.

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena japanese leaderboard.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,407 (CI 1,372 - 1,442)
Percentile: 88.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: japanese. Source rank: #34. Votes: 303. Organization: google. License: Proprietary.

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena korean leaderboard.

Rank #44 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,392 (CI 1,368 - 1,416)
Percentile: 79.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: korean. Source rank: #53. Votes: 656. Organization: google. License: Proprietary.

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #57 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,417 (CI 1,411 - 1,423)
Percentile: 80.6% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: russian. Source rank: #71. Votes: 10168. Organization: google. License: Proprietary.

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena spanish leaderboard.

Rank #59 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,419 (CI 1,407 - 1,432)
Percentile: 72.9% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: spanish. Source rank: #72. Votes: 3013. Organization: google. License: Proprietary.

Vision Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena chinese leaderboard.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,259 (CI 1,229 - 1,289)
Percentile: 58.4% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #42. Votes: 390. Organization: google. License: Proprietary.

Vision Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena chinese leaderboard.

Rank #26 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,292 (CI 1,262 - 1,322)
Percentile: 67.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #33. Votes: 390. Organization: google. License: Proprietary.

Source links and registry checks

official · Google Gemini models docs · Jun 20, 2026
official · Arena (38 source links) · Jun 20, 2026
official · Artificial Analysis · Jun 20, 2026
official · Terminal-Bench · Jun 20, 2026


Chat / text · 29 benchmarks · 63.5%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #172 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 14
Percentile: 56.7% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #55 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 25.1%
Percentile: 81.9% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #256 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 6.7%
Percentile: 14.4% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #180 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 39%
Percentile: 43.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #164 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $0.9 /1M tokens
Percentile: 44.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $0.3 /1M input tokens
Percentile: 56.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #177 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: $2.5 /1M output tokens
Percentile: 39.9% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
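
The blended, input, and output price rows above appear to be related. The field name `price1mBlended0To3To1` suggests a 3:1 weighting of input to output tokens; that reading is an assumption, since the page does not state the formula. A minimal sketch of the arithmetic under that assumption, using the listed input and output prices (the function name and weights are illustrative):

```python
# Listed prices from the rows above, in USD per 1M tokens.
INPUT_PRICE = 0.3
OUTPUT_PRICE = 2.5


def blended_price(input_price, output_price, input_weight=3, output_weight=1):
    """Weighted average price, assuming a 3:1 input-to-output token mix."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
print(f"Blended price: ${blended:.2f} per 1M tokens")
```

Under this assumption the result is about $0.85 per 1M tokens, which is consistent with the listed $0.9 if the page rounds to one decimal place.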

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #20 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 228.7 tokens/s
Percentile: 91% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #194 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 23.26s
Percentile: 8.1% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #139 · Source label: Gemini 2.5 Flash (Reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 23.26s
Percentile: 34.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #161 · Source label: Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 11
Percentile: 15.6% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,410 (CI 1,408 - 1,413)
Percentile: 73.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #108. Votes: 124544. Organization: google. License: Proprietary.

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #62 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,397 (CI 1,392 - 1,402)
Percentile: 81.1% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #76. Votes: 17364. Organization: google. License: Proprietary.

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #98 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,412 (CI 1,409 - 1,415)
Percentile: 70.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #119. Votes: 59489. Organization: google. License: Proprietary.

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #88 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,396 (CI 1,392 - 1,399)
Percentile: 73.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #108. Votes: 89454. Organization: google. License: Proprietary.

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,420 (CI 1,415 - 1,425)
Percentile: 70.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts. Source rank: #117. Votes: 17499. Organization: google. License: Proprietary.

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,423 (CI 1,416 - 1,430)
Percentile: 67.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts_english. Source rank: #130. Votes: 8802. Organization: google. License: Proprietary.

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,402 (CI 1,395 - 1,408)
Percentile: 74.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: instruction_following. Source rank: #103. Votes: 9152. Organization: google. License: Proprietary.

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #83 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,419 (CI 1,415 - 1,423)
Percentile: 73% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #103. Votes: 31943. Organization: google. License: Proprietary.

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #97 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,403 (CI 1,399 - 1,408)
Percentile: 70.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #118. Votes: 21617. Organization: google. License: Proprietary.

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #75 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,417 (CI 1,415 - 1,420)
Percentile: 77.2% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #89. Votes: 124544. Organization: google. License: Proprietary.

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #50 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,402 (CI 1,397 - 1,407)
Percentile: 84.8% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #63. Votes: 17364. Organization: google. License: Proprietary.

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #84 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,418 (CI 1,415 - 1,421)
Percentile: 74.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #100. Votes: 59489. Organization: google. License: Proprietary.

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,405 (CI 1,401 - 1,408)
Percentile: 77.8% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #86. Votes: 89454. Organization: google. License: Proprietary.

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #71 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,422 (CI 1,419 - 1,426)
Percentile: 78.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts. Source rank: #88. Votes: 63888. Organization: google. License: Proprietary.

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #85 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,420 (CI 1,416 - 1,424)
Percentile: 74.1% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts_english. Source rank: #102. Votes: 32205. Organization: google. License: Proprietary.

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,406 (CI 1,402 - 1,409)
Percentile: 81.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: instruction_following. Source rank: #75. Votes: 33961. Organization: google. License: Proprietary.

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #61 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,420 (CI 1,416 - 1,424)
Percentile: 80.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #74. Votes: 31943. Organization: google. License: Proprietary.

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #82 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Arena
Raw value: 1,408 (CI 1,403 - 1,413)
Percentile: 74.9% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #99. Votes: 21617. Organization: google. License: Proprietary.

Coding · 7 benchmarks · 44%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #137 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 12.1%
Percentile: 55.3% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #184 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown (source row, percentile, last updated, eligibility)

Source: Artificial Analysis
Raw value: 29.1%
Percentile: 50.5% inside its fair comparison set
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #51 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtimeexact alias

Raw row drilldownsource row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 76.2%
Percentile: 44.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Google.

44.4% percentile inside its fair comparison set

Raw benchmark value: 76.2% · CI 74% - 78.4%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #119 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 63.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: coding. Source rank: #144. Votes: 6843. Organization: google. License: Proprietary.

63.1% percentile inside its fair comparison set

Raw benchmark value: 1,429 · CI 1,421 - 1,436

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #84 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 74.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: coding. Source rank: #101. Votes: 25914. Organization: google. License: Proprietary.

74.1% percentile inside its fair comparison set

Raw benchmark value: 1,424 · CI 1,419 - 1,428

IOI

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #39 · Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 2.6%
Percentile: 13.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: Google.

13.6% percentile inside its fair comparison set

Raw benchmark value: 2.6% · CI 0.6% - 4.6%

Terminal-Bench 2.0

TERMINAL-BENCH · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #29 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Terminal-Bench
Raw value: 17.1%
Percentile: 6.7%
Last updated: archived
Eligibility: headline eligible

6.7% percentile inside its fair comparison set

Raw benchmark value: 17.1%
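
The 44% shown beside the Coding heading is not explained on the page. As a hedged observation, the unweighted mean of the seven row percentiles listed above reproduces it. The Python sketch below works under that assumption; the function name `category_score` and the averaging rule are illustrative guesses, not a documented registry method.

```python
from statistics import mean

# Percentiles of the seven Coding rows, copied from the rows above.
coding_percentiles = {
    "Terminal-Bench Hard": 55.3,
    "SciCode": 50.5,
    "LiveCodeBench": 44.4,
    "Text Arena · Coding": 63.1,
    "Text Arena · Coding · No Style Control": 74.1,
    "IOI": 13.6,
    "Terminal-Bench 2.0": 6.7,
}


def category_score(percentiles: dict) -> float:
    """Unweighted mean of row percentiles (ASSUMED aggregation rule)."""
    return mean(percentiles.values())


coding_score = category_score(coding_percentiles)
print(f"Coding: {coding_score:.1f}%")  # prints "Coding: 44.0%"
```

The same rule also matches the 60.9% shown for Reasoning / math / science and the 55% shown for Long context, which supports the guess but does not rule out a weighted or filtered aggregate.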

Reasoning / math / science · 7 benchmarks · 60.9%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #214 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 5.1%
Percentile: 42.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

42.4% percentile inside its fair comparison set

Raw benchmark value: 5.1%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #151 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 68.3%
Percentile: 59.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

59.9% percentile inside its fair comparison set

Raw benchmark value: 68.3%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #121 · Source label: Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

65.2% percentile inside its fair comparison set

Raw benchmark value: 0%

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #41 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 81.6%
Percentile: 55.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: Google.

55.1% percentile inside its fair comparison set

Raw benchmark value: 81.6% · CI 77.7% - 85.5%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #44 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.7%
Percentile: 51.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Google.

51.7% percentile inside its fair comparison set

Raw benchmark value: 83.7% · CI 83% - 84.4%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #79 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,412
Percentile: 75.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: math. Source rank: #98. Votes: 1944. Organization: google. License: Proprietary.

75.2% percentile inside its fair comparison set

Raw benchmark value: 1,412 · CI 1,399 - 1,425

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #74 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,416
Percentile: 76.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: math. Source rank: #90. Votes: 1944. Organization: google. License: Proprietary.

76.8% percentile inside its fair comparison set

Raw benchmark value: 1,416 · CI 1,403 - 1,429
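
The two Text Arena Math rows above report the same source row with and without style control, each with a confidence interval. One rough way to read them is to check whether the intervals overlap. The sketch below is illustrative only: the `ArenaScore` class and the `intervals_overlap` helper are hypothetical names, the page does not state the confidence level, and interval overlap is a heuristic rather than a formal significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArenaScore:
    """One Arena row: point estimate plus the CI bounds shown on the page."""
    label: str
    value: int
    ci_low: int
    ci_high: int

    def half_width(self) -> float:
        # Half the distance between the bounds, i.e. the "plus or minus" figure.
        return (self.ci_high - self.ci_low) / 2


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the two Math rows above.
math_styled = ArenaScore("Text Arena · Math", 1412, 1399, 1425)
math_unstyled = ArenaScore("Text Arena · Math · No Style Control", 1416, 1403, 1429)

overlap = intervals_overlap(math_styled, math_unstyled)
print(overlap, math_styled.half_width(), math_unstyled.half_width())
```

Here the intervals overlap, so under this heuristic the 4-point gap between the two variants is within the stated uncertainty.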

Professional reasoning · 24 benchmarks · 70.9%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #26 · Source label: google/gemini-2.5-flash-preview-04-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83.8%
Percentile: 72.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Google.

72.2% percentile inside its fair comparison set

Raw benchmark value: 83.8% · CI 83% - 84.6%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #40 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 72.7%
Percentile: 57.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Google.

57.1% percentile inside its fair comparison set

Raw benchmark value: 72.7% · CI 71% - 74.4%

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #27 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 40.5%
Percentile: 49%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

49% percentile inside its fair comparison set

Raw benchmark value: 40.5% · CI 36.8% - 44.3%

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #15 · Source label: google/gemini-2.5-flash-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 83%
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Google.

72% percentile inside its fair comparison set

Raw benchmark value: 83% · CI 79.2% - 86.7%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #73 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,440
Percentile: 73.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: expert. Source rank: #91. Votes: 1629. Organization: google. License: Proprietary.

73.8% percentile inside its fair comparison set

Raw benchmark value: 1,440 · CI 1,425 - 1,454
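
Arena rows carry a provenance line in a regular `Key: value.` shape. A minimal sketch for pulling those fields out of one such line follows, assuming the punctuation shown on this page holds for every row. The function name `parse_provenance` and the output key names are hypothetical.

```python
import re


def parse_provenance(line: str) -> dict:
    """Split an Arena provenance line into fields (ASSUMED `Key: value.` layout)."""
    fields = {}
    # The dataset row name sits between backticks and may itself contain dots.
    row = re.search(r"dataset row `([^`]+)`", line)
    if row:
        fields["row"] = row.group(1)
    # Only scan the text after the row name, so dots inside it are not misread.
    tail = line[row.end():] if row else line
    for key, value in re.findall(r"([A-Z][A-Za-z ]*): ([^.]+)\.", tail):
        fields[key.lower().replace(" ", "_")] = value.strip()
    # Normalise the numeric fields when present.
    if "source_rank" in fields:
        fields["source_rank"] = int(fields["source_rank"].lstrip("#"))
    if "votes" in fields:
        fields["votes"] = int(fields["votes"])
    return fields


# Provenance line copied from the Text Arena · Expert row above.
line = (
    "Parsed from Arena leaderboard dataset row "
    "`gemini-2.5-flash-preview-09-2025`. Category: expert. "
    "Source rank: #91. Votes: 1629. Organization: google. "
    "License: Proprietary."
)
record = parse_provenance(line)
print(record)
```

Rows without a provenance line, such as some of the industry leaderboards on this page, would simply have no input to parse.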

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #93 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 71.1%
Last updated: recent
Eligibility: headline eligible

71.1% percentile inside its fair comparison set

Raw benchmark value: 1,403 · CI 1,395 - 1,411

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #74 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 77.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #93. Votes: 22885. Organization: google. License: Proprietary.

77.4% percentile inside its fair comparison set

Raw benchmark value: 1,388 · CI 1,383 - 1,392

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #70 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,428
Percentile: 76.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_legal_and_government. Source rank: #89. Votes: 2161. Organization: google. License: Proprietary.

76.8% percentile inside its fair comparison set

Raw benchmark value: 1,428 · CI 1,415 - 1,441

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #73 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 77.7%
Last updated: recent
Eligibility: headline eligible

77.7% percentile inside its fair comparison set

Raw benchmark value: 1,436 · CI 1,428 - 1,445

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #79 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,421
Percentile: 74.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_mathematical. Source rank: #96. Votes: 1630. Organization: google. License: Proprietary.

74.7% percentile inside its fair comparison set

Raw benchmark value: 1,421 · CI 1,406 - 1,435

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #96 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,427
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_medicine_and_healthcare. Source rank: #116. Votes: 1642. Organization: google. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,427 · CI 1,412 - 1,442

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #112 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,423
Percentile: 65.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: industry_software_and_it_services. Source rank: #135. Votes: 11752. Organization: google. License: Proprietary.

65.8% percentile inside its fair comparison set

Raw benchmark value: 1,423 · CI 1,417 - 1,429

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #64 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,404
Percentile: 80.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #81. Votes: 28038. Organization: google. License: Proprietary.

80.6% percentile inside its fair comparison set

Raw benchmark value: 1,404 · CI 1,400 - 1,408

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #67 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,425
Percentile: 76%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: expert. Source rank: #81. Votes: 7786. Organization: google. License: Proprietary.

76% percentile inside its fair comparison set

Raw benchmark value: 1,425 · CI 1,418 - 1,432

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #78 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 75.8%
Last updated: recent
Eligibility: headline eligible

75.8% percentile inside its fair comparison set

Raw benchmark value: 1,403 · CI 1,399 - 1,408

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #59 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 82%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #72. Votes: 22885. Organization: google. License: Proprietary.

82% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,390 - 1,399

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #57 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,433
Percentile: 81.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_legal_and_government. Source rank: #70. Votes: 8733. Organization: google. License: Proprietary.

81.2% percentile inside its fair comparison set

Raw benchmark value: 1,433 · CI 1,426 - 1,440

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #63 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,436
Percentile: 80.8%
Last updated: recent
Eligibility: headline eligible

80.8% percentile inside its fair comparison set

Raw benchmark value: 1,436 · CI 1,427 - 1,444

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,422
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_mathematical. Source rank: #86. Votes: 6798. Organization: google. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,422 · CI 1,414 - 1,429

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #70 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,429
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_medicine_and_healthcare. Source rank: #82. Votes: 7717. Organization: google. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,429 · CI 1,422 - 1,436

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #87 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,426
Percentile: 73.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_software_and_it_services. Source rank: #104. Votes: 43644. Organization: google. License: Proprietary.

73.5% percentile inside its fair comparison set

Raw benchmark value: 1,426 · CI 1,422 - 1,429

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #55 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,407
Percentile: 83.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #67. Votes: 28038. Organization: google. License: Proprietary.

83.3% percentile inside its fair comparison set

Raw benchmark value: 1,407 · CI 1,403 - 1,411

SAGE

VALS-AI · Professional reasoning · Objective

Student Assessment with Generative Evaluation.

Rank #21 · Source label: google/gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 44.8%
Percentile: 55.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Google.

55.6% percentile inside its fair comparison set

Raw benchmark value: 44.8% · CI 38.1% - 51.5%

PRBench Legal

SL · Professional reasoning · Rubric

Applied legal reasoning on professional-domain tasks.

Rank #10 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 41%
Percentile: 25%
Last updated: recent
Eligibility: headline eligible

25% percentile inside its fair comparison set

Raw benchmark value: 41%

Search / tool use · 1 benchmark · 15.2%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #263 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 14.9%
Percentile: 15.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

15.2% percentile inside its fair comparison set

Raw benchmark value: 14.9%

Long context · 2 benchmarks · 55%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #102 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 45.9%
Percentile: 67.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

67.9% percentile inside its fair comparison set

Raw benchmark value: 45.9%

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #52 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 59.8%
Percentile: 42%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Google.

42% percentile inside its fair comparison set

Raw benchmark value: 59.8% · CI 57.9% - 61.6%

Vision understanding · 22 benchmarks · 64.8%

MMMU-Pro

AA · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #48 · Source label: Gemini 2.5 Flash (Non-reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 65.5%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `mmmuPro`.

65.2% percentile inside its fair comparison set

Raw benchmark value: 65.5%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,226
Percentile: 70.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: overall. Source rank: #43. Votes: 4726. Organization: google. License: Proprietary.

70.6% percentile inside its fair comparison set

Raw benchmark value: 1,226 · CI 1,216 - 1,236

Vision Arena · Captioning

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,214
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: captioning. Source rank: #8. Votes: 595. Organization: google. License: Proprietary.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,214 · CI 1,189 - 1,239

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #24 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,239
Percentile: 58.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing_vision. Source rank: #31. Votes: 294. Organization: google. License: Proprietary.

58.2% percentile inside its fair comparison set

Raw benchmark value: 1,239 · CI 1,206 - 1,273

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #38 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,243
Percentile: 47.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: diagram. Source rank: #49. Votes: 972. Organization: google. License: Proprietary.

47.1% percentile inside its fair comparison set

Raw benchmark value: 1,243 · CI 1,224 - 1,261

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #30 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,231
Percentile: 73.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: english. Source rank: #39. Votes: 2057. Organization: google. License: Proprietary.

73.4% percentile inside its fair comparison set

Raw benchmark value: 1,231 · CI 1,217 - 1,245

Vision Arena · Entity Recognition

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #17 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,218
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: entity_recognition. Source rank: #16. Votes: 590. Organization: google. License: Proprietary.

50% percentile inside its fair comparison set

Raw benchmark value: 1,218 · CI 1,194 - 1,242

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #28 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,266
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: homework. Source rank: #36. Votes: 624. Organization: google. License: Proprietary.

60.3% percentile inside its fair comparison set

Raw benchmark value: 1,266 · CI 1,244 - 1,289

Vision Arena · Humor

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #28 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,227
Percentile: 44.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: humor. Source rank: #35. Votes: 216. Organization: google. License: Proprietary.

44.9% percentile inside its fair comparison set

Raw benchmark value: 1,227 · CI 1,190 - 1,264

Vision Arena · OCR

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,241
Percentile: 54.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: ocr. Source rank: #42. Votes: 2784. Organization: google. License: Proprietary.

54.3% percentile inside its fair comparison set

Raw benchmark value: 1,241 · CI 1,229 - 1,252

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #27 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,250
Percentile: 76.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: overall. Source rank: #34. Votes: 4726. Organization: google. License: Proprietary.

76.1% percentile inside its fair comparison set

Raw benchmark value: 1,250 · CI 1,240 - 1,260

Vision Arena · Captioning · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #8 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,278
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: captioning. Source rank: #8. Votes: 595. Organization: google. License: Proprietary.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,278 (CI 1,254 - 1,301)

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #18 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,265
Percentile: 69.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing_vision. Source rank: #23. Votes: 294. Organization: google. License: Proprietary.

69.1% percentile inside its fair comparison set

Raw benchmark value: 1,265 (CI 1,232 - 1,298)

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #35 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,251
Percentile: 51.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: diagram. Source rank: #43. Votes: 972. Organization: google. License: Proprietary.

51.4% percentile inside its fair comparison set

Raw benchmark value: 1,251 (CI 1,233 - 1,270)

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,255
Percentile: 78%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: english. Source rank: #30. Votes: 2057. Organization: google. License: Proprietary.

78% percentile inside its fair comparison set

Raw benchmark value: 1,255 (CI 1,242 - 1,269)

Vision Arena · Entity Recognition · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #15 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,247
Percentile: 56.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: entity_recognition. Source rank: #14. Votes: 590. Organization: google. License: Proprietary.

56.3% percentile inside its fair comparison set

Raw benchmark value: 1,247 (CI 1,223 - 1,270)

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,279
Percentile: 64.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: homework. Source rank: #31. Votes: 624. Organization: google. License: Proprietary.

64.7% percentile inside its fair comparison set

Raw benchmark value: 1,279 (CI 1,256 - 1,301)

Vision Arena · Humor · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #19 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,262
Percentile: 63.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: humor. Source rank: #24. Votes: 216. Organization: google. License: Proprietary.

63.3% percentile inside its fair comparison set

Raw benchmark value: 1,262 (CI 1,226 - 1,299)

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #26 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,257
Percentile: 64.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: ocr. Source rank: #32. Votes: 2784. Organization: google. License: Proprietary.

64.3% percentile inside its fair comparison set

Raw benchmark value: 1,257 (CI 1,246 - 1,269)
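Several categories appear twice on this page, once with style control and once without, each with its own rating and confidence interval. As a rough way to read such pairs, the sketch below checks whether the two intervals overlap, using the Vision Arena OCR figures from this page. The `Rating` type and `intervals_overlap` helper are illustrative names, and the page does not state the confidence level of the intervals, so overlap here is a heuristic rather than a significance test.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """An Arena rating with the bounds of its reported confidence interval."""
    value: int
    ci_low: int
    ci_high: int


def intervals_overlap(a: Rating, b: Rating) -> bool:
    """True when the two confidence intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Figures copied from the two Vision Arena OCR rows on this page.
ocr_styled = Rating(value=1241, ci_low=1229, ci_high=1252)
ocr_no_style = Rating(value=1257, ci_low=1246, ci_high=1269)

gap = ocr_no_style.value - ocr_styled.value
print(gap, intervals_overlap(ocr_styled, ocr_no_style))  # prints "16 True"
```

Under this reading, the 16-point gap between the two OCR ratings sits inside overlapping intervals, so it should be interpreted with caution.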

MMMU Pro

VALS-AI · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #26 · Source label: google/gemini-2.5-flash-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 80.8%
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: Google.

56.9% percentile inside its fair comparison set

Raw benchmark value: 80.8% (CI 78.9% - 82.6%)

Vision Arena · Creative Writing

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,263
Percentile: 87.5%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing. Source rank: #6. Votes: 555. Organization: google. License: Proprietary.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,263 (CI 1,238 - 1,289)

Vision Arena · Creative Writing · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,289
Percentile: 87.5%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: creative_writing. Source rank: #5. Votes: 555. Organization: google. License: Proprietary.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,289 (CI 1,264 - 1,313)

Document understanding · 1 benchmark · 48.3%

MortgageTax

VALS-AI · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #32 · Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 62.6%
Percentile: 48.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mortgage_tax; provider: Google.

48.3% percentile inside its fair comparison set

Raw benchmark value: 62.6% (CI 60.7% - 64.5%)

Multilingual · 16 benchmarks · 74.6%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #78 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,449
Percentile: 73.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #95. Votes: 1413. Organization: google. License: Proprietary.

73.9% percentile inside its fair comparison set

Raw benchmark value: 1,449 (CI 1,433 - 1,465)

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #69 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,441
Percentile: 68.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: french. Source rank: #86. Votes: 521. Organization: google. License: Proprietary.

68.5% percentile inside its fair comparison set

Raw benchmark value: 1,441 (CI 1,414 - 1,467)

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #58 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 75.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: german. Source rank: #76. Votes: 602. Organization: google. License: Proprietary.

75.9% percentile inside its fair comparison set

Raw benchmark value: 1,419 (CI 1,394 - 1,443)

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #39 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,400
Percentile: 81.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: japanese. Source rank: #55. Votes: 303. Organization: google. License: Proprietary.

81.3% percentile inside its fair comparison set

Raw benchmark value: 1,400 (CI 1,365 - 1,434)

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #57 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,378
Percentile: 73.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: korean. Source rank: #72. Votes: 656. Organization: google. License: Proprietary.

73.1% percentile inside its fair comparison set

Raw benchmark value: 1,378 (CI 1,355 - 1,402)

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #66 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 77.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: russian. Source rank: #85. Votes: 10168. Organization: google. License: Proprietary.

77.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 (CI 1,407 - 1,419)

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #73 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,405
Percentile: 66.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: spanish. Source rank: #91. Votes: 3013. Organization: google. License: Proprietary.

66.4% percentile inside its fair comparison set

Raw benchmark value: 1,405 (CI 1,393 - 1,418)

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #74 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,452
Percentile: 75.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #88. Votes: 1413. Organization: google. License: Proprietary.

75.3% percentile inside its fair comparison set

Raw benchmark value: 1,452 (CI 1,437 - 1,468)

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #56 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,439
Percentile: 74.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: french. Source rank: #68. Votes: 521. Organization: google. License: Proprietary.

74.5% percentile inside its fair comparison set

Raw benchmark value: 1,439 (CI 1,412 - 1,465)

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #46 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,424
Percentile: 81%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: german. Source rank: #57. Votes: 602. Organization: google. License: Proprietary.

81% percentile inside its fair comparison set

Raw benchmark value: 1,424 (CI 1,400 - 1,448)

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #25 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,407
Percentile: 88.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: japanese. Source rank: #34. Votes: 303. Organization: google. License: Proprietary.

88.2% percentile inside its fair comparison set

Raw benchmark value: 1,407 (CI 1,372 - 1,442)

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #44 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,392
Percentile: 79.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: korean. Source rank: #53. Votes: 656. Organization: google. License: Proprietary.

79.3% percentile inside its fair comparison set

Raw benchmark value: 1,392 (CI 1,368 - 1,416)

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #57 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,417
Percentile: 80.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: russian. Source rank: #71. Votes: 10168. Organization: google. License: Proprietary.

80.6% percentile inside its fair comparison set

Raw benchmark value: 1,417 (CI 1,411 - 1,423)

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #59 · Source label: gemini-2.5-flash

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 72.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: spanish. Source rank: #72. Votes: 3013. Organization: google. License: Proprietary.

72.9% percentile inside its fair comparison set

Raw benchmark value: 1,419 (CI 1,407 - 1,432)

Vision Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard.

Rank #33 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,259
Percentile: 58.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #42. Votes: 390. Organization: google. License: Proprietary.

58.4% percentile inside its fair comparison set

Raw benchmark value: 1,259 (CI 1,229 - 1,289)

Vision Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard.

Rank #26 · Source label: gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,292
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: chinese. Source rank: #33. Votes: 390. Organization: google. License: Proprietary.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,292 (CI 1,262 - 1,322)
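The page does not say how the 74.6% figure in the Multilingual category header is derived. The sketch below tests one hypothesis, that it is the unweighted mean of the 16 row percentiles listed above; the variable names are illustrative, and the match is an observation about these numbers rather than a documented rule.

```python
# Percentiles copied from the 16 Multilingual rows, in page order.
multilingual_percentiles = [
    73.9, 68.5, 75.9, 81.3, 73.1, 77.5, 66.4,   # Text Arena, style control
    75.3, 74.5, 81.0, 88.2, 79.3, 80.6, 72.9,   # Text Arena, no style control
    58.4, 67.5,                                  # Vision Arena Chinese (both variants)
]

# Assumption under test: the category figure is a plain arithmetic mean.
category_mean = sum(multilingual_percentiles) / len(multilingual_percentiles)
print(len(multilingual_percentiles), round(category_mean, 1))  # prints "16 74.6"
```

The same reading is consistent with the Document understanding category, where the single MortgageTax percentile (48.3%) equals the category figure.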