Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #172 · Source label: Gemini 2.5 Flash (Non-reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 14
- Percentile: 56.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
AA-Omniscience accuracy
AA · Chat / text · Objective
Rank #55 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 25.1%
- Percentile: 81.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
AA-Omniscience non-hallucination
AA · Chat / text · Objective
Rank #256 · Source label: Gemini 2.5 Flash (Non-reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 6.7%
- Percentile: 14.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
IFBench
AA · Chat / text · Objective
Rank #180 · Source label: Gemini 2.5 Flash (Non-reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 39%
- Percentile: 43.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `ifbench`.
Blended price
AA · Chat / text · Speed / cost
Rank #164 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: $0.9 /1M tokens
- Percentile: 44.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
Input price
AA · Chat / text · Speed / cost
Rank #133 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: $0.3 /1M input tokens
- Percentile: 56.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
Output price
AA · Chat / text · Speed / cost
Rank #177 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: $2.5 /1M output tokens
- Percentile: 39.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
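The blended, input, and output prices above appear mutually consistent if the blended figure is a 3:1 input-to-output weighted average, which is what the field name `price1mBlended0To3To1` suggests. The sketch below checks that arithmetic; the 3:1 weighting and the function name `blended_price` are assumptions for illustration, not something the page states.

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens for an assumed input:output token ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Values taken from the Input price and Output price cards (USD per 1M tokens).
blended = blended_price(0.3, 2.5)
print(f"{blended:.2f}")  # prints 0.85
```

Under that assumption the result is $0.85 per 1M tokens, which matches the $0.9 shown on the Blended price card once rounded to one decimal place.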
Output Speed
AA · Chat / text · Speed / cost
Rank #20 · Source label: Gemini 2.5 Flash (Non-reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 228.7 tokens/s
- Percentile: 91% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.
Time to first token
AA · Chat / text · Speed / cost
Rank #194 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 23.26s
- Percentile: 8.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.
Time to first answer token
AA · Chat / text · Speed / cost
Rank #139 · Source label: Gemini 2.5 Flash (Reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 23.26s
- Percentile: 34.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.
Openness Index
AA · Chat / text · Combined
Rank #161 · Source label: Gemini 2.5 Flash Preview (Sep '25) (Non-reasoning)
verified runtime · exact alias
- Source: Artificial Analysis
- Raw value: 11
- Percentile: 15.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.
Text Arena
AR · Chat / text · Human
Rank #88 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,410 (CI 1,408 - 1,413)
- Percentile: 73.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #108. Votes: 124544. Organization: google. License: Proprietary.
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #62 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,397 (CI 1,392 - 1,402)
- Percentile: 81.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #76. Votes: 17364. Organization: google. License: Proprietary.
Text Arena · English
AR · Chat / text · Human
Rank #98 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,412 (CI 1,409 - 1,415)
- Percentile: 70.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #119. Votes: 59489. Organization: google. License: Proprietary.
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #88 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,396 (CI 1,392 - 1,399)
- Percentile: 73.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #108. Votes: 89454. Organization: google. License: Proprietary.
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #97 · Source label: gemini-2.5-flash-preview-09-2025
verified runtime · exact alias
- Source: Arena
- Raw value: 1,420 (CI 1,415 - 1,425)
- Percentile: 70.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts. Source rank: #117. Votes: 17499. Organization: google. License: Proprietary.
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #107 · Source label: gemini-2.5-flash-preview-09-2025
verified runtime · exact alias
- Source: Arena
- Raw value: 1,423 (CI 1,416 - 1,430)
- Percentile: 67.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: hard_prompts_english. Source rank: #130. Votes: 8802. Organization: google. License: Proprietary.
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #84 · Source label: gemini-2.5-flash-preview-09-2025
verified runtime · exact alias
- Source: Arena
- Raw value: 1,402 (CI 1,395 - 1,408)
- Percentile: 74.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash-preview-09-2025`. Category: instruction_following. Source rank: #103. Votes: 9152. Organization: google. License: Proprietary.
Text Arena · Longer Query
AR · Chat / text · Human
Rank #83 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,419 (CI 1,415 - 1,423)
- Percentile: 73% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #103. Votes: 31943. Organization: google. License: Proprietary.
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #97 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,403 (CI 1,399 - 1,408)
- Percentile: 70.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #118. Votes: 21617. Organization: google. License: Proprietary.
Text Arena · No Style Control
AR · Chat / text · Human
Rank #75 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,417 (CI 1,415 - 1,420)
- Percentile: 77.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: overall. Source rank: #89. Votes: 124544. Organization: google. License: Proprietary.
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #50 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,402 (CI 1,397 - 1,407)
- Percentile: 84.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: creative_writing. Source rank: #63. Votes: 17364. Organization: google. License: Proprietary.
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #84 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,418 (CI 1,415 - 1,421)
- Percentile: 74.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: english. Source rank: #100. Votes: 59489. Organization: google. License: Proprietary.
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #73 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,405 (CI 1,401 - 1,408)
- Percentile: 77.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: exclude_ties. Source rank: #86. Votes: 89454. Organization: google. License: Proprietary.
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #71 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,422 (CI 1,419 - 1,426)
- Percentile: 78.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts. Source rank: #88. Votes: 63888. Organization: google. License: Proprietary.
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #85 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,420 (CI 1,416 - 1,424)
- Percentile: 74.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: hard_prompts_english. Source rank: #102. Votes: 32205. Organization: google. License: Proprietary.
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #61 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,406 (CI 1,402 - 1,409)
- Percentile: 81.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: instruction_following. Source rank: #75. Votes: 33961. Organization: google. License: Proprietary.
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #61 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,420 (CI 1,416 - 1,424)
- Percentile: 80.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: longer_query. Source rank: #74. Votes: 31943. Organization: google. License: Proprietary.
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #82 · Source label: gemini-2.5-flash
verified runtime · exact alias
- Source: Arena
- Raw value: 1,408 (CI 1,403 - 1,413)
- Percentile: 74.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `gemini-2.5-flash`. Category: multi_turn. Source rank: #99. Votes: 21617. Organization: google. License: Proprietary.