Intelligence Index
AA · Chat / text · Combined
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #81 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 25
- Percentile: 79.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
AA-Omniscience accuracy
AA · Chat / text · Objective
Rank #105 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 20.5%
- Percentile: 65.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.
AA-Omniscience non-hallucination
AA · Chat / text · Objective
Rank #237 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 8.5%
- Percentile: 20.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.
IFBench
AA · Chat / text · Objective
Rank #30 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 69.8%
- Percentile: 90.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `ifbench`.
Blended price
AA · Chat / text · Speed / cost
Rank #106 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: $0.4 /1M tokens
- Percentile: 62% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.
Input price
AA · Chat / text · Speed / cost
Rank #114 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: $0.3 /1M input tokens
- Percentile: 59.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.
Output price
AA · Chat / text · Speed / cost
Rank #103 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: $0.8 /1M output tokens
- Percentile: 63.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
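The three price rows above appear consistent with each other if the blended figure is a 3:1 input-to-output weighted average, which is what the field name `price1mBlended0To3To1` suggests. That weighting is an assumption here, not something the source rows state. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def blended_price(input_price: float, output_price: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    The 3:1 default weighting is an assumption inferred from the
    field name `price1mBlended0To3To1`; it is not confirmed by the source rows.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Input and output prices taken from the rows above (USD per 1M tokens).
blended = blended_price(0.3, 0.8)
print(f"blended ~ ${blended:.3f} per 1M tokens")
print(f"rounded to one decimal: ${round(blended, 1)}")
```

Under that assumption the result is about $0.425, which rounds to the $0.4 shown in the Blended price row.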
Output Speed
AA · Chat / text · Speed / cost
Rank #1 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 978.6 tokens/s
- Percentile: 100% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.
Time to first token
AA · Chat / text · Speed / cost
Rank #146 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 3.17s
- Percentile: 31% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.
Time to first answer token
AA · Chat / text · Speed / cost
Rank #78 · Source label: Mercury 2
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 3.17s
- Percentile: 63.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.
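The speed and latency rows can be combined into a rough end-to-end estimate: wait for the first token, then stream the rest at the median output speed. The sketch below assumes both medians hold for a single request and that generation rate is constant, which real traffic will not guarantee. The function name is hypothetical.

```python
def estimated_response_seconds(output_tokens: int,
                               ttft_seconds: float = 3.17,
                               tokens_per_second: float = 978.6) -> float:
    """Rough wall-clock estimate: time to first token plus streaming time.

    Defaults are the median values from the rows above. Treating them as
    per-request constants is a simplifying assumption.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


for n in (100, 1_000, 10_000):
    print(f"{n:>6} tokens: ~{estimated_response_seconds(n):.2f}s")
```

With these figures the fixed 3.17s wait dominates short replies, while streaming time only becomes comparable once a reply runs to a few thousand tokens.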
Text Arena
AR · Chat / text · Human
Rank #150
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,346
- CI: 1,336 - 1,357
- Percentile: 54.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #179. Votes: 3124. Organization: inception-ai. License: Proprietary.
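Arena scores are easier to read as head-to-head odds than as raw numbers. The sketch below assumes the scores follow the conventional Elo-style logistic scale (base 10, 400-point divisor), which the source row does not state. The opponent score of 1,400 is a hypothetical value chosen only for illustration and is not taken from any leaderboard row.

```python
def expected_win_probability(score_a: float, score_b: float,
                             scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B on an Elo-style scale.

    Assumes the standard logistic form with base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


mercury_overall = 1346.0       # raw value from the Text Arena row above
hypothetical_opponent = 1400.0  # illustrative only, not from the source

p = expected_win_probability(mercury_overall, hypothetical_opponent)
print(f"expected preference rate vs. a 1,400-rated model: {p:.1%}")
```

Under these assumptions a gap of about 50 points corresponds to a modest preference difference rather than a decisive one, which helps put the roughly 10-point confidence interval half-width in context.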
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #163
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,300
- CI: 1,273 - 1,326
- Percentile: 49.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #196. Votes: 528. Organization: inception-ai. License: Proprietary.
Text Arena · English
AR · Chat / text · Human
Rank #151
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,364
- CI: 1,349 - 1,380
- Percentile: 53.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #179. Votes: 1397. Organization: inception-ai. License: Proprietary.
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #155
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,299
- CI: 1,283 - 1,315
- Percentile: 52.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #185. Votes: 2172. Organization: inception-ai. License: Proprietary.
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #151
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,359
- CI: 1,345 - 1,373
- Percentile: 53.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #181. Votes: 1761. Organization: inception-ai. License: Proprietary.
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #151
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,371
- CI: 1,351 - 1,391
- Percentile: 53.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #180. Votes: 866. Organization: inception-ai. License: Proprietary.
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #155
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,322
- CI: 1,302 - 1,342
- Percentile: 52.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #187. Votes: 838. Organization: inception-ai. License: Proprietary.
Text Arena · Longer Query
AR · Chat / text · Human
Rank #181
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,324
- CI: 1,304 - 1,345
- Percentile: 40.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #215. Votes: 843. Organization: inception-ai. License: Proprietary.
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #145
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,344
- CI: 1,318 - 1,369
- Percentile: 55.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #173. Votes: 540. Organization: inception-ai. License: Proprietary.
Text Arena · No Style Control
AR · Chat / text · Human
Rank #128
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,357
- CI: 1,346 - 1,367
- Percentile: 60.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #154. Votes: 3124. Organization: inception-ai. License: Proprietary.
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #150
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,296
- CI: 1,270 - 1,321
- Percentile: 53.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #178. Votes: 528. Organization: inception-ai. License: Proprietary.
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #124
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,378
- CI: 1,362 - 1,393
- Percentile: 62.2% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #147. Votes: 1397. Organization: inception-ai. License: Proprietary.
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #133
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,314
- CI: 1,299 - 1,330
- Percentile: 59.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #159. Votes: 2172. Organization: inception-ai. License: Proprietary.
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #128
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,361
- CI: 1,348 - 1,375
- Percentile: 60.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #154. Votes: 1761. Organization: inception-ai. License: Proprietary.
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #122
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,376
- CI: 1,357 - 1,396
- Percentile: 62.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #146. Votes: 866. Organization: inception-ai. License: Proprietary.
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #136
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,322
- CI: 1,302 - 1,342
- Percentile: 58.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #164. Votes: 838. Organization: inception-ai. License: Proprietary.
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #142
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,328
- CI: 1,308 - 1,348
- Percentile: 53.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #170. Votes: 843. Organization: inception-ai. License: Proprietary.
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #124
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,357
- CI: 1,332 - 1,382
- Percentile: 61.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #150. Votes: 540. Organization: inception-ai. License: Proprietary.