Model profile · Unknown

mercury-2

Unknown weights · mid · registry tag 2026 benchmark-derived

Thin verified coverage

This model currently reads as thin verified coverage across the resolved source data.

Visible coverage: 13.7%
Verified coverage: 13.7%
Spread: n/a
Last verified: Jun 20, 2026

text · code · document · 2 aliases · 23 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.
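
One way to read the two stamps together is as a gap in days between the data version and the page refresh. The sketch below is only an illustration: the function name `snapshot_lag_days` and the format string are assumptions, not part of the registry.

```python
from datetime import datetime

# Assumed stamp format, matching strings such as "Jun 20, 2026".
STAMP_FORMAT = "%b %d, %Y"


def snapshot_lag_days(data_version: str, page_refreshed: str) -> int:
    """Return how many days the page refresh falls after the data version."""
    data_date = datetime.strptime(data_version, STAMP_FORMAT)
    page_date = datetime.strptime(page_refreshed, STAMP_FORMAT)
    return (page_date - data_date).days


lag = snapshot_lag_days("Jun 20, 2026", "Jul 5, 2026")
print(lag)  # 15
```

A large gap would suggest the page was redeployed without a fresh registry snapshot; what counts as "large" is left to the reader.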

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
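
The row shape described above can be sketched as a small record. The page does not state how it derives the percentile, so `percentile_from_rank` below is an assumed rule (rank 1 maps to 100, the last rank maps to 0, linear in between), and the class name, field names, and set size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    """Hypothetical record mirroring the fields each row is said to keep."""
    benchmark: str
    source: str
    source_type: str
    raw_metric: str
    percentile: float  # 0 to 100, within the fair comparison set


def percentile_from_rank(rank: int, set_size: int) -> float:
    """Assumed rule: best rank scores 100, worst rank scores 0."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie within a set of at least two models")
    return 100.0 * (1 - (rank - 1) / (set_size - 1))


row = BenchmarkRow(
    benchmark="Output Speed",
    source="Artificial Analysis",
    source_type="Speed / cost",
    raw_metric="978.6 tokens/s",
    percentile=percentile_from_rank(1, 100),  # set size 100 is illustrative only
)
print(row.percentile)  # 100.0
```

Under this assumed rule a model ranked last in its set lands at 0%, which is how a 0% percentile can coexist with a non-trivial raw score.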

Chat / text · 28 benchmarks · 58.5%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #81 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 25
Percentile: 79.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

79.7% percentile inside its fair comparison set

Raw benchmark value: 25

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #105 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 20.5%
Percentile: 65.1%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

65.1% percentile inside its fair comparison set

Raw benchmark value: 20.5%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #237 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 8.5%
Percentile: 20.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

20.8% percentile inside its fair comparison set

Raw benchmark value: 8.5%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #30 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 69.8%
Percentile: 90.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `ifbench`.

90.8% percentile inside its fair comparison set

Raw benchmark value: 69.8%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $0.4 /1M tokens
Percentile: 62%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

62% percentile inside its fair comparison set

Raw benchmark value: $0.4 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #114 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $0.3 /1M input tokens
Percentile: 59.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

59.8% percentile inside its fair comparison set

Raw benchmark value: $0.3 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #103 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $0.8 /1M output tokens
Percentile: 63.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

63.8% percentile inside its fair comparison set

Raw benchmark value: $0.8 /1M output tokens
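
The three price rows above can be cross-checked against each other. The field name `price1mBlended0To3To1` suggests a 3:1 input-to-output blend; that reading is an assumption, and the function below is a hypothetical helper rather than the source's own calculation.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed from the "3To1" in the field name
    output_weight: float = 1.0,
) -> float:
    """Weighted average price per 1M tokens under the assumed blend ratio."""
    total_weight = input_weight + output_weight
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Input and output prices as listed, in dollars per 1M tokens.
blended = blended_price(0.3, 0.8)
print(round(blended, 3))  # 0.425
```

A result of about $0.425 is consistent with the listed $0.4 once rounded to one decimal place, keeping in mind that the listed input and output prices may themselves be rounded.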

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #1 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 978.6 tokens/s
Percentile: 100%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

100% percentile inside its fair comparison set

Raw benchmark value: 978.6 tokens/s

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #146 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 3.17s
Percentile: 31%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

31% percentile inside its fair comparison set

Raw benchmark value: 3.17s

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #78 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 3.17s
Percentile: 63.3%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

63.3% percentile inside its fair comparison set

Raw benchmark value: 3.17s
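
The speed and latency rows above combine into a rough end-to-end estimate. The sketch assumes a simple model in which total time is the time to first token plus the remaining tokens streamed at the median output speed; real responses vary, and the function name is hypothetical.

```python
MEDIAN_TTFT_SECONDS = 3.17        # time to first token, as listed
MEDIAN_TOKENS_PER_SECOND = 978.6  # output speed, as listed


def estimate_response_seconds(
    output_tokens: int,
    ttft: float = MEDIAN_TTFT_SECONDS,
    tokens_per_second: float = MEDIAN_TOKENS_PER_SECOND,
) -> float:
    """Approximate wall-clock time: first-token wait plus streaming time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft + output_tokens / tokens_per_second


for tokens in (100, 1000, 5000):
    print(tokens, round(estimate_response_seconds(tokens), 2))
```

Under these assumptions the first-token wait dominates short responses, which may explain why the model ranks first on output speed yet mid-table on time to first token.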

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #150

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,346
Percentile: 54.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #179. Votes: 3124. Organization: inception-ai. License: Proprietary.

54.2% percentile inside its fair comparison set

Raw benchmark value: 1,346 · CI 1,336 - 1,357

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #163

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,300
Percentile: 49.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #196. Votes: 528. Organization: inception-ai. License: Proprietary.

49.8% percentile inside its fair comparison set

Raw benchmark value: 1,300 · CI 1,273 - 1,326

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,364
Percentile: 53.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #179. Votes: 1397. Organization: inception-ai. License: Proprietary.

53.8% percentile inside its fair comparison set

Raw benchmark value: 1,364 · CI 1,349 - 1,380

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #155

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,299
Percentile: 52.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #185. Votes: 2172. Organization: inception-ai. License: Proprietary.

52.6% percentile inside its fair comparison set

Raw benchmark value: 1,299 · CI 1,283 - 1,315

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,359
Percentile: 53.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #181. Votes: 1761. Organization: inception-ai. License: Proprietary.

53.8% percentile inside its fair comparison set

Raw benchmark value: 1,359 · CI 1,345 - 1,373

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,371
Percentile: 53.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #180. Votes: 866. Organization: inception-ai. License: Proprietary.

53.7% percentile inside its fair comparison set

Raw benchmark value: 1,371 · CI 1,351 - 1,391

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #155

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,322
Percentile: 52.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #187. Votes: 838. Organization: inception-ai. License: Proprietary.

52.6% percentile inside its fair comparison set

Raw benchmark value: 1,322 · CI 1,302 - 1,342

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #181

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,324
Percentile: 40.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #215. Votes: 843. Organization: inception-ai. License: Proprietary.

40.8% percentile inside its fair comparison set

Raw benchmark value: 1,324 · CI 1,304 - 1,345

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #145

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,344
Percentile: 55.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #173. Votes: 540. Organization: inception-ai. License: Proprietary.

55.4% percentile inside its fair comparison set

Raw benchmark value: 1,344 · CI 1,318 - 1,369

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #128

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,357
Percentile: 60.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #154. Votes: 3124. Organization: inception-ai. License: Proprietary.

60.9% percentile inside its fair comparison set

Raw benchmark value: 1,357 · CI 1,346 - 1,367
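
The confidence intervals attached to Arena scores give a quick way to judge whether two scores are clearly separated. The sketch below compares the overall Text Arena interval with its No Style Control counterpart; the overlap check is a rough heuristic, not a significance test, and the helper name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[int, int]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True when two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


style_controlled = (1336, 1357)  # Text Arena overall, CI as listed
no_style_control = (1346, 1367)  # Text Arena · No Style Control, CI as listed

print(intervals_overlap(style_controlled, no_style_control))  # True
```

Overlapping intervals suggest the 11-point gap between the two overall scores should be read with caution.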

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #150

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,296
Percentile: 53.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #178. Votes: 528. Organization: inception-ai. License: Proprietary.

53.9% percentile inside its fair comparison set

Raw benchmark value: 1,296 · CI 1,270 - 1,321

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #124

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,378
Percentile: 62.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #147. Votes: 1397. Organization: inception-ai. License: Proprietary.

62.2% percentile inside its fair comparison set

Raw benchmark value: 1,378 · CI 1,362 - 1,393

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,314
Percentile: 59.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #159. Votes: 2172. Organization: inception-ai. License: Proprietary.

59.4% percentile inside its fair comparison set

Raw benchmark value: 1,314 · CI 1,299 - 1,330

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #128

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,361
Percentile: 60.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #154. Votes: 1761. Organization: inception-ai. License: Proprietary.

60.9% percentile inside its fair comparison set

Raw benchmark value: 1,361 · CI 1,348 - 1,375

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #122

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,376
Percentile: 62.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #146. Votes: 866. Organization: inception-ai. License: Proprietary.

62.7% percentile inside its fair comparison set

Raw benchmark value: 1,376 · CI 1,357 - 1,396

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #136

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,322
Percentile: 58.5%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #164. Votes: 838. Organization: inception-ai. License: Proprietary.

58.5% percentile inside its fair comparison set

Raw benchmark value: 1,322 · CI 1,302 - 1,342

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #142

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,328
Percentile: 53.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #170. Votes: 843. Organization: inception-ai. License: Proprietary.

53.6% percentile inside its fair comparison set

Raw benchmark value: 1,328 · CI 1,308 - 1,348

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #124

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,357
Percentile: 61.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #150. Votes: 540. Organization: inception-ai. License: Proprietary.

61.9% percentile inside its fair comparison set

Raw benchmark value: 1,357 · CI 1,332 - 1,382

Coding · 8 benchmarks · 36.1%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #78 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 26.5%
Percentile: 74.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

74.8% percentile inside its fair comparison set

Raw benchmark value: 26.5%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #79 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 38.7%
Percentile: 79.3%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `scicode`.

79.3% percentile inside its fair comparison set

Raw benchmark value: 38.7%

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,165
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #87. Votes: 948. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

Raw benchmark value: 1,165 · CI 1,142 - 1,188

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,165
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev. Source rank: #87. Votes: 948. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

Raw benchmark value: 1,165 · CI 1,142 - 1,188

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,207
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev-html. Source rank: #86. Votes: 100. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

Raw benchmark value: 1,207 · CI 1,137 - 1,277

Code Arena · Webdev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #60

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,154
Percentile: 0%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev-react. Source rank: #74. Votes: 847. Organization: inception-ai. License: Proprietary.

0% percentile inside its fair comparison set

Raw benchmark value: 1,154 · CI 1,130 - 1,178

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #141

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,397
Percentile: 56.3%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: coding. Source rank: #169. Votes: 769. Organization: inception-ai. License: Proprietary.

56.3% percentile inside its fair comparison set

Raw benchmark value: 1,397 · CI 1,376 - 1,418

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #110

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,393
Percentile: 65.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: coding. Source rank: #134. Votes: 769. Organization: inception-ai. License: Proprietary.

65.9% percentile inside its fair comparison set

Raw benchmark value: 1,393 · CI 1,372 - 1,413

Reasoning / math / science · 3 benchmarks · 81%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 15.5%
Percentile: 85.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `hle`.

85.7% percentile inside its fair comparison set

Raw benchmark value: 15.5%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #87 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 77%
Percentile: 77%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `gpqa`.

77% percentile inside its fair comparison set

Raw benchmark value: 77%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #60 · Source label: Mercury 2

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 0.8%
Percentile: 80.5%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `critpt`.

80.5% percentile inside its fair comparison set

Raw benchmark value: 0.8%

Professional reasoning · 14 benchmarks · 53.9%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #144

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,349
Percentile: 48%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: expert. Source rank: #174. Votes: 229. Organization: inception-ai. License: Proprietary.

48% percentile inside its fair comparison set

Raw benchmark value: 1,349 · CI 1,311 - 1,387

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #157

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,339
Percentile: 50.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_business_and_management_and_financial_operations. Source rank: #185. Votes: 482. Organization: inception-ai. License: Proprietary.

50.9% percentile inside its fair comparison set

Raw benchmark value: 1,339 · CI 1,314 - 1,364

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #145

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,311
Percentile: 55.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_entertainment_and_sports_and_media. Source rank: #175. Votes: 658. Organization: inception-ai. License: Proprietary.

55.4% percentile inside its fair comparison set

Raw benchmark value: 1,311 · CI 1,288 - 1,334

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #194

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,322
Percentile: 40.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_life_and_physical_and_social_science. Source rank: #228. Votes: 471. Organization: inception-ai. License: Proprietary.

40.2% percentile inside its fair comparison set

Raw benchmark value: 1,322 · CI 1,295 - 1,349

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #161

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,344
Percentile: 45.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_medicine_and_healthcare. Source rank: #192. Votes: 228. Organization: inception-ai. License: Proprietary.

45.8% percentile inside its fair comparison set

Raw benchmark value: 1,344 · CI 1,306 - 1,382

Text Arena · Industry Software And It Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #147

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,384
Percentile: 55.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_software_and_it_services. Source rank: #175. Votes: 1089. Organization: inception-ai. License: Proprietary.

55.1% percentile inside its fair comparison set

Raw benchmark value: 1,384 · CI 1,366 - 1,402

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #165

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,310
Percentile: 49.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_writing_and_literature_and_language. Source rank: #198. Votes: 695. Organization: inception-ai. License: Proprietary.

49.4% percentile inside its fair comparison set

Raw benchmark value: 1,310 · CI 1,287 - 1,332

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #121

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,356
Percentile: 56.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: expert. Source rank: #147. Votes: 229. Organization: inception-ai. License: Proprietary.

56.4% percentile inside its fair comparison set

Raw benchmark value: 1,356 · CI 1,319 - 1,393

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #122

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,350
Percentile: 61.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_business_and_management_and_financial_operations. Source rank: #146. Votes: 482. Organization: inception-ai. License: Proprietary.

61.9% percentile inside its fair comparison set

Raw benchmark value: 1,350 · CI 1,326 - 1,375

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #128

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,317
Percentile: 60.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_entertainment_and_sports_and_media. Source rank: #155. Votes: 658. Organization: inception-ai. License: Proprietary.

60.7% percentile inside its fair comparison set

1,317 · Raw benchmark value · CI 1,294 - 1,339

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #147

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,345
Percentile: 54.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_life_and_physical_and_social_science. Source rank: #173. Votes: 471. Organization: inception-ai. License: Proprietary.

54.8% percentile inside its fair comparison set

1,345 · Raw benchmark value · CI 1,318 - 1,372

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,369
Percentile: 59.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_medicine_and_healthcare. Source rank: #144. Votes: 228. Organization: inception-ai. License: Proprietary.

59.7% percentile inside its fair comparison set

1,369 · Raw benchmark value · CI 1,332 - 1,406

Text Arena · Industry Software And It Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 63.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_software_and_it_services. Source rank: #143. Votes: 1089. Organization: inception-ai. License: Proprietary.

63.4% percentile inside its fair comparison set

1,385 · Raw benchmark value · CI 1,368 - 1,403

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #152

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,303
Percentile: 53.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_writing_and_literature_and_language. Source rank: #180. Votes: 695. Organization: inception-ai. License: Proprietary.

53.4% percentile inside its fair comparison set

1,303 · Raw benchmark value · CI 1,281 - 1,326

Search / tool use · 1 benchmark · 69.9%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #94 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 70.8%
Percentile: 69.9%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `tau2`.

69.9% percentile inside its fair comparison set

70.8% · Raw benchmark value

Long context · 1 benchmark · 59.7%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #129 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 36.3%
Percentile: 59.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `lcr`.

59.7% percentile inside its fair comparison set

36.3% · Raw benchmark value

Multilingual · 2 benchmarks · 43.8%

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #174

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,306
Percentile: 40.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: russian. Source rank: #208. Votes: 360. Organization: inception-ai. License: Proprietary.

40.1% percentile inside its fair comparison set

1,306 · Raw benchmark value · CI 1,275 - 1,336

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #153

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,303
Percentile: 47.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: russian. Source rank: #181. Votes: 360. Organization: inception-ai. License: Proprietary.

47.4% percentile inside its fair comparison set

1,303 · Raw benchmark value · CI 1,273 - 1,334
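
The two Russian rows above differ only in style control, so their intervals can be compared directly. The sketch below assumes the `CI low - high` text format used in these cards; the helper names `parse_ci` and `intervals_overlap` are hypothetical, and interval overlap is only a rough reading aid, not a significance test.

```python
import re


def parse_ci(text: str) -> tuple[int, int]:
    """Extract (low, high) from a string such as 'CI 1,275 - 1,336'."""
    match = re.search(r"CI\s+([\d,]+)\s*-\s*([\d,]+)", text)
    if match is None:
        raise ValueError(f"no confidence interval found in {text!r}")
    low, high = (int(part.replace(",", "")) for part in match.groups())
    return low, high


def intervals_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """Return True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Values copied from the two Text Arena Russian rows.
styled = parse_ci("CI 1,275 - 1,336")
unstyled = parse_ci("CI 1,273 - 1,334")
print(intervals_overlap(styled, unstyled))
```

Here the intervals overlap almost entirely, which suggests the 3-point gap between 1,306 and 1,303 sits well inside the reported uncertainty.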

Source links and registry checks

official · Arena · Jun 20, 2026 · 22 source links

official · Artificial Analysis · Jun 20, 2026 · 1 source link

Chat / text · 28 benchmarks · 58.5%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #81 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 25
Percentile: 79.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

79.7% percentile inside its fair comparison set

25 · Raw benchmark value

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #105 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 20.5%
Percentile: 65.1%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

65.1% percentile inside its fair comparison set

20.5% · Raw benchmark value

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #237 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 8.5%
Percentile: 20.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

20.8% percentile inside its fair comparison set

8.5% · Raw benchmark value

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #30 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 69.8%
Percentile: 90.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `ifbench`.

90.8% percentile inside its fair comparison set

69.8% · Raw benchmark value

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.4 /1M tokens
Percentile: 62%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

62% percentile inside its fair comparison set

$0.4 /1M tokens · Raw benchmark value

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #114 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.3 /1M input tokens
Percentile: 59.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

59.8% percentile inside its fair comparison set

$0.3 /1M input tokens · Raw benchmark value

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #103 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.8 /1M output tokens
Percentile: 63.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

63.8% percentile inside its fair comparison set

$0.8 /1M output tokens · Raw benchmark value
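
The three price rows above appear to be related. The field name `price1mBlended0To3To1` suggests a 3:1 input-to-output token weighting; that reading is an assumption, since the page does not define the blend. Under it, the blended figure can be reproduced from the input and output prices. The function name `blended_price` is hypothetical.

```python
def blended_price(
    input_price: float,
    output_price: float,
    input_weight: float = 3.0,   # assumed 3:1 ratio, inferred from the field name
    output_weight: float = 1.0,
) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_weight = input_weight + output_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (input_weight * input_price + output_weight * output_price) / total_weight


# Prices copied from the Input price and Output price rows ($ per 1M tokens).
price = blended_price(0.3, 0.8)
print(f"${price:.3f} per 1M tokens")
```

The result is about $0.425, which is consistent with the $0.4 shown on the Blended price row once rounded to one decimal place.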

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #1 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 978.6 tokens/s
Percentile: 100%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

100% percentile inside its fair comparison set

978.6 tokens/s · Raw benchmark value

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #146 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 3.17s
Percentile: 31%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

31% percentile inside its fair comparison set

3.17s · Raw benchmark value

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #78 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 3.17s
Percentile: 63.3%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

63.3% percentile inside its fair comparison set

3.17s · Raw benchmark value
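
The page does not state how a rank becomes a percentile. The speed rows above are consistent with a linear mapping in which rank 1 scores 100% and the last rank scores 0%. The sketch below works under that assumption; `rank_percentile` is a hypothetical name, and the set size of 211 is inferred from the two latency rows rather than shown anywhere on the page.

```python
def rank_percentile(rank: int, set_size: int) -> float:
    """Map a 1-based rank to a 0-100 percentile (rank 1 -> 100, last rank -> 0)."""
    if set_size < 2:
        raise ValueError("set_size must be at least 2")
    if not 1 <= rank <= set_size:
        raise ValueError("rank must lie between 1 and set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)


ASSUMED_SET_SIZE = 211  # inferred, not published on the page

for label, rank in [
    ("Output Speed", 1),
    ("Time to first token", 146),
    ("Time to first answer token", 78),
]:
    print(f"{label}: {rank_percentile(rank, ASSUMED_SET_SIZE):.1f}%")
```

With these inputs the mapping gives 100.0%, 31.0%, and 63.3%, matching the three rows. Other rows may belong to comparison sets of a different size, so the same set size should not be reused blindly.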

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #150

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,346
Percentile: 54.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #179. Votes: 3124. Organization: inception-ai. License: Proprietary.

54.2% percentile inside its fair comparison set

1,346 · Raw benchmark value · CI 1,336 - 1,357

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #163

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,300
Percentile: 49.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #196. Votes: 528. Organization: inception-ai. License: Proprietary.

49.8% percentile inside its fair comparison set

1,300 · Raw benchmark value · CI 1,273 - 1,326

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 53.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #179. Votes: 1397. Organization: inception-ai. License: Proprietary.

53.8% percentile inside its fair comparison set

1,364 · Raw benchmark value · CI 1,349 - 1,380

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #155

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,299
Percentile: 52.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #185. Votes: 2172. Organization: inception-ai. License: Proprietary.

52.6% percentile inside its fair comparison set

1,299 · Raw benchmark value · CI 1,283 - 1,315

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,359
Percentile: 53.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #181. Votes: 1761. Organization: inception-ai. License: Proprietary.

53.8% percentile inside its fair comparison set

1,359 · Raw benchmark value · CI 1,345 - 1,373

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #151

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,371
Percentile: 53.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #180. Votes: 866. Organization: inception-ai. License: Proprietary.

53.7% percentile inside its fair comparison set

1,371 · Raw benchmark value · CI 1,351 - 1,391

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #155

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,322
Percentile: 52.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #187. Votes: 838. Organization: inception-ai. License: Proprietary.

52.6% percentile inside its fair comparison set

1,322 · Raw benchmark value · CI 1,302 - 1,342

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #181

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 40.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #215. Votes: 843. Organization: inception-ai. License: Proprietary.

40.8% percentile inside its fair comparison set

1,324 · Raw benchmark value · CI 1,304 - 1,345

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #145

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,344
Percentile: 55.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #173. Votes: 540. Organization: inception-ai. License: Proprietary.

55.4% percentile inside its fair comparison set

1,344 · Raw benchmark value · CI 1,318 - 1,369

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #128

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,357
Percentile: 60.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #154. Votes: 3124. Organization: inception-ai. License: Proprietary.

60.9% percentile inside its fair comparison set

1,357 · Raw benchmark value · CI 1,346 - 1,367

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #150

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,296
Percentile: 53.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: creative_writing. Source rank: #178. Votes: 528. Organization: inception-ai. License: Proprietary.

53.9% percentile inside its fair comparison set

1,296 · Raw benchmark value · CI 1,270 - 1,321

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,378
Percentile: 62.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: english. Source rank: #147. Votes: 1397. Organization: inception-ai. License: Proprietary.

62.2% percentile inside its fair comparison set

1,378 · Raw benchmark value · CI 1,362 - 1,393

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #133

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,314
Percentile: 59.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: exclude_ties. Source rank: #159. Votes: 2172. Organization: inception-ai. License: Proprietary.

59.4% percentile inside its fair comparison set

1,314 · Raw benchmark value · CI 1,299 - 1,330

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #128

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,361
Percentile: 60.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts. Source rank: #154. Votes: 1761. Organization: inception-ai. License: Proprietary.

60.9% percentile inside its fair comparison set

1,361 · Raw benchmark value · CI 1,348 - 1,375

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 62.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: hard_prompts_english. Source rank: #146. Votes: 866. Organization: inception-ai. License: Proprietary.

62.7% percentile inside its fair comparison set

1,376 · Raw benchmark value · CI 1,357 - 1,396

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #136

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,322
Percentile: 58.5%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: instruction_following. Source rank: #164. Votes: 838. Organization: inception-ai. License: Proprietary.

58.5% percentile inside its fair comparison set

1,322 · Raw benchmark value · CI 1,302 - 1,342

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #142

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,328
Percentile: 53.6%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: longer_query. Source rank: #170. Votes: 843. Organization: inception-ai. License: Proprietary.

53.6% percentile inside its fair comparison set

1,328 · Raw benchmark value · CI 1,308 - 1,348

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #124

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,357
Percentile: 61.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: multi_turn. Source rank: #150. Votes: 540. Organization: inception-ai. License: Proprietary.

61.9% percentile inside its fair comparison set

1,357 · Raw benchmark value · CI 1,332 - 1,382

Coding · 8 benchmarks · 36.1%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #78 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 26.5%
Percentile: 74.8%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

74.8% percentile inside its fair comparison set

26.5% · Raw benchmark value

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #79 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 38.7%
Percentile: 79.3%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `scicode`.

79.3% percentile inside its fair comparison set

38.7% · Raw benchmark value

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,165
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: overall. Source rank: #87. Votes: 948. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

1,165 · Raw benchmark value · CI 1,142 - 1,188

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,165
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev. Source rank: #87. Votes: 948. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

1,165 · Raw benchmark value · CI 1,142 - 1,188

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #71

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,207
Percentile: 4.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev-html. Source rank: #86. Votes: 100. Organization: inception-ai. License: Proprietary.

4.1% percentile inside its fair comparison set

1,207 · Raw benchmark value · CI 1,137 - 1,277

Code Arena · Webdev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #60

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,154
Percentile: 0%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: webdev-react. Source rank: #74. Votes: 847. Organization: inception-ai. License: Proprietary.

0% percentile inside its fair comparison set

1,154 · Raw benchmark value · CI 1,130 - 1,178

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #141

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,397
Percentile: 56.3%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: coding. Source rank: #169. Votes: 769. Organization: inception-ai. License: Proprietary.

56.3% percentile inside its fair comparison set

1,397 · Raw benchmark value · CI 1,376 - 1,418

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #110

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,393
Percentile: 65.9%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: coding. Source rank: #134. Votes: 769. Organization: inception-ai. License: Proprietary.

65.9% percentile inside its fair comparison set

1,393 · Raw benchmark value · CI 1,372 - 1,413
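
The Coding headline of 36.1% appears to be the unweighted mean of the eight row percentiles listed above. The page does not confirm this, so treat it as an assumption. A short check, with `category_score` as a hypothetical helper:

```python
from statistics import fmean

# Percentiles copied from the eight Coding rows, in page order.
coding_percentiles = [74.8, 79.3, 4.1, 4.1, 4.1, 0.0, 56.3, 65.9]


def category_score(percentiles: list[float]) -> float:
    """Unweighted mean of row percentiles (assumed aggregation rule)."""
    if not percentiles:
        raise ValueError("at least one percentile is required")
    return fmean(percentiles)


print(f"{category_score(coding_percentiles):.1f}%")
```

The mean is about 36.08, which rounds to the 36.1% shown. The four Code Arena and WebDev Arena rows (at or near 0 to 4.1%) pull the average well below the Terminal-Bench Hard and SciCode percentiles.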

Reasoning / math / science · 3 benchmarks · 81%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 15.5%
Percentile: 85.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `hle`.

85.7% percentile inside its fair comparison set

15.5% · Raw benchmark value

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #87 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 77%
Percentile: 77%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `gpqa`.

77% percentile inside its fair comparison set

77% · Raw benchmark value

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #60 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0.8%
Percentile: 80.5%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `critpt`.

80.5% percentile inside its fair comparison set

0.8% · Raw benchmark value

Professional reasoning · 14 benchmarks · 53.9%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #144

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,349
Percentile: 48%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: expert. Source rank: #174. Votes: 229. Organization: inception-ai. License: Proprietary.

48% percentile inside its fair comparison set

Raw benchmark value: 1,349 (CI 1,311 - 1,387)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #157

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,339
Percentile: 50.9%
Last updated: recent
Eligibility: preview_model

50.9% percentile inside its fair comparison set

Raw benchmark value: 1,339 (CI 1,314 - 1,364)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #145

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,311
Percentile: 55.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_entertainment_and_sports_and_media. Source rank: #175. Votes: 658. Organization: inception-ai. License: Proprietary.

55.4% percentile inside its fair comparison set

Raw benchmark value: 1,311 (CI 1,288 - 1,334)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #194

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,322
Percentile: 40.2%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_life_and_physical_and_social_science. Source rank: #228. Votes: 471. Organization: inception-ai. License: Proprietary.

40.2% percentile inside its fair comparison set

Raw benchmark value: 1,322 (CI 1,295 - 1,349)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #161

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,344
Percentile: 45.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_medicine_and_healthcare. Source rank: #192. Votes: 228. Organization: inception-ai. License: Proprietary.

45.8% percentile inside its fair comparison set

Raw benchmark value: 1,344 (CI 1,306 - 1,382)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #147

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,384
Percentile: 55.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_software_and_it_services. Source rank: #175. Votes: 1089. Organization: inception-ai. License: Proprietary.

55.1% percentile inside its fair comparison set

Raw benchmark value: 1,384 (CI 1,366 - 1,402)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #165

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,310
Percentile: 49.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_writing_and_literature_and_language. Source rank: #198. Votes: 695. Organization: inception-ai. License: Proprietary.

49.4% percentile inside its fair comparison set

Raw benchmark value: 1,310 (CI 1,287 - 1,332)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #121

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,356
Percentile: 56.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: expert. Source rank: #147. Votes: 229. Organization: inception-ai. License: Proprietary.

56.4% percentile inside its fair comparison set

Raw benchmark value: 1,356 (CI 1,319 - 1,393)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #122

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,350
Percentile: 61.9%
Last updated: recent
Eligibility: preview_model

61.9% percentile inside its fair comparison set

Raw benchmark value: 1,350 (CI 1,326 - 1,375)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #128

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,317
Percentile: 60.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_entertainment_and_sports_and_media. Source rank: #155. Votes: 658. Organization: inception-ai. License: Proprietary.

60.7% percentile inside its fair comparison set

Raw benchmark value: 1,317 (CI 1,294 - 1,339)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #147

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,345
Percentile: 54.8%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_life_and_physical_and_social_science. Source rank: #173. Votes: 471. Organization: inception-ai. License: Proprietary.

54.8% percentile inside its fair comparison set

Raw benchmark value: 1,345 (CI 1,318 - 1,372)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,369
Percentile: 59.7%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_medicine_and_healthcare. Source rank: #144. Votes: 228. Organization: inception-ai. License: Proprietary.

59.7% percentile inside its fair comparison set

Raw benchmark value: 1,369 (CI 1,332 - 1,406)

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #120

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,385
Percentile: 63.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_software_and_it_services. Source rank: #143. Votes: 1089. Organization: inception-ai. License: Proprietary.

63.4% percentile inside its fair comparison set

Raw benchmark value: 1,385 (CI 1,368 - 1,403)

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #152

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,303
Percentile: 53.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: industry_writing_and_literature_and_language. Source rank: #180. Votes: 695. Organization: inception-ai. License: Proprietary.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,303 (CI 1,281 - 1,326)
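
The 53.9% in the Professional reasoning header appears to match the unweighted arithmetic mean of the 14 percentiles listed in the cards above. The page does not document how it aggregates a category, so the sketch below is only a consistency check under that assumption; the name `category_score` is hypothetical and not part of any published API.

```python
from statistics import fmean

# Percentiles copied from the 14 Professional reasoning cards above,
# in page order (default rows first, then the "No Style Control" rows).
PROFESSIONAL_REASONING = [
    48.0, 50.9, 55.4, 40.2, 45.8, 55.1, 49.4,
    56.4, 61.9, 60.7, 54.8, 59.7, 63.4, 53.4,
]


def category_score(percentiles: list[float]) -> float:
    """Assumed aggregation: unweighted mean, rounded to one decimal place."""
    return round(fmean(percentiles), 1)


print(category_score(PROFESSIONAL_REASONING))
```

The same check on the two Multilingual percentiles listed further down (40.1 and 47.4) gives a mean of 43.75, which is in line with the 43.8% shown in that header.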

Search / tool use · 1 benchmark · 69.9%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #94 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 70.8%
Percentile: 69.9%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `tau2`.

69.9% percentile inside its fair comparison set

Raw benchmark value: 70.8%

Long context · 1 benchmark · 59.7%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #129 · Source label: Mercury 2

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 36.3%
Percentile: 59.7%
Last updated: recent
Eligibility: preview_model

Parsed from Artificial Analysis public leaderboard field `lcr`.

59.7% percentile inside its fair comparison set

Raw benchmark value: 36.3%

Multilingual · 2 benchmarks · 43.8%

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #174

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,306
Percentile: 40.1%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: russian. Source rank: #208. Votes: 360. Organization: inception-ai. License: Proprietary.

40.1% percentile inside its fair comparison set

Raw benchmark value: 1,306 (CI 1,275 - 1,336)

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #153

verified runtime · exact alias · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,303
Percentile: 47.4%
Last updated: recent
Eligibility: preview_model

Parsed from Arena leaderboard dataset row `mercury-2`. Category: russian. Source rank: #181. Votes: 360. Organization: inception-ai. License: Proprietary.

47.4% percentile inside its fair comparison set

Raw benchmark value: 1,303 (CI 1,273 - 1,334)
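
Each Arena card reports a score with a confidence interval, and most categories appear twice, with and without style control. One rough way to read such a pair is to check whether the two intervals overlap; when they do, the gap between the two scores sits inside the reported uncertainty. This is a heuristic and not a significance test, the page does not state the confidence level, and the names `ArenaScore` and `intervals_overlap` are hypothetical.

```python
from typing import NamedTuple


class ArenaScore(NamedTuple):
    value: int    # raw Arena score
    ci_low: int   # lower bound of the reported interval
    ci_high: int  # upper bound of the reported interval


def intervals_overlap(a: ArenaScore, b: ArenaScore) -> bool:
    """Return True when the two closed intervals share at least one point."""
    return a.ci_low <= b.ci_high and b.ci_low <= a.ci_high


# Values copied from the two Text Arena · Russian cards above.
russian_default = ArenaScore(1306, 1275, 1336)
russian_no_style_control = ArenaScore(1303, 1273, 1334)

print(intervals_overlap(russian_default, russian_no_style_control))
```

For the Russian pair the intervals overlap almost entirely, so the 3-point difference between the two rows should not be read as a meaningful change.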