Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #56 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,431
- Percentile: 83.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: overall. Source rank: #72. Votes: 8177. Organization: moonshot. License: Modified MIT.
83.1% percentile inside its fair comparison set · Raw benchmark value: 1,431 · CI 1,425 - 1,438
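Arena scores are reported on an Elo-like scale, so a score gap can be read as an expected preference rate between two models. The sketch below assumes the usual Elo/Bradley-Terry logistic form with base 10 and scale 400; that scale, and the function name `expected_win_prob`, are assumptions for illustration rather than anything stated in the source row.

```python
def expected_win_prob(rating_a: float, rating_b: float) -> float:
    """Probability that model A is preferred over model B under an assumed
    Elo-style logistic model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Overall score from this row vs. a hypothetical opponent 100 points higher.
print(f"{expected_win_prob(1431, 1531):.3f}")  # 0.360

# The full width of this row's CI (1,425 to 1,438) expressed as a preference rate.
print(f"{expected_win_prob(1438, 1425):.3f}")
```

Under this assumption the entire 13-point confidence interval corresponds to only about a 52/48 preference split, which is why neighbouring ranks a few points apart are often not meaningfully different.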
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #76 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,390
- Percentile: 76.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: creative_writing. Source rank: #93. Votes: 1271. Organization: moonshot. License: Modified MIT.
76.8% percentile inside its fair comparison set · Raw benchmark value: 1,390 · CI 1,373 - 1,406
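The Creative Writing interval (about ±16.5 points on 1,271 votes) is much wider than the overall one (±6.5 on 8,177 votes). As a rough heuristic only, not the source's method for building these intervals, the half-width shrinks roughly like 1/sqrt(votes). The sketch below checks that heuristic against several default (style-controlled) rows on this page; the helper names are hypothetical.

```python
import math

# (category, CI low, CI high, votes) copied from default rows on this page.
ROWS = [
    ("overall", 1425, 1438, 8177),
    ("creative_writing", 1373, 1406, 1271),
    ("english", 1435, 1454, 3712),
    ("hard_prompts", 1452, 1469, 4535),
    ("multi_turn", 1424, 1454, 1467),
]


def half_width(lo: float, hi: float) -> float:
    """Half the width of a confidence interval."""
    return (hi - lo) / 2.0


def predicted_half_width(ref_hw: float, ref_votes: int, votes: int) -> float:
    """Scale a reference half-width by sqrt(ref_votes / votes), i.e. assume the
    interval narrows roughly like 1/sqrt(number of votes)."""
    return ref_hw * math.sqrt(ref_votes / votes)


_, ref_lo, ref_hi, ref_votes = ROWS[0]
ref_hw = half_width(ref_lo, ref_hi)
for name, lo, hi, votes in ROWS:
    observed = half_width(lo, hi)
    predicted = predicted_half_width(ref_hw, ref_votes, votes)
    print(f"{name:18s} observed +/-{observed:5.1f}  predicted +/-{predicted:5.1f}")
```

For these rows the prediction lands within about half a point of the reported interval. The Exclude Ties row does not fit as well (its interval is wider than its vote count alone would suggest), so treat this as a sanity check, not a model of how the intervals are computed.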
Text Arena · English
AR · Chat / text · Human
Rank #53 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,445
- Percentile: 84%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: english. Source rank: #65. Votes: 3712. Organization: moonshot. License: Modified MIT.
84% percentile inside its fair comparison set · Raw benchmark value: 1,445 · CI 1,435 - 1,454
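Each card ends with a provenance line in a fixed "Parsed from Arena leaderboard dataset row ..." format. If you want to collect these rows programmatically, a regular expression is enough. This is a minimal sketch that assumes the exact wording and punctuation seen on this page; the `ArenaRow` type and `parse_provenance` function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ArenaRow:
    row_id: str
    category: str
    source_rank: int
    votes: int
    organization: str
    license: str


# Assumed pattern for the provenance lines on this page (votes without commas).
PROVENANCE = re.compile(
    r"Parsed from Arena leaderboard dataset row `(?P<row_id>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<rank>\d+)\. "
    r"Votes: (?P<votes>\d+)\. "
    r"Organization: (?P<org>[^.]+)\. "
    r"License: (?P<license>.+?)\.$"
)


def parse_provenance(line: str) -> ArenaRow:
    """Parse one provenance line into structured fields, or raise ValueError."""
    m = PROVENANCE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    return ArenaRow(
        row_id=m["row_id"],
        category=m["category"],
        source_rank=int(m["rank"]),
        votes=int(m["votes"]),
        organization=m["org"],
        license=m["license"],
    )


row = parse_provenance(
    "Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. "
    "Category: english. Source rank: #65. Votes: 3712. "
    "Organization: moonshot. License: Modified MIT."
)
print(row.category, row.source_rank, row.votes)  # english 65 3712
```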
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #55 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,426
- Percentile: 83.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: exclude_ties. Source rank: #71. Votes: 5644. Organization: moonshot. License: Modified MIT.
83.4% percentile inside its fair comparison set · Raw benchmark value: 1,426 · CI 1,416 - 1,435
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #46 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,461
- Percentile: 86.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: hard_prompts. Source rank: #58. Votes: 4535. Organization: moonshot. License: Modified MIT.
86.2% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI 1,452 - 1,469
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #48 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,465
- Percentile: 85.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: hard_prompts_english. Source rank: #60. Votes: 2147. Organization: moonshot. License: Modified MIT.
85.5% percentile inside its fair comparison set · Raw benchmark value: 1,465 · CI 1,452 - 1,477
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #41 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,435
- Percentile: 87.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: instruction_following. Source rank: #52. Votes: 2232. Organization: moonshot. License: Modified MIT.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,435 · CI 1,422 - 1,447
Text Arena · Longer Query
AR · Chat / text · Human
Rank #51 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,445
- Percentile: 83.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: longer_query. Source rank: #64. Votes: 2211. Organization: moonshot. License: Modified MIT.
83.6% percentile inside its fair comparison set · Raw benchmark value: 1,445 · CI 1,433 - 1,458
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #53 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,439
- Percentile: 83.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: multi_turn. Source rank: #70. Votes: 1467. Organization: moonshot. License: Modified MIT.
83.9% percentile inside its fair comparison set · Raw benchmark value: 1,439 · CI 1,424 - 1,454
Text Arena · No Style Control
AR · Chat / text · Human
Rank #62 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,420
- Percentile: 81.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: overall. Source rank: #75. Votes: 8177. Organization: moonshot. License: Modified MIT.
81.2% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI 1,414 - 1,427
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #77 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,380
- Percentile: 76.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: creative_writing. Source rank: #93. Votes: 1271. Organization: moonshot. License: Modified MIT.
76.5% percentile inside its fair comparison set · Raw benchmark value: 1,380 · CI 1,363 - 1,396
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #61 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,433
- Percentile: 81.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: english. Source rank: #73. Votes: 3712. Organization: moonshot. License: Modified MIT.
81.5% percentile inside its fair comparison set · Raw benchmark value: 1,433 · CI 1,423 - 1,442
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #65 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,409
- Percentile: 80.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: exclude_ties. Source rank: #78. Votes: 5644. Organization: moonshot. License: Modified MIT.
80.3% percentile inside its fair comparison set · Raw benchmark value: 1,409 · CI 1,400 - 1,418
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #42 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,444
- Percentile: 87.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: hard_prompts. Source rank: #50. Votes: 4535. Organization: moonshot. License: Modified MIT.
87.4% percentile inside its fair comparison set · Raw benchmark value: 1,444 · CI 1,435 - 1,452
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #42 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,449
- Percentile: 87.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: hard_prompts_english. Source rank: #50. Votes: 2147. Organization: moonshot. License: Modified MIT.
87.3% percentile inside its fair comparison set · Raw benchmark value: 1,449 · CI 1,437 - 1,462
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #31 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,431
- Percentile: 90.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: instruction_following. Source rank: #39. Votes: 2232. Organization: moonshot. License: Modified MIT.
90.8% percentile inside its fair comparison set · Raw benchmark value: 1,431 · CI 1,419 - 1,443
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #39 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,438
- Percentile: 87.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: longer_query. Source rank: #47. Votes: 2211. Organization: moonshot. License: Modified MIT.
87.5% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,426 - 1,450
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #58 · Source label: kimi-k2.5-instant
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,426
- Percentile: 82.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `kimi-k2.5-instant`. Category: multi_turn. Source rank: #72. Votes: 1467. Organization: moonshot. License: Modified MIT.
82.4% percentile inside its fair comparison set · Raw benchmark value: 1,426 · CI 1,411 - 1,441
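Every category on this page appears twice: once as a default row and once labeled "No Style Control". Pairing them shows how much the style adjustment moves this model's raw score. The sketch below simply copies the raw values from the rows above and takes the difference; the names `PAIRS` and `style_control_deltas` are illustrative.

```python
# (category, default-row score, "No Style Control" score), copied from the rows above.
PAIRS = [
    ("overall", 1431, 1420),
    ("creative_writing", 1390, 1380),
    ("english", 1445, 1433),
    ("exclude_ties", 1426, 1409),
    ("hard_prompts", 1461, 1444),
    ("hard_prompts_english", 1465, 1449),
    ("instruction_following", 1435, 1431),
    ("longer_query", 1445, 1438),
    ("multi_turn", 1439, 1426),
]


def style_control_deltas(pairs):
    """Return {category: default score - no-style-control score}."""
    return {name: styled - unstyled for name, styled, unstyled in pairs}


deltas = style_control_deltas(PAIRS)
for name, d in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {d:+d}")
```

The default score is higher in every category, by 4 (Instruction Following) to 17 points (Exclude Ties, Hard Prompts). A higher raw score does not imply a better relative position, though: for Hard Prompts, Hard Prompts English, Instruction Following, and Longer Query the percentile is higher in the No Style Control rows, so compare percentiles within one variant rather than mixing the two.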