Text Arena
AR · Chat / text · Human
It tests whether the model is useful in ordinary conversational turns, as judged by human voters, not just on narrow correctness tasks.
Rank #50
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,436
- Percentile: 84.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: overall. Source rank: #64. Votes: 28187. Organization: meituan. License: Proprietary.
84.9% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI 1,431-1,440
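The page does not define how the percentile inside the fair comparison set is computed. A minimal sketch, assuming it is the share of models in the set that score strictly below this one (one common percentile-rank definition; the site may handle ties or set membership differently), is shown below. `percentile_rank` is a hypothetical helper, and `toy_scores` are made-up illustrative values, not Arena data.

```python
from bisect import bisect_left


def percentile_rank(score: float, comparison_set: list[float]) -> float:
    """Percent of models in comparison_set scoring strictly below `score`.

    Assumed definition (hypothetical); the leaderboard's own formula is not
    documented on this page.
    """
    if not comparison_set:
        raise ValueError("comparison_set must not be empty")
    ordered = sorted(comparison_set)
    below = bisect_left(ordered, score)  # number of values strictly below score
    return 100.0 * below / len(ordered)


# Illustrative scores only (not real leaderboard values).
toy_scores = [1380.0, 1402.0, 1419.0, 1436.0, 1451.0]
print(f"{percentile_rank(1436.0, toy_scores):.1f}%")
```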
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #74
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,391
- Percentile: 77.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: creative_writing. Source rank: #90. Votes: 4233. Organization: meituan. License: Proprietary.
77.4% percentile inside its fair comparison set · Raw benchmark value: 1,391 · CI 1,381-1,401
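The Creative Writing interval (1,381-1,401, from 4,233 votes) is about twice as wide as the overall interval (1,431-1,440, from 28,187 votes). The sketch below compares the two half-widths against a rough 1/sqrt(votes) guide. That scaling is an assumption for a back-of-the-envelope check; the page does not say how its intervals are computed.

```python
import math


def half_width(lower: float, upper: float) -> float:
    """Half the width of a confidence interval."""
    return (upper - lower) / 2


# Values copied from the Overall and Creative Writing cards on this page.
overall = {"ci": (1431, 1440), "votes": 28187}
creative = {"ci": (1381, 1401), "votes": 4233}

hw_overall = half_width(*overall["ci"])    # 4.5
hw_creative = half_width(*creative["ci"])  # 10.0

observed_ratio = hw_creative / hw_overall
# Assumption: sampling error shrinks roughly like 1/sqrt(votes); a heuristic only.
expected_ratio = math.sqrt(overall["votes"] / creative["votes"])

print(f"observed {observed_ratio:.2f}, 1/sqrt(n) guide {expected_ratio:.2f}")
```

Both ratios come out above 2, which is consistent with the lower-vote category carrying noticeably more uncertainty; the gap between them is a reminder that the square-root rule is only a heuristic.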
Text Arena · English
AR · Chat / text · Human
Rank #36
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,459
- Percentile: 89.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: english. Source rank: #47. Votes: 13558. Organization: meituan. License: Proprietary.
89.2% percentile inside its fair comparison set · Raw benchmark value: 1,459 · CI 1,453-1,466
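Each card carries a provenance line with a fixed `Key: value.` shape. For collecting these rows programmatically, a minimal parser could look like this. The regular expression and the `parse_provenance` helper are hypothetical, written only against the lines shown on this page rather than any documented Arena export format.

```python
import re

# Hypothetical pattern matching the provenance lines shown on this page.
ROW_RE = re.compile(
    r"dataset row `(?P<row>[^`]+)`\. "
    r"Category: (?P<category>\w+)\. "
    r"Source rank: #(?P<source_rank>\d+)\. "
    r"Votes: (?P<votes>[\d,]+)\. "
    r"Organization: (?P<organization>[^.]+)\. "
    r"License: (?P<license>[^.]+)\."
)


def parse_provenance(line: str) -> dict:
    """Extract row id, category, source rank, votes, organization, license."""
    match = ROW_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized provenance line: {line!r}")
    fields = match.groupdict()
    fields["source_rank"] = int(fields["source_rank"])
    fields["votes"] = int(fields["votes"].replace(",", ""))  # accepts "13558" or "13,558"
    return fields


line = ("Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. "
        "Category: english. Source rank: #47. Votes: 13558. "
        "Organization: meituan. License: Proprietary.")
print(parse_provenance(line))
```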
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #51
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,431
- Percentile: 84.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: exclude_ties. Source rank: #66. Votes: 21246. Organization: meituan. License: Proprietary.
84.6% percentile inside its fair comparison set · Raw benchmark value: 1,431 · CI 1,425-1,437
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #47
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,459
- Percentile: 85.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: hard_prompts. Source rank: #60. Votes: 17863. Organization: meituan. License: Proprietary.
85.8% percentile inside its fair comparison set · Raw benchmark value: 1,459 · CI 1,454-1,465
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #37
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,475
- Percentile: 88.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: hard_prompts_english. Source rank: #48. Votes: 8989. Organization: meituan. License: Proprietary.
88.9% percentile inside its fair comparison set · Raw benchmark value: 1,475 · CI 1,468-1,482
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #68
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,415
- Percentile: 79.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: instruction_following. Source rank: #85. Votes: 9095. Organization: meituan. License: Proprietary.
79.4% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,408-1,422
Text Arena · Longer Query
AR · Chat / text · Human
Rank #62
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,436
- Percentile: 79.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: longer_query. Source rank: #77. Votes: 10712. Organization: meituan. License: Proprietary.
79.9% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI 1,429-1,443
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #58
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,432
- Percentile: 82.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: multi_turn. Source rank: #75. Votes: 5013. Organization: meituan. License: Proprietary.
82.4% percentile inside its fair comparison set · Raw benchmark value: 1,432 · CI 1,423-1,441
Text Arena · No Style Control
AR · Chat / text · Human
Rank #53
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,425
- Percentile: 84%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: overall. Source rank: #65. Votes: 28187. Organization: meituan. License: Proprietary.
84% percentile inside its fair comparison set · Raw benchmark value: 1,425 · CI 1,421-1,430
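Comparing this card with the first Text Arena card shows how the style setting moves the overall score: 1,436 (CI 1,431-1,440) versus 1,425 (CI 1,421-1,430). A simple, conservative check is whether the two intervals overlap; non-overlapping intervals suggest the difference is not just noise, though overlap alone would not prove the opposite. The sketch below applies that check. It assumes the first card is the style-controlled view, which the "No Style Control" label implies but the page does not state; `Score` and `intervals_overlap` are hypothetical names.

```python
from typing import NamedTuple


class Score(NamedTuple):
    value: int
    lower: int  # lower CI bound
    upper: int  # upper CI bound


def intervals_overlap(a: Score, b: Score) -> bool:
    """True if the two confidence intervals share at least one point."""
    return a.lower <= b.upper and b.lower <= a.upper


# Overall Text Arena scores copied from the cards on this page.
default_view = Score(1436, 1431, 1440)      # assumed style-controlled view
no_style_control = Score(1425, 1421, 1430)

delta = default_view.value - no_style_control.value
print(f"delta = {delta}, overlap = {intervals_overlap(default_view, no_style_control)}")
```

Here the intervals just miss each other (1,430 versus 1,431), so the 11-point gap sits at the edge of what the intervals resolve. Both scores come from the same 28,187 votes, so the intervals are not independent; treat this strictly as a rough heuristic.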
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #73
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,388
- Percentile: 77.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: creative_writing. Source rank: #86. Votes: 4233. Organization: meituan. License: Proprietary.
77.7% percentile inside its fair comparison set · Raw benchmark value: 1,388 · CI 1,379-1,398
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #32
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,451
- Percentile: 90.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: english. Source rank: #39. Votes: 13558. Organization: meituan. License: Proprietary.
90.5% percentile inside its fair comparison set · Raw benchmark value: 1,451 · CI 1,445-1,457
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #53
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,415
- Percentile: 84%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: exclude_ties. Source rank: #65. Votes: 21246. Organization: meituan. License: Proprietary.
84% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,409-1,421
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #48
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,438
- Percentile: 85.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: hard_prompts. Source rank: #60. Votes: 17863. Organization: meituan. License: Proprietary.
85.5% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,432-1,444
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #28
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,458
- Percentile: 91.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: hard_prompts_english. Source rank: #35. Votes: 8989. Organization: meituan. License: Proprietary.
91.7% percentile inside its fair comparison set · Raw benchmark value: 1,458 · CI 1,451-1,466
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #55
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,409
- Percentile: 83.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: instruction_following. Source rank: #69. Votes: 9095. Organization: meituan. License: Proprietary.
83.4% percentile inside its fair comparison set · Raw benchmark value: 1,409 · CI 1,402-1,415
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #51
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,424
- Percentile: 83.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: longer_query. Source rank: #64. Votes: 10712. Organization: meituan. License: Proprietary.
83.6% percentile inside its fair comparison set · Raw benchmark value: 1,424 · CI 1,417-1,431
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #69
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,417
- Percentile: 78.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `longcat-flash-chat-2602-exp`. Category: multi_turn. Source rank: #85. Votes: 5013. Organization: meituan. License: Proprietary.
78.9% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI 1,408-1,426