Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #84
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,413
- Percentile: 74.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: overall. Source rank: #103. Votes: 6678. Organization: tencent. License: tencent-hunyuan-community.
74.5% percentile inside its fair comparison set · Raw benchmark value: 1,413 · CI 1,406 - 1,421
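The provenance line above packs the row id, category, source rank, vote count, organization, and license into one sentence. A minimal sketch of splitting it into a record, assuming the `Key: value.` layout seen in these rows holds for other rows too; the helper name `parse_provenance` and the snake_case keys are illustrative, not part of the source dataset.

```python
import re

def parse_provenance(line: str) -> dict:
    """Split a 'Parsed from ...' provenance line into a dict (hypothetical helper)."""
    record = {}
    # The dataset row id is the backtick-quoted token after the word 'row'.
    row = re.search(r"row `([^`]+)`", line)
    if row:
        record["row"] = row.group(1)
    # The remaining fields follow a 'Key: value.' pattern.
    for key, value in re.findall(r"([A-Z][A-Za-z ]*): ([^.]+)\.", line):
        record[key.strip().lower().replace(" ", "_")] = value.strip()
    # Normalise the two numeric fields.
    if "source_rank" in record:
        record["source_rank"] = int(record["source_rank"].lstrip("#"))
    if "votes" in record:
        record["votes"] = int(record["votes"].replace(",", ""))
    return record

line = ("Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. "
        "Category: overall. Source rank: #103. Votes: 6678. "
        "Organization: tencent. License: tencent-hunyuan-community.")
record = parse_provenance(line)
print(record)
```

Keeping the source rank separate from the card's own rank matters here, because the two numbers differ in every row on this page.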
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #105
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,358
- Percentile: 67.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: creative_writing. Source rank: #130. Votes: 1019. Organization: tencent. License: tencent-hunyuan-community.
67.8% percentile inside its fair comparison set · Raw benchmark value: 1,358 · CI 1,339 - 1,378
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #89
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,420
- Percentile: 72.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: english. Source rank: #109. Votes: 3234. Organization: tencent. License: tencent-hunyuan-community.
72.9% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI 1,410 - 1,431
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #84
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,401
- Percentile: 74.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: exclude_ties. Source rank: #103. Votes: 5028. Organization: tencent. License: tencent-hunyuan-community.
74.5% percentile inside its fair comparison set · Raw benchmark value: 1,401 · CI 1,391 - 1,411
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #74
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,438
- Percentile: 77.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: hard_prompts. Source rank: #92. Votes: 4422. Organization: tencent. License: tencent-hunyuan-community.
77.5% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,429 - 1,448
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #85
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,438
- Percentile: 74.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: hard_prompts_english. Source rank: #104. Votes: 2227. Organization: tencent. License: tencent-hunyuan-community.
74.1% percentile inside its fair comparison set · Raw benchmark value: 1,438 · CI 1,425 - 1,450
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #88
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,399
- Percentile: 73.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: instruction_following. Source rank: #109. Votes: 2218. Organization: tencent. License: tencent-hunyuan-community.
73.2% percentile inside its fair comparison set · Raw benchmark value: 1,399 · CI 1,386 - 1,412
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #74
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,427
- Percentile: 76%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: longer_query. Source rank: #92. Votes: 2906. Organization: tencent. License: tencent-hunyuan-community.
76% percentile inside its fair comparison set · Raw benchmark value: 1,427 · CI 1,415 - 1,438
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #81
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,415
- Percentile: 75.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: multi_turn. Source rank: #100. Votes: 1177. Organization: tencent. License: tencent-hunyuan-community.
75.2% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,397 - 1,433
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #90
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,405
- Percentile: 72.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: overall. Source rank: #108. Votes: 6678. Organization: tencent. License: tencent-hunyuan-community.
72.6% percentile inside its fair comparison set · Raw benchmark value: 1,405 · CI 1,397 - 1,412
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #107
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,345
- Percentile: 67.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: creative_writing. Source rank: #131. Votes: 1019. Organization: tencent. License: tencent-hunyuan-community.
67.2% percentile inside its fair comparison set · Raw benchmark value: 1,345 · CI 1,326 - 1,364
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #95
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,411
- Percentile: 71.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: english. Source rank: #113. Votes: 3234. Organization: tencent. License: tencent-hunyuan-community.
71.1% percentile inside its fair comparison set · Raw benchmark value: 1,411 · CI 1,400 - 1,421
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #89
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,388
- Percentile: 72.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: exclude_ties. Source rank: #107. Votes: 5028. Organization: tencent. License: tencent-hunyuan-community.
72.9% percentile inside its fair comparison set · Raw benchmark value: 1,388 · CI 1,378 - 1,398
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #79
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,418
- Percentile: 76%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: hard_prompts. Source rank: #96. Votes: 4422. Organization: tencent. License: tencent-hunyuan-community.
76% percentile inside its fair comparison set · Raw benchmark value: 1,418 · CI 1,409 - 1,427
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #86
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,419
- Percentile: 73.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: hard_prompts_english. Source rank: #103. Votes: 2227. Organization: tencent. License: tencent-hunyuan-community.
73.8% percentile inside its fair comparison set · Raw benchmark value: 1,419 · CI 1,406 - 1,431
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #89
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,386
- Percentile: 72.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: instruction_following. Source rank: #107. Votes: 2218. Organization: tencent. License: tencent-hunyuan-community.
72.9% percentile inside its fair comparison set · Raw benchmark value: 1,386 · CI 1,373 - 1,399
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #79
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,407
- Percentile: 74.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: longer_query. Source rank: #97. Votes: 2906. Organization: tencent. License: tencent-hunyuan-community.
74.3% percentile inside its fair comparison set · Raw benchmark value: 1,407 · CI 1,396 - 1,419
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #80
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,410
- Percentile: 75.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `hunyuan-hy3-preview`. Category: multi_turn. Source rank: #97. Votes: 1177. Organization: tencent. License: tencent-hunyuan-community.
75.5% percentile inside its fair comparison set · Raw benchmark value: 1,410 · CI 1,392 - 1,428
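The intervals reported across these rows are visibly wider for categories with fewer votes. A rough sketch of checking that pattern on three of the style-controlled rows above, assuming the interval width shrinks roughly with the square root of the vote count (a common rule of thumb, not something the source states); the helper names are illustrative.

```python
import math

# (category, votes, CI low, CI high) copied from the style-controlled rows above.
rows = [
    ("overall", 6678, 1406, 1421),
    ("multi_turn", 1177, 1397, 1433),
    ("creative_writing", 1019, 1339, 1378),
]

def ci_width(low: int, high: int) -> int:
    """Full width of the reported interval, in rating points."""
    return high - low

def scaled_width(votes: int, low: int, high: int) -> float:
    """Width multiplied by sqrt(votes); roughly constant if width ~ 1/sqrt(votes)."""
    return ci_width(low, high) * math.sqrt(votes)

summary = {
    name: (ci_width(low, high), scaled_width(votes, low, high))
    for name, votes, low, high in rows
}

for name, (width, scaled) in summary.items():
    print(f"{name:>17}: width={width:>2}  width*sqrt(votes)={scaled:.0f}")
```

If the scaled values come out close to each other, the wider intervals on the low-vote categories are mostly explained by sample size, so small rank differences in those categories deserve less weight.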