Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #20 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,468
- Percentile: 94.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: overall. Source rank: #28. Votes: 25064. Organization: baidu. License: Proprietary.
94.2% percentile inside its fair comparison set · Raw benchmark value: 1,468 · CI 1,463 - 1,473
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #29 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,435
- Percentile: 91.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: creative_writing. Source rank: #40. Votes: 3909. Organization: baidu. License: Proprietary.
91.3% percentile inside its fair comparison set · Raw benchmark value: 1,435 · CI 1,425 - 1,445
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #18 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,479
- Percentile: 94.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: english. Source rank: #20. Votes: 11760. Organization: baidu. License: Proprietary.
94.8% percentile inside its fair comparison set · Raw benchmark value: 1,479 · CI 1,473 - 1,486
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #20 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,473
- Percentile: 94.2%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: exclude_ties. Source rank: #28. Votes: 19100. Organization: baidu. License: Proprietary.
94.2% percentile inside its fair comparison set · Raw benchmark value: 1,473 · CI 1,467 - 1,480
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #22 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,488
- Percentile: 93.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: hard_prompts. Source rank: #28. Votes: 16137. Organization: baidu. License: Proprietary.
93.5% percentile inside its fair comparison set · Raw benchmark value: 1,488 · CI 1,482 - 1,494
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #18 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,493
- Percentile: 94.8%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: hard_prompts_english. Source rank: #21. Votes: 8071. Organization: baidu. License: Proprietary.
94.8% percentile inside its fair comparison set · Raw benchmark value: 1,493 · CI 1,486 - 1,501
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #24 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,455
- Percentile: 92.9%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: instruction_following. Source rank: #32. Votes: 7816. Organization: baidu. License: Proprietary.
92.9% percentile inside its fair comparison set · Raw benchmark value: 1,455 · CI 1,447 - 1,462
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #30 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,465
- Percentile: 90.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: longer_query. Source rank: #38. Votes: 10110. Organization: baidu. License: Proprietary.
90.5% percentile inside its fair comparison set · Raw benchmark value: 1,465 · CI 1,458 - 1,473
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #22 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,473
- Percentile: 93.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: multi_turn. Source rank: #31. Votes: 4242. Organization: baidu. License: Proprietary.
93.5% percentile inside its fair comparison set · Raw benchmark value: 1,473 · CI 1,463 - 1,483
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #12 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,467
- Percentile: 96.6%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: overall. Source rank: #15. Votes: 25064. Organization: baidu. License: Proprietary.
96.6% percentile inside its fair comparison set · Raw benchmark value: 1,467 · CI 1,462 - 1,472
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #15 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 95.7%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: creative_writing. Source rank: #19. Votes: 3909. Organization: baidu. License: Proprietary.
95.7% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,436 - 1,457
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #9 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,478
- Percentile: 97.5%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: english. Source rank: #11. Votes: 11760. Organization: baidu. License: Proprietary.
97.5% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,471 - 1,484
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #14 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,471
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: exclude_ties. Source rank: #17. Votes: 19100. Organization: baidu. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,471 · CI 1,464 - 1,477
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #13 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,480
- Percentile: 96.3%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: hard_prompts. Source rank: #16. Votes: 16137. Organization: baidu. License: Proprietary.
96.3% percentile inside its fair comparison set · Raw benchmark value: 1,480 · CI 1,474 - 1,486
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #7 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,485
- Percentile: 98.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: hard_prompts_english. Source rank: #9. Votes: 8071. Organization: baidu. License: Proprietary.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,485 · CI 1,478 - 1,493
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #14 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,460
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: instruction_following. Source rank: #19. Votes: 7816. Organization: baidu. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,460 · CI 1,453 - 1,468
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #19 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,461
- Percentile: 94.1%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: longer_query. Source rank: #26. Votes: 10110. Organization: baidu. License: Proprietary.
94.1% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI 1,454 - 1,468
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #14 · Source label: ernie-5.1
verified runtime · exact alias
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,472
- Percentile: 96%
- Last updated: recent
- Eligibility: headline eligible
Parsed from Arena leaderboard dataset row `ernie-5.1`. Category: multi_turn. Source rank: #17. Votes: 4242. Organization: baidu. License: Proprietary.
96% percentile inside its fair comparison set · Raw benchmark value: 1,472 · CI 1,462 - 1,482
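
Each row above reports a raw value together with a CI range, so one simple way to read two rows side by side is to check whether their intervals overlap. The sketch below is illustrative only: the `ArenaRow` class and its method names are hypothetical, and the numbers are copied from the overall, creative_writing, and multi_turn rows listed above. Overlap is a rough reading aid, not a formal significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArenaRow:
    """Hypothetical container for one leaderboard row (not an Arena API type)."""
    category: str
    score: int    # raw benchmark value
    ci_low: int   # lower CI bound as printed on the card
    ci_high: int  # upper CI bound as printed on the card

    def half_width(self) -> float:
        """Half the width of the reported interval."""
        return (self.ci_high - self.ci_low) / 2

    def overlaps(self, other: "ArenaRow") -> bool:
        """True when the two reported intervals share at least one point."""
        return self.ci_low <= other.ci_high and other.ci_low <= self.ci_high


# Values copied from the cards above (style-controlled rows).
overall = ArenaRow("overall", 1468, 1463, 1473)
creative = ArenaRow("creative_writing", 1435, 1425, 1445)
multi_turn = ArenaRow("multi_turn", 1473, 1463, 1483)

print(overall.half_width())          # 5.0
print(overall.overlaps(creative))    # False: 1445 is below 1463
print(overall.overlaps(multi_turn))  # True: the intervals share 1463 to 1473
```

The wider multi_turn interval goes with its lower vote count (4242 versus 25064 for overall), so an overlap there says little about whether the two scores actually differ.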