Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #34 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,449
- Percentile: 89.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: overall. Source rank: #45. Votes: 9748. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 89.8% · Raw benchmark value: 1,449 · CI: 1,443 - 1,456
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #41 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,420
- Percentile: 87.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: creative_writing. Source rank: #53. Votes: 1554. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 87.6% · Raw benchmark value: 1,420 · CI: 1,405 - 1,435
Text Arena · English
AR · Chat / text · Human
Rank #43 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,451
- Percentile: 87.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: english. Source rank: #55. Votes: 4589. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 87.1% · Raw benchmark value: 1,451 · CI: 1,442 - 1,460
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #35 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,451
- Percentile: 89.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: exclude_ties. Source rank: #46. Votes: 6904. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 89.5% · Raw benchmark value: 1,451 · CI: 1,441 - 1,460
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #42 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,464
- Percentile: 87.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: hard_prompts. Source rank: #54. Votes: 5205. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 87.4% · Raw benchmark value: 1,464 · CI: 1,455 - 1,472
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #52 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,461
- Percentile: 84.3%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: hard_prompts_english. Source rank: #64. Votes: 2523. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 84.3% · Raw benchmark value: 1,461 · CI: 1,449 - 1,472
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #57 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,420
- Percentile: 82.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: instruction_following. Source rank: #72. Votes: 2624. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 82.8% · Raw benchmark value: 1,420 · CI: 1,409 - 1,431
Text Arena · Longer Query
AR · Chat / text · Human
Rank #65 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,435
- Percentile: 78.9%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: longer_query. Source rank: #81. Votes: 2455. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 78.9% · Raw benchmark value: 1,435 · CI: 1,423 - 1,447
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #65 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,428
- Percentile: 80.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: multi_turn. Source rank: #83. Votes: 1543. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 80.2% · Raw benchmark value: 1,428 · CI: 1,413 - 1,443
Text Arena · No Style Control
AR · Chat / text · Human
Rank #31 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,442
- Percentile: 90.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: overall. Source rank: #40. Votes: 9748. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 90.8% · Raw benchmark value: 1,442 · CI: 1,436 - 1,449
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #31 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,422
- Percentile: 90.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: creative_writing. Source rank: #41. Votes: 1554. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 90.7% · Raw benchmark value: 1,422 · CI: 1,407 - 1,437
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #49 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,440
- Percentile: 85.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: english. Source rank: #61. Votes: 4589. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 85.2% · Raw benchmark value: 1,440 · CI: 1,432 - 1,449
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #31 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,440
- Percentile: 90.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: exclude_ties. Source rank: #39. Votes: 6904. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 90.8% · Raw benchmark value: 1,440 · CI: 1,431 - 1,449
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #49 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,438
- Percentile: 85.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: hard_prompts. Source rank: #61. Votes: 5205. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 85.2% · Raw benchmark value: 1,438 · CI: 1,429 - 1,446
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #70 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,431
- Percentile: 78.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: hard_prompts_english. Source rank: #85. Votes: 2523. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 78.7% · Raw benchmark value: 1,431 · CI: 1,420 - 1,443
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #69 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,402
- Percentile: 79.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: instruction_following. Source rank: #84. Votes: 2624. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 79.1% · Raw benchmark value: 1,402 · CI: 1,391 - 1,413
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #75 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,410
- Percentile: 75.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: longer_query. Source rank: #91. Votes: 2455. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 75.7% · Raw benchmark value: 1,410 · CI: 1,398 - 1,421
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #77 · Source label: ernie-5.0-preview-1203
verified runtime · exact alias · Background only
Raw row drilldown:
- Source: Arena
- Raw value: 1,414
- Percentile: 76.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. Category: multi_turn. Source rank: #93. Votes: 1543. Organization: baidu. License: Proprietary.
Percentile inside its fair comparison set: 76.5% · Raw benchmark value: 1,414 · CI: 1,399 - 1,429
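Every card in this listing ends with a provenance line in the same `Key: value.` shape (dataset row, category, source rank, votes, organization, license). The sketch below pulls those fields into a typed record, for example to tabulate source rank against vote count across categories. It is a minimal sketch that assumes the pattern holds for every row. `parse_provenance` and `ArenaRow` are hypothetical helper names and do not come from the Arena dataset.

```python
import re
from dataclasses import dataclass


# Hypothetical record type; field names mirror the labels in the provenance lines.
@dataclass
class ArenaRow:
    row_id: str
    category: str
    source_rank: int
    votes: int
    organization: str
    license: str


# The row id sits in backticks and may itself contain dots (e.g. "5.0").
ROW_ID = re.compile(r"dataset row `([^`]+)`")
# Every other field is "Label: value." with no dot inside the value.
FIELD = re.compile(r"(Category|Source rank|Votes|Organization|License): ([^.]+)\.")


def parse_provenance(line: str) -> ArenaRow:
    """Parse one 'Parsed from Arena leaderboard dataset row ...' line (assumed format)."""
    row_match = ROW_ID.search(line)
    if row_match is None:
        raise ValueError("no dataset row id found")
    fields = dict(FIELD.findall(line))  # findall yields (label, value) pairs
    return ArenaRow(
        row_id=row_match.group(1),
        category=fields["Category"],
        source_rank=int(fields["Source rank"].lstrip("#")),  # "#45" -> 45
        votes=int(fields["Votes"].replace(",", "")),  # tolerate "9,748" as well
        organization=fields["Organization"],
        license=fields["License"],
    )


line = (
    "Parsed from Arena leaderboard dataset row `ernie-5.0-preview-1203`. "
    "Category: overall. Source rank: #45. Votes: 9748. "
    "Organization: baidu. License: Proprietary."
)
row = parse_provenance(line)
print(row.category, row.source_rank, row.votes)  # overall 45 9748
```

The row id gets its own pattern because it contains dots, while the remaining values are matched up to the next period. A line that does not follow this shape raises an error instead of silently returning partial data.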