Text Arena
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #40 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,447
- Percentile: 88%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: overall. Source rank: #52. Votes: 35299. Organization: baidu. License: Proprietary.
88% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,443 - 1,451
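The provenance sentence on each card follows a regular "Label: value." pattern, so it can be split into fields mechanically. The following is a minimal sketch, not the site's actual parser: the function name `parse_provenance` and the output key names are assumptions made for illustration, and it assumes no field value contains a period.

```python
import re


def parse_provenance(line):
    """Split a card's provenance sentence into labelled fields (hypothetical helper)."""
    fields = {}

    # The row identifier sits between backticks and may itself contain periods.
    row = re.search(r"dataset row `([^`]+)`", line)
    if row:
        fields["row"] = row.group(1)

    # Remaining fields look like "Label: value." with no period inside the value.
    for label, value in re.findall(r"([A-Z][A-Za-z ]+): ([^.]+)\.", line):
        key = label.strip().lower().replace(" ", "_")
        fields[key] = value.strip()

    # Convert the numeric fields; "#52" becomes 52.
    if "source_rank" in fields:
        fields["source_rank"] = int(fields["source_rank"].lstrip("#"))
    if "votes" in fields:
        fields["votes"] = int(fields["votes"])
    return fields


example = (
    "Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. "
    "Category: overall. Source rank: #52. Votes: 35299. "
    "Organization: baidu. License: Proprietary."
)
print(parse_provenance(example))
```

Keeping the row identifier in its own pattern avoids mis-splitting on the periods inside `ernie-5.0-0110`.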
Text Arena · Creative Writing
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #38 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,423
- Percentile: 88.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: creative_writing. Source rank: #50. Votes: 5547. Organization: baidu. License: Proprietary.
88.5% percentile inside its fair comparison set · Raw benchmark value: 1,423 · CI 1,415 - 1,432
Text Arena · English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #44 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,451
- Percentile: 86.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: english. Source rank: #56. Votes: 15895. Organization: baidu. License: Proprietary.
86.8% percentile inside its fair comparison set · Raw benchmark value: 1,451 · CI 1,446 - 1,457
Text Arena · Exclude Ties
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #41 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,448
- Percentile: 87.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: exclude_ties. Source rank: #52. Votes: 25141. Organization: baidu. License: Proprietary.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,448 · CI 1,442 - 1,453
Text Arena · Hard Prompts
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #41 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,465
- Percentile: 87.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: hard_prompts. Source rank: #53. Votes: 20884. Organization: baidu. License: Proprietary.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,465 · CI 1,460 - 1,470
Text Arena · Hard Prompts English
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #47 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,467
- Percentile: 85.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: hard_prompts_english. Source rank: #59. Votes: 9750. Organization: baidu. License: Proprietary.
85.8% percentile inside its fair comparison set · Raw benchmark value: 1,467 · CI 1,460 - 1,473
Text Arena · Instruction Following
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #51 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,427
- Percentile: 84.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: instruction_following. Source rank: #65. Votes: 10514. Organization: baidu. License: Proprietary.
84.6% percentile inside its fair comparison set · Raw benchmark value: 1,427 · CI 1,421 - 1,433
Text Arena · Longer Query
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #59 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,440
- Percentile: 80.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: longer_query. Source rank: #73. Votes: 11113. Organization: baidu. License: Proprietary.
80.9% percentile inside its fair comparison set · Raw benchmark value: 1,440 · CI 1,434 - 1,447
Text Arena · Multi Turn
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #52 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,442
- Percentile: 84.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: multi_turn. Source rank: #69. Votes: 5704. Organization: baidu. License: Proprietary.
84.2% percentile inside its fair comparison set · Raw benchmark value: 1,442 · CI 1,434 - 1,450
Text Arena · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #30 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,445
- Percentile: 91.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: overall. Source rank: #37. Votes: 35299. Organization: baidu. License: Proprietary.
91.1% percentile inside its fair comparison set · Raw benchmark value: 1,445 · CI 1,441 - 1,449
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #28 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,428
- Percentile: 91.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: creative_writing. Source rank: #37. Votes: 5547. Organization: baidu. License: Proprietary.
91.6% percentile inside its fair comparison set · Raw benchmark value: 1,428 · CI 1,420 - 1,436
Text Arena · English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #41 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,447
- Percentile: 87.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: english. Source rank: #51. Votes: 15895. Organization: baidu. License: Proprietary.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI 1,441 - 1,452
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #26 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,444
- Percentile: 92.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: exclude_ties. Source rank: #33. Votes: 25141. Organization: baidu. License: Proprietary.
92.3% percentile inside its fair comparison set · Raw benchmark value: 1,444 · CI 1,438 - 1,449
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #38 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,446
- Percentile: 88.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: hard_prompts. Source rank: #46. Votes: 20884. Organization: baidu. License: Proprietary.
88.6% percentile inside its fair comparison set · Raw benchmark value: 1,446 · CI 1,441 - 1,451
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #46 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,446
- Percentile: 86.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: hard_prompts_english. Source rank: #55. Votes: 9750. Organization: baidu. License: Proprietary.
86.1% percentile inside its fair comparison set · Raw benchmark value: 1,446 · CI 1,440 - 1,453
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #44 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,415
- Percentile: 86.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: instruction_following. Source rank: #55. Votes: 10514. Organization: baidu. License: Proprietary.
86.8% percentile inside its fair comparison set · Raw benchmark value: 1,415 · CI 1,409 - 1,421
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #55 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,422
- Percentile: 82.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: longer_query. Source rank: #68. Votes: 11113. Organization: baidu. License: Proprietary.
82.2% percentile inside its fair comparison set · Raw benchmark value: 1,422 · CI 1,416 - 1,429
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Rank #50 · Source label: ernie-5.0-0110
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility)
- Source: Arena
- Raw value: 1,433
- Percentile: 84.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `ernie-5.0-0110`. Category: multi_turn. Source rank: #62. Votes: 5704. Organization: baidu. License: Proprietary.
84.8% percentile inside its fair comparison set · Raw benchmark value: 1,433 · CI 1,425 - 1,442
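Each category appears twice on this page, once under its plain title and once under "No Style Control", each with its own confidence interval. One rough way to read a pair is to check whether the two intervals overlap. The sketch below is an illustration only: the helper name `intervals_overlap` and the dictionary keys are hypothetical, the numbers are copied from the Text Arena and Hard Prompts cards, and an overlap check is a coarse screen rather than a formal significance test.

```python
def intervals_overlap(a, b):
    """Return True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CI bounds copied from the cards; "default" means the card without the
# "No Style Control" suffix (a label chosen here, not taken from the source).
cards = {
    "overall": {"default": (1443, 1451), "no_style_control": (1441, 1449)},
    "hard_prompts": {"default": (1460, 1470), "no_style_control": (1441, 1451)},
}

for category, pair in cards.items():
    overlap = intervals_overlap(pair["default"], pair["no_style_control"])
    print(f"{category}: intervals overlap = {overlap}")
```

For these two pairs, the overall intervals overlap while the Hard Prompts intervals do not, which suggests the Hard Prompts score is the more sensitive of the two to the style-control setting.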