Text Arena
AR · Chat / text · Human
It tests, through human preference votes, whether the model is useful in ordinary conversational turns, not just on narrow correctness tasks.
Rank #60
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,427
- Percentile: 81.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: overall. Source rank: #76. Votes: 3417. Organization: amazon. License: Proprietary.
81.8% percentile inside its fair comparison set · 1,427 raw benchmark value · CI 1,418 - 1,437
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #121
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,343
- Percentile: 62.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: creative_writing. Source rank: #146. Votes: 457. Organization: amazon. License: Proprietary.
62.8% percentile inside its fair comparison set · 1,343 raw benchmark value · CI 1,316 - 1,369
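The Creative Writing interval (1,316 - 1,369, from 457 votes) is much wider than the overall interval (1,418 - 1,437, from 3417 votes). A minimal sketch comparing the two, assuming as a rough rule of thumb that interval width shrinks like 1/sqrt(votes); that scaling is an assumption for illustration, not a statement of how Arena computes its intervals.

```python
import math

# Values copied from the Text Arena overall and Creative Writing cards.
overall = {"votes": 3417, "ci": (1418, 1437)}
creative = {"votes": 457, "ci": (1316, 1369)}


def ci_width(card):
    """Width of the reported confidence interval, in rating points."""
    lo, hi = card["ci"]
    return hi - lo


# Observed ratio of interval widths (creative writing vs overall).
observed = ci_width(creative) / ci_width(overall)

# Rough expectation if width scaled like 1/sqrt(votes). This ignores the
# opponent mix and how the interval is actually estimated (assumption).
expected = math.sqrt(overall["votes"] / creative["votes"])

print(f"observed width ratio: {observed:.2f}")
print(f"1/sqrt(votes) ratio:  {expected:.2f}")
```

On these numbers the observed ratio (about 2.79) and the 1/sqrt(votes) estimate (about 2.73) land close together, which is consistent with, but does not prove, the smaller vote count being the main reason for the wider interval.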
Text Arena · English
AR · Chat / text · Human
Rank #60
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,440
- Percentile: 81.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: english. Source rank: #74. Votes: 1560. Organization: amazon. License: Proprietary.
81.8% percentile inside its fair comparison set · 1,440 raw benchmark value · CI 1,425 - 1,454
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #61
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,419
- Percentile: 81.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: exclude_ties. Source rank: #77. Votes: 2360. Organization: amazon. License: Proprietary.
81.5% percentile inside its fair comparison set · 1,419 raw benchmark value · CI 1,405 - 1,434
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #57
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,449
- Percentile: 82.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: hard_prompts. Source rank: #72. Votes: 1935. Organization: amazon. License: Proprietary.
82.8% percentile inside its fair comparison set · 1,449 raw benchmark value · CI 1,436 - 1,462
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #64
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,454
- Percentile: 80.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: hard_prompts_english. Source rank: #80. Votes: 929. Organization: amazon. License: Proprietary.
80.6% percentile inside its fair comparison set · 1,454 raw benchmark value · CI 1,435 - 1,473
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #61
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,417
- Percentile: 81.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: instruction_following. Source rank: #76. Votes: 935. Organization: amazon. License: Proprietary.
81.5% percentile inside its fair comparison set · 1,417 raw benchmark value · CI 1,399 - 1,436
Text Arena · Longer Query
AR · Chat / text · Human
Rank #72
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,428
- Percentile: 76.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: longer_query. Source rank: #90. Votes: 946. Organization: amazon. License: Proprietary.
76.6% percentile inside its fair comparison set · 1,428 raw benchmark value · CI 1,410 - 1,447
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #60
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,430
- Percentile: 81.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: multi_turn. Source rank: #77. Votes: 602. Organization: amazon. License: Proprietary.
81.7% percentile inside its fair comparison set · 1,430 raw benchmark value · CI 1,407 - 1,454
Text Arena · No Style Control
AR · Chat / text · Human
Rank #23
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,448
- Percentile: 93.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: overall. Source rank: #29. Votes: 3417. Organization: amazon. License: Proprietary.
93.2% percentile inside its fair comparison set · 1,448 raw benchmark value · CI 1,439 - 1,458
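The no-style-control overall value (1,448) sits 21 points above the style-controlled overall value (1,427). Those come from different leaderboard variants, so the gap is not a head-to-head prediction. To give a feel for what a 21-point gap means between two models on the same board, here is a minimal sketch that assumes the raw values are on the conventional Elo scale (logistic, base 10, 400 points). That scaling is an assumption; the function name is hypothetical.

```python
def expected_win_rate(r_a: float, r_b: float,
                      scale: float = 400.0, base: float = 10.0) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model.

    Assumes ratings use the conventional Elo scale (base 10, 400 points).
    """
    return 1.0 / (1.0 + base ** ((r_b - r_a) / scale))


# A 21-point gap, e.g. 1,448 vs 1,427 on the same leaderboard.
p = expected_win_rate(1448, 1427)
print(f"{p:.3f}")  # prints 0.530
```

Under that assumption, a 21-point lead translates into only about a 53% expected preference rate, so gaps of this size are modest in head-to-head terms.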
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #86
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,366
- Percentile: 73.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: creative_writing. Source rank: #106. Votes: 457. Organization: amazon. License: Proprietary.
73.7% percentile inside its fair comparison set · 1,366 raw benchmark value · CI 1,340 - 1,392
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #23
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,457
- Percentile: 93.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: english. Source rank: #28. Votes: 1560. Organization: amazon. License: Proprietary.
93.2% percentile inside its fair comparison set · 1,457 raw benchmark value · CI 1,442 - 1,471
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #22
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,448
- Percentile: 93.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: exclude_ties. Source rank: #27. Votes: 2360. Organization: amazon. License: Proprietary.
93.5% percentile inside its fair comparison set · 1,448 raw benchmark value · CI 1,434 - 1,463
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #22
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,458
- Percentile: 93.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: hard_prompts. Source rank: #29. Votes: 1935. Organization: amazon. License: Proprietary.
93.5% percentile inside its fair comparison set · 1,458 raw benchmark value · CI 1,445 - 1,471
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #25
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,459
- Percentile: 92.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: hard_prompts_english. Source rank: #32. Votes: 929. Organization: amazon. License: Proprietary.
92.6% percentile inside its fair comparison set · 1,459 raw benchmark value · CI 1,441 - 1,478
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #35
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,428
- Percentile: 89.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: instruction_following. Source rank: #43. Votes: 935. Organization: amazon. License: Proprietary.
89.5% percentile inside its fair comparison set · 1,428 raw benchmark value · CI 1,409 - 1,446
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #40
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,437
- Percentile: 87.2%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: longer_query. Source rank: #48. Votes: 946. Organization: amazon. License: Proprietary.
87.2% percentile inside its fair comparison set · 1,437 raw benchmark value · CI 1,419 - 1,455
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #38
verified runtime · exact alias · background only
Raw row drilldown
- Source: Arena
- Raw value: 1,445
- Percentile: 88.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `amazon-nova-experimental-chat-26-02-10`. Category: multi_turn. Source rank: #48. Votes: 602. Organization: amazon. License: Proprietary.
88.5% percentile inside its fair comparison set · 1,445 raw benchmark value · CI 1,421 - 1,468
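Every category appears twice on this page, once with style control and once as "No Style Control", with a different rank each time. A small sketch that tabulates the shift using only the "Rank" values shown on the cards; it describes the difference and makes no claim about its cause.

```python
# (category, rank with style control, rank with no style control),
# copied from the "Rank #" line of each card on this page.
ranks = [
    ("Overall", 60, 23),
    ("Creative Writing", 121, 86),
    ("English", 60, 23),
    ("Exclude Ties", 61, 22),
    ("Hard Prompts", 57, 22),
    ("Hard Prompts English", 64, 25),
    ("Instruction Following", 61, 35),
    ("Longer Query", 72, 40),
    ("Multi Turn", 60, 38),
]

for name, styled, unstyled in ranks:
    # A positive shift means the model places higher (a smaller rank number)
    # when style control is not applied.
    print(f"{name:<22} {styled:>4} -> {unstyled:>4}  ({styled - unstyled:+d} places)")
```

In every category the model places higher without style control, with the largest shifts (35 to 39 places) in Overall, English, Exclude Ties, Hard Prompts, and Hard Prompts English.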