Text Arena
AR · Chat / text · Human
Text Arena tests, through human votes, whether a model is useful in ordinary conversational turns, not just on narrow correctness tasks.
Rank #22 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,466
- Percentile: 93.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: overall. Source rank: #30. Votes: 65,623. Organization: xai. License: Proprietary.
93.5% percentile inside its fair comparison set · Raw benchmark value: 1,466 · CI 1,463 - 1,469
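The raw value on this card (1,466) is a rating rather than a percentage. Assuming it sits on the Elo-style logistic scale that Arena-style leaderboards commonly report (base 10, scale 400; the page itself does not state its rating model), a rating gap can be converted into an expected head-to-head win rate. This is a minimal sketch; the function name `expected_win_prob` and the 1,432-rated opponent are hypothetical.

```python
def expected_win_prob(r_a: float, r_b: float, base: float = 10.0, scale: float = 400.0) -> float:
    """Expected probability that model A beats model B under an Elo-style logistic model.

    Assumption: ratings use base 10 and scale 400, as Elo-style leaderboards
    commonly do; the card does not state which rating model it uses.
    """
    return 1.0 / (1.0 + base ** ((r_b - r_a) / scale))


# 1,466 (this card) against a hypothetical opponent rated 1,432.
print(f"{expected_win_prob(1466, 1432):.3f}")  # 0.549
```

Under these assumptions, a 34-point gap is worth only about a 55% expected win rate, so small differences between nearby ratings should be read together with their confidence intervals.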
Text Arena · Creative Writing
AR · Chat / text · Human
Rank #31 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,432
- Percentile: 90.7%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: creative_writing. Source rank: #42. Votes: 9,811. Organization: xai. License: Proprietary.
90.7% percentile inside its fair comparison set · Raw benchmark value: 1,432 · CI 1,425 - 1,439
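Each card reports a percentile "inside its fair comparison set", but the page defines neither the set nor the exact formula. One common convention, used here purely as an assumption, is the share of the comparison set that scores strictly below the model. The function name `percentile_in_set` and the comparison set in the usage line are hypothetical and are not Arena data.

```python
from typing import Sequence


def percentile_in_set(score: float, comparison_set: Sequence[float]) -> float:
    """Percent of the comparison set that scores strictly below `score`.

    Hypothetical convention: the leaderboard does not publish its formula,
    and other definitions (e.g. counting ties as half) give different numbers.
    """
    if not comparison_set:
        raise ValueError("comparison set must not be empty")
    below = sum(1 for s in comparison_set if s < score)
    return 100.0 * below / len(comparison_set)


# Made-up comparison set, not taken from the leaderboard.
print(percentile_in_set(1466, [1300, 1350, 1400, 1466, 1500]))  # 60.0
```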
Text Arena · English
AR · Chat / text · Human
Rank #24 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,471
- Percentile: 92.9%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: english. Source rank: #32. Votes: 30,490. Organization: xai. License: Proprietary.
92.9% percentile inside its fair comparison set · Raw benchmark value: 1,471 · CI 1,467 - 1,476
Text Arena · Exclude Ties
AR · Chat / text · Human
Rank #21 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,473
- Percentile: 93.8%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: exclude_ties. Source rank: #29. Votes: 48,444. Organization: xai. License: Proprietary.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,473 · CI 1,469 - 1,478
Text Arena · Hard Prompts
AR · Chat / text · Human
Rank #30 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,477
- Percentile: 91.1%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: hard_prompts. Source rank: #40. Votes: 37,496. Organization: xai. License: Proprietary.
91.1% percentile inside its fair comparison set · Raw benchmark value: 1,477 · CI 1,473 - 1,481
Text Arena · Hard Prompts English
AR · Chat / text · Human
Rank #35 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,478
- Percentile: 89.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: hard_prompts_english. Source rank: #45. Votes: 18,278. Organization: xai. License: Proprietary.
89.5% percentile inside its fair comparison set · Raw benchmark value: 1,478 · CI 1,473 - 1,484
Text Arena · Instruction Following
AR · Chat / text · Human
Rank #47 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,431
- Percentile: 85.8%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: instruction_following. Source rank: #58. Votes: 18,732. Organization: xai. License: Proprietary.
85.8% percentile inside its fair comparison set · Raw benchmark value: 1,431 · CI 1,426 - 1,436
Text Arena · Longer Query
AR · Chat / text · Human
Rank #43 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,450
- Percentile: 86.2%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: longer_query. Source rank: #54. Votes: 20,041. Organization: xai. License: Proprietary.
86.2% percentile inside its fair comparison set · Raw benchmark value: 1,450 · CI 1,445 - 1,455
Text Arena · Multi Turn
AR · Chat / text · Human
Rank #35 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,462
- Percentile: 89.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: multi_turn. Source rank: #46. Votes: 12,345. Organization: xai. License: Proprietary.
89.5% percentile inside its fair comparison set · Raw benchmark value: 1,462 · CI 1,456 - 1,469
Text Arena · No Style Control
AR · Chat / text · Human
Rank #41 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,437
- Percentile: 87.7%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: overall. Source rank: #50. Votes: 65,623. Organization: xai. License: Proprietary.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,437 · CI 1,434 - 1,441
Text Arena · Creative Writing · No Style Control
AR · Chat / text · Human
Rank #38 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,412
- Percentile: 88.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: creative_writing. Source rank: #48. Votes: 9,811. Organization: xai. License: Proprietary.
88.5% percentile inside its fair comparison set · Raw benchmark value: 1,412 · CI 1,406 - 1,419
Text Arena · English · No Style Control
AR · Chat / text · Human
Rank #42 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,446
- Percentile: 87.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: english. Source rank: #52. Votes: 31,712. Organization: xai. License: Proprietary.
87.4% percentile inside its fair comparison set · Raw benchmark value: 1,446 · CI 1,442 - 1,451
Text Arena · Exclude Ties · No Style Control
AR · Chat / text · Human
Rank #41 · Source label: grok-4.1-thinking
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,432
- Percentile: 87.7%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1-thinking`. Category: exclude_ties. Source rank: #50. Votes: 48,444. Organization: xai. License: Proprietary.
87.7% percentile inside its fair comparison set · Raw benchmark value: 1,432 · CI 1,428 - 1,436
Text Arena · Hard Prompts · No Style Control
AR · Chat / text · Human
Rank #50 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,436
- Percentile: 84.9%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: hard_prompts. Source rank: #62. Votes: 38,440. Organization: xai. License: Proprietary.
84.9% percentile inside its fair comparison set · Raw benchmark value: 1,436 · CI 1,432 - 1,440
Text Arena · Hard Prompts English · No Style Control
AR · Chat / text · Human
Rank #58 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,440
- Percentile: 82.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: hard_prompts_english. Source rank: #69. Votes: 18,839. Organization: xai. License: Proprietary.
82.4% percentile inside its fair comparison set · Raw benchmark value: 1,440 · CI 1,434 - 1,445
Text Arena · Instruction Following · No Style Control
AR · Chat / text · Human
Rank #72 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,400
- Percentile: 78.2%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: instruction_following. Source rank: #87. Votes: 18,732. Organization: xai. License: Proprietary.
78.2% percentile inside its fair comparison set · Raw benchmark value: 1,400 · CI 1,395 - 1,405
Text Arena · Longer Query · No Style Control
AR · Chat / text · Human
Rank #65 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,416
- Percentile: 78.9%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: longer_query. Source rank: #78. Votes: 20,041. Organization: xai. License: Proprietary.
78.9% percentile inside its fair comparison set · Raw benchmark value: 1,416 · CI 1,411 - 1,421
Text Arena · Multi Turn · No Style Control
AR · Chat / text · Human
Rank #44 · Source label: grok-4.1
verified runtime · exact alias · Background only
Raw row drilldown (source row, percentile, last updated, eligibility):
- Source: Arena
- Raw value: 1,440
- Percentile: 86.7%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `grok-4.1`. Category: multi_turn. Source rank: #55. Votes: 12,345. Organization: xai. License: Proprietary.
86.7% percentile inside its fair comparison set · Raw benchmark value: 1,440 · CI 1,433 - 1,446
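Six categories have both a style-controlled card and a No Style Control card for the same source label, so their raw values can be paired directly. English, Hard Prompts, and Hard Prompts English switch between `grok-4.1-thinking` and `grok-4.1` across the two modes and are left out. The sketch below (the names `PAIRS` and `style_control_deltas` are hypothetical) only subtracts the published raw values; it says nothing about why style control moves them, and the two modes may rate against different model pools.

```python
# Raw values copied from the cards above: (style-controlled, no style control).
# Only categories whose two cards share a source label are included.
PAIRS: dict[str, tuple[int, int]] = {
    "overall (grok-4.1-thinking)": (1466, 1437),
    "creative_writing (grok-4.1)": (1432, 1412),
    "exclude_ties (grok-4.1-thinking)": (1473, 1432),
    "instruction_following (grok-4.1)": (1431, 1400),
    "longer_query (grok-4.1)": (1450, 1416),
    "multi_turn (grok-4.1)": (1462, 1440),
}


def style_control_deltas(pairs: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Return style-controlled minus no-style-control raw value for each category."""
    return {name: sc - nsc for name, (sc, nsc) in pairs.items()}


deltas = style_control_deltas(PAIRS)
# Print the largest gap first, with an explicit sign.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+d}")
```

All six gaps are positive, ranging from +20 (creative_writing) to +41 (exclude_ties).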