Model profile · OpenAI

GPT-5.4 nano

Closed weights · budget · registry tag 2026 nano

Thin verified coverage

Reads as thin verified coverage across the resolved source data.

Visible coverage: 61%
Verified coverage: 52.1%
Spread: 77.3%
Last verified: Jun 20, 2026

32% bench fit

text · code · vision · document · search · 19 aliases · 40 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.
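A minimal sketch of how the two stamps could be compared to spot a stale deploy. Only the two dates come from this page; the `MAX_AGE_DAYS` threshold and the helper name are hypothetical and chosen for illustration.

```python
from datetime import date

# Dates as shown on this page.
data_version = date(2026, 6, 20)
page_refreshed = date(2026, 7, 5)

# Hypothetical threshold, not defined anywhere on the page.
MAX_AGE_DAYS = 30


def snapshot_age_days(snapshot: date, refreshed: date) -> int:
    """Days elapsed between the data snapshot and the page refresh."""
    return (refreshed - snapshot).days


age = snapshot_age_days(data_version, page_refreshed)
is_stale = age > MAX_AGE_DAYS
print(age, is_stale)
```

With the stamps above the snapshot is 15 days older than the page refresh, which stays under the assumed 30-day threshold.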

Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.
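The page does not state how the percentile is computed. As a hedged illustration, the sketch below uses one common rank-to-percentile formula, `(n - rank) / (n - 1)`, with a comparison-set size of 109. That size is an assumption inferred from the LiveBench rows on this page (rank #109 is shown at 0%); the formula reproduces several of those rows but may not be the registry's actual method.

```python
def rank_percentile(rank: int, set_size: int) -> float:
    """Percent of the comparison set ranked below `rank` (rank 1 is best)."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie within a set of at least two models")
    return round(100 * (set_size - rank) / (set_size - 1), 1)


ASSUMED_SET_SIZE = 109  # assumption inferred from the page, not stated on it

# benchmark -> (rank shown on the page, percentile shown on the page)
livebench_rows = {
    "Summarize": (109, 0.0),
    "Language": (108, 0.9),
    "Paraphrase": (86, 21.3),
    "Story generation": (100, 8.3),
}

for name, (rank, shown) in livebench_rows.items():
    print(name, rank_percentile(rank, ASSUMED_SET_SIZE), shown)
```

Not every row fits this reading (the Typos row shows 1.9% at rank #106), and other sources clearly use different set sizes, so treat it as a way to sanity-check a row rather than a definition.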

Thin verified coverage

This model currently reads as thin verified coverage across the resolved source data.

Chat / text · 38 benchmarks · 43.3%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #261 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 8
Percentile: 34.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

34.2% percentile inside its fair comparison set

Raw benchmark value: 8

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #244 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 11%
Percentile: 18.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

18.5% percentile inside its fair comparison set

Raw benchmark value: 11%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #147 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 15.7%
Percentile: 51%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

51% percentile inside its fair comparison set

Raw benchmark value: 15.7%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #238 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 32.5%
Percentile: 24.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

24.8% percentile inside its fair comparison set

Raw benchmark value: 32.5%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #116 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.5 /1M tokens
Percentile: 58.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

58.3% percentile inside its fair comparison set

Raw benchmark value: $0.5 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.2 /1M input tokens
Percentile: 66.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

66.3% percentile inside its fair comparison set

Raw benchmark value: $0.2 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $1.3 /1M output tokens
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

52.9% percentile inside its fair comparison set

Raw benchmark value: $1.3 /1M output tokens
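The three price rows above can be cross-checked against each other. The sketch below assumes that the `3To1` in the field name `price1mBlended0To3To1` denotes a 3:1 input-to-output token blend, which the page does not confirm. The inputs are the rounded prices displayed here, so the result is approximate.

```python
INPUT_PRICE = 0.2   # $ per 1M input tokens, as displayed
OUTPUT_PRICE = 1.3  # $ per 1M output tokens, as displayed


def blended_price(input_price: float, output_price: float,
                  input_parts: int = 3, output_parts: int = 1) -> float:
    """Weighted average price per 1M tokens for a given input:output mix."""
    total_parts = input_parts + output_parts
    return (input_parts * input_price + output_parts * output_price) / total_parts


blended = blended_price(INPUT_PRICE, OUTPUT_PRICE)
print(f"{blended:.3f}")
```

Under that assumption the blend comes to about $0.475 per 1M tokens, which is consistent with the displayed $0.5 after rounding.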

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #62 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 138.1 tokens/s
Percentile: 71%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

71% percentile inside its fair comparison set

Raw benchmark value: 138.1 tokens/s

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #206 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 99.69s
Percentile: 2.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

2.4% percentile inside its fair comparison set

Raw benchmark value: 99.69s

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #205 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 99.69s
Percentile: 2.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

2.9% percentile inside its fair comparison set

Raw benchmark value: 99.69s
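To read the output speed and time-to-first-token rows above together, here is a rough sketch that assumes total response time is simply time to first token plus output length divided by median output speed. That latency model and the 500-token response length are illustrative assumptions, not something the page or Artificial Analysis defines.

```python
TTFT_SECONDS = 99.69       # median time to first token, as displayed
TOKENS_PER_SECOND = 138.1  # median output speed, as displayed


def estimated_response_seconds(output_tokens: int,
                               ttft: float = TTFT_SECONDS,
                               speed: float = TOKENS_PER_SECOND) -> float:
    """Simple estimate: wait for the first token, then stream the rest."""
    if output_tokens < 0 or speed <= 0:
        raise ValueError("output_tokens must be >= 0 and speed must be > 0")
    return ttft + output_tokens / speed


estimate = estimated_response_seconds(500)
print(f"{estimate:.1f} s")
```

With these figures the wait before the first token dominates: streaming 500 tokens adds under 4 seconds to a start-up delay of roughly 100 seconds.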

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #182 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6
Percentile: 7.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

7.5% percentile inside its fair comparison set

Raw benchmark value: 6

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #93 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 71.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #115. Votes: 38610. Organization: openai. License: Proprietary.

71.7% percentile inside its fair comparison set

Raw benchmark value: 1,403 (CI 1,399 - 1,407)

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,333
Percentile: 59.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing. Source rank: #158. Votes: 6159. Organization: openai. License: Proprietary.

59.8% percentile inside its fair comparison set

Raw benchmark value: 1,333 (CI 1,324 - 1,341)

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #95 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,415
Percentile: 71.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #115. Votes: 18566. Organization: openai. License: Proprietary.

71.1% percentile inside its fair comparison set

Raw benchmark value: 1,415 (CI 1,409 - 1,421)

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: exclude_ties. Source rank: #112. Votes: 29611. Organization: openai. License: Proprietary.

72% percentile inside its fair comparison set

Raw benchmark value: 1,391 (CI 1,385 - 1,396)

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,422
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts. Source rank: #112. Votes: 25109. Organization: openai. License: Proprietary.

72% percentile inside its fair comparison set

Raw benchmark value: 1,422 (CI 1,417 - 1,428)

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 71.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts_english. Source rank: #114. Votes: 12774. Organization: openai. License: Proprietary.

71.3% percentile inside its fair comparison set

Raw benchmark value: 1,431 (CI 1,424 - 1,437)

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #101 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,386
Percentile: 69.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: instruction_following. Source rank: #124. Votes: 12936. Organization: openai. License: Proprietary.

69.2% percentile inside its fair comparison set

Raw benchmark value: 1,386 (CI 1,379 - 1,392)

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,405
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: longer_query. Source rank: #123. Votes: 16161. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,405 (CI 1,399 - 1,411)

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #83 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,414
Percentile: 74.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: multi_turn. Source rank: #102. Votes: 7363. Organization: openai. License: Proprietary.

74.6% percentile inside its fair comparison set

Raw benchmark value: 1,414 (CI 1,406 - 1,422)

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #113 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #136. Votes: 38610. Organization: openai. License: Proprietary.

65.5% percentile inside its fair comparison set

Raw benchmark value: 1,374 (CI 1,369 - 1,378)
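The Arena rows report a score together with a confidence interval. One simple, conservative way to read two such rows is to check whether their intervals overlap; the sketch below does that for the Text Arena overall score with and without style control. The `Interval` type and the overlap heuristic are illustrative assumptions, not how Arena or this registry compares models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    score: int
    low: int
    high: int

    def overlaps(self, other: "Interval") -> bool:
        """True when the two confidence intervals share at least one point."""
        return self.low <= other.high and other.low <= self.high


# Values as displayed on this page.
text_arena = Interval(score=1403, low=1399, high=1407)
no_style_control = Interval(score=1374, low=1369, high=1378)

gap = text_arena.score - no_style_control.score
print(text_arena.overlaps(no_style_control), gap)
```

Here the intervals do not overlap, so the 29-point gap between the two variants is larger than the reported uncertainty on either score.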

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #138 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,311
Percentile: 57.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing. Source rank: #166. Votes: 6159. Organization: openai. License: Proprietary.

57.6% percentile inside its fair comparison set

Raw benchmark value: 1,311 (CI 1,303 - 1,320)

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #119 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 63.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #141. Votes: 18566. Organization: openai. License: Proprietary.

63.7% percentile inside its fair comparison set

Raw benchmark value: 1,383 (CI 1,377 - 1,388)

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 66.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: exclude_ties. Source rank: #132. Votes: 29611. Organization: openai. License: Proprietary.

66.5% percentile inside its fair comparison set

Raw benchmark value: 1,347 (CI 1,342 - 1,353)

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,381
Percentile: 67.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts. Source rank: #130. Votes: 25109. Organization: openai. License: Proprietary.

67.1% percentile inside its fair comparison set

Raw benchmark value: 1,381 (CI 1,376 - 1,387)

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 67%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts_english. Source rank: #130. Votes: 12774. Organization: openai. License: Proprietary.

67% percentile inside its fair comparison set

Raw benchmark value: 1,388 (CI 1,382 - 1,394)

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 66.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: instruction_following. Source rank: #132. Votes: 12936. Organization: openai. License: Proprietary.

66.8% percentile inside its fair comparison set

Raw benchmark value: 1,360 (CI 1,354 - 1,366)

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #112 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 63.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: longer_query. Source rank: #137. Votes: 16161. Organization: openai. License: Proprietary.

63.5% percentile inside its fair comparison set

Raw benchmark value: 1,368 (CI 1,362 - 1,374)

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: multi_turn. Source rank: #128. Votes: 7363. Organization: openai. License: Proprietary.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,383 (CI 1,375 - 1,390)

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 16.5%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

1.9% percentile inside its fair comparison set

Raw benchmark value: 16.5%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 28.7%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

0.9% percentile inside its fair comparison set

Raw benchmark value: 28.7%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #86 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20.8%
Percentile: 21.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

21.3% percentile inside its fair comparison set

Raw benchmark value: 20.8%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 16.4%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

2.8% percentile inside its fair comparison set

Raw benchmark value: 16.4%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #100 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.6%
Percentile: 8.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

8.3% percentile inside its fair comparison set

Raw benchmark value: 17.6%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 11.2%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

0% percentile inside its fair comparison set

Raw benchmark value: 11.2%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 25.7%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 25.7%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 14.4%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 14.4%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano-medium

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 38%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 38%

Coding · 22 benchmarks · 39.1%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #164 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.8%
Percentile: 46%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

46% percentile inside its fair comparison set

Raw benchmark value: 6.8%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #183 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 29.1%
Percentile: 50.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

50.5% percentile inside its fair comparison set

Raw benchmark value: 29.1%

Coding Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #18 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 56
Percentile: 77.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `codingIndex`.

77.3% percentile inside its fair comparison set

Raw benchmark value: 56

Agentic Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #20 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 28
Percentile: 58.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `agenticIndex`.

58.7% percentile inside its fair comparison set

Raw benchmark value: 28

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,394
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-medium`. Category: overall. Source rank: #43. Votes: 3755. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,394 (CI 1,382 - 1,407)

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,394
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-medium`. Category: webdev. Source rank: #43. Votes: 3755. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,394 (CI 1,382 - 1,407)

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-medium`. Category: webdev-html. Source rank: #43. Votes: 3755. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,402 (CI 1,388 - 1,416)
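The eligibility rule on the three backfilled rows above (shown for context, excluded from default ranking) can be sketched as a filter. The `Row` type, the `headline_rows` helper, and the plain mean used as a summary are hypothetical; the page does not say how it aggregates eligible rows.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Row:
    benchmark: str
    percentile: float
    backfilled: bool


# A handful of coding rows, with percentiles as displayed on this page.
rows = [
    Row("Coding Index", 77.3, backfilled=False),
    Row("Agentic Index", 58.7, backfilled=False),
    Row("Code Arena", 53.4, backfilled=True),
    Row("WebDev Arena", 53.4, backfilled=True),
    Row("Code Arena · Webdev Html", 53.4, backfilled=True),
]


def headline_rows(all_rows: list[Row]) -> list[Row]:
    """Keep only rows that are eligible for default ranking."""
    return [row for row in all_rows if not row.backfilled]


eligible = headline_rows(rows)
eligible_names = [row.benchmark for row in eligible]
eligible_mean = mean(row.percentile for row in eligible)
print(eligible_names, round(eligible_mean, 1))
```

Keeping the backfilled rows in the list but filtering them at ranking time mirrors the page's behaviour of showing proxy data without letting it move the headline numbers.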

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #24 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 84%
Percentile: 74.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: OpenAI.

74.4% percentile inside its fair comparison set

Raw benchmark value: 84% (CI 82% - 86.1%)

SWE-bench Verified

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #37 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 69.8%
Percentile: 33.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: swebench; provider: OpenAI.

33.3% percentile inside its fair comparison set

Raw benchmark value: 69.8% (CI 65.8% - 73.8%)

Terminal-Bench 2.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #26 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 41.6%
Percentile: 7.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: terminal-bench-2-1; provider: OpenAI.

7.4% percentile inside its fair comparison set

Raw benchmark value: 41.6% (CI 35.7% - 47.4%)

Vibe Code Bench v1.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #23 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 26.1%
Percentile: 55.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vibe-code; provider: OpenAI.

55.1% percentile inside its fair comparison set

Raw benchmark value: 26.1% (CI 16.2% - 36%)

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #76 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,460
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: coding. Source rank: #96. Votes: 10783. Organization: openai. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,460 (CI 1,453 - 1,467)
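One way to read an Arena score is as a rating on the conventional Elo scale, where a rating gap maps to an expected head-to-head preference rate. The sketch below assumes that scale (base 10, 400-point divisor). The function name and the 1,500-rated opponent are hypothetical and are not taken from the Arena dataset.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected preference rate for A over B, assuming the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Score and CI bounds from the Text Arena · Coding row.
score, ci_low, ci_high = 1460.0, 1453.0, 1467.0

# Hypothetical opponent rating, chosen only for illustration.
opponent = 1500.0

vs_opponent = expected_score(score, opponent)
# How much the CI alone moves the expectation: top of the interval vs bottom.
ci_spread = expected_score(ci_high, ci_low)

print(f"vs 1,500-rated opponent: {vs_opponent:.3f}")
print(f"CI top vs CI bottom:     {ci_spread:.3f}")
```

Under this assumption, the 14-point interval around 1,460 shifts the expected preference rate only slightly away from an even split.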

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #104 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: coding. Source rank: #125. Votes: 10783. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,403 (CI 1,396 - 1,410)

IOI

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #21 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 15.3%
Percentile: 54.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

54.5% percentile inside its fair comparison set

Raw benchmark value: 15.3% (CI 2.6% - 27.9%)
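The confidence intervals on the Vals AI coding rows differ widely in width, which affects how much weight a single raw value can carry. A minimal sketch of comparing them, using the values shown on this page. The `ScoredRow` type and the 8-point threshold are hypothetical choices for illustration, not part of the registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRow:
    name: str
    value: float    # raw benchmark value, in percent
    ci_low: float   # lower CI bound, in percent
    ci_high: float  # upper CI bound, in percent

    @property
    def half_width(self) -> float:
        """Half the CI width, in percentage points."""
        return (self.ci_high - self.ci_low) / 2


# Values copied from the Vals AI coding rows on this page.
rows = [
    ScoredRow("SWE-bench Verified", 69.8, 65.8, 73.8),
    ScoredRow("Terminal-Bench 2.1", 41.6, 35.7, 47.4),
    ScoredRow("Vibe Code Bench v1.1", 26.1, 16.2, 36.0),
    ScoredRow("IOI", 15.3, 2.6, 27.9),
]

# Hypothetical cutoff: flag rows whose CI half-width exceeds 8 points.
NOISY_HALF_WIDTH = 8.0
noisy = [row.name for row in rows if row.half_width > NOISY_HALF_WIDTH]

for row in rows:
    print(f"{row.name}: {row.value}% +/- {row.half_width:.2f}")
print("wide intervals:", noisy)
```

With that cutoff, Vibe Code Bench v1.1 and IOI are flagged, so their percentiles are best read as rough placements.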

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #89 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 23.3%
Percentile: 18.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

18.5% percentile inside its fair comparison set

Raw benchmark value: 23.3%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #99 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 61.9%
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

9.3% percentile inside its fair comparison set

Raw benchmark value: 61.9%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #85 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20%
Percentile: 22.2%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

22.2% percentile inside its fair comparison set

Raw benchmark value: 20%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #92 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 16.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

16.8% percentile inside its fair comparison set

Raw benchmark value: 10%

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #90 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 35%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

17.6% percentile inside its fair comparison set

Raw benchmark value: 35%

Coding generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 61.5%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

2.8% percentile inside its fair comparison set

Raw benchmark value: 61.5%

Coding completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #101 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 54.3%
Percentile: 7.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

7.4% percentile inside its fair comparison set

Raw benchmark value: 54.3%

Terminal-Bench 2.0

TERMINAL-BENCH · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #30 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Terminal-Bench
Raw value: 11.5%
Percentile: 3.3%
Last updated: archived
Eligibility: headline eligible

Parsed from the public Terminal-Bench 2.0 verified leaderboard. Collapse policy: highest verified score per canonical model. Selected agent: Codex CLI (0.53.0). Display model: GPT-5-Nano. Integration method: API. Agent URL: https://developers.openai.com/codex/cli/. Reported stderr: 1.157 percentage points.

3.3% percentile inside its fair comparison set

Raw benchmark value: 11.5%
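The collapse policy named in the Terminal-Bench 2.0 row, highest verified score per canonical model, can be sketched as a single pass over leaderboard submissions. This is an illustration under stated assumptions: the `Submission` type and the `collapse` function are hypothetical, and only the Codex CLI row comes from this page. The other two rows are made up to show the policy.

```python
from typing import Dict, Iterable, NamedTuple


class Submission(NamedTuple):
    canonical_model: str
    agent: str
    score: float      # percent
    verified: bool


def collapse(rows: Iterable[Submission]) -> Dict[str, Submission]:
    """Keep the highest-scoring verified submission for each canonical model."""
    best: Dict[str, Submission] = {}
    for row in rows:
        if not row.verified:
            continue  # unverified runs never win, whatever their score
        current = best.get(row.canonical_model)
        if current is None or row.score > current.score:
            best[row.canonical_model] = row
    return best


submissions = [
    # Row shown on this page.
    Submission("GPT-5-Nano", "Codex CLI (0.53.0)", 11.5, True),
    # Made-up rows, for illustration only.
    Submission("GPT-5-Nano", "hypothetical-agent-a", 9.0, True),
    Submission("GPT-5-Nano", "hypothetical-agent-b", 14.0, False),
]

selected = collapse(submissions)
print(selected["GPT-5-Nano"].agent, selected["GPT-5-Nano"].score)
```

The unverified 14.0 row is skipped even though it scores higher, so the selected agent stays Codex CLI (0.53.0).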

Reasoning / math / science · 21 benchmarks · 29.3%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #305 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 4.1%
Percentile: 17.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

17.8% percentile inside its fair comparison set

Raw benchmark value: 4.1%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #278 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 42.8%
Percentile: 25.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

25.9% percentile inside its fair comparison set

Raw benchmark value: 42.8%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

65.2% percentile inside its fair comparison set

Raw benchmark value: 0%

ProofBench

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #26 · Source label: openai/gpt-5-nano-2025-08-07

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 12%
Percentile: 28.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: proof_bench; provider: OpenAI.

28.6% percentile inside its fair comparison set

Raw benchmark value: 12% (CI 5.6% - 18.4%)

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #50 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.5%
Percentile: 44.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: OpenAI.

44.9% percentile inside its fair comparison set

Raw benchmark value: 77.5% (CI 72.8% - 82.3%)

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #71 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.2%
Percentile: 21.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

21.3% percentile inside its fair comparison set

Raw benchmark value: 77.2% (CI 76.3% - 78%)

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #53 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,432
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: math. Source rank: #66. Votes: 2079. Organization: openai. License: Proprietary.

83.4% percentile inside its fair comparison set

Raw benchmark value: 1,432 (CI 1,419 - 1,446)

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #78 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 75.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: math. Source rank: #94. Votes: 2079. Organization: openai. License: Proprietary.

75.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 (CI 1,400 - 1,427)

TutorBench

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 55.3%
Percentile: 80%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Manually verified from the official Scale Labs TutorBench leaderboard. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

80% percentile inside its fair comparison set

Raw benchmark value: 55.3%
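Rows backfilled from a proxy identity, such as this TutorBench score taken from GPT-5, are shown for context but left out of the default ranking. A minimal sketch of that filter follows. The `BenchmarkRow` type, its `match` field, and the function name are hypothetical, and the page does not state how the registry implements the exclusion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BenchmarkRow:
    name: str
    percentile: float  # percentile inside the fair comparison set
    match: str         # "exact alias" or "proxy backfilled"


# Percentiles and match types copied from rows in this section.
rows = [
    BenchmarkRow("ProofBench", 28.6, "exact alias"),
    BenchmarkRow("GPQA Diamond", 44.9, "exact alias"),
    BenchmarkRow("TutorBench", 80.0, "proxy backfilled"),
]


def default_ranking_rows(all_rows: List[BenchmarkRow]) -> List[BenchmarkRow]:
    """Drop proxy-backfilled rows so only exact-alias rows feed the ranking."""
    return [row for row in all_rows if row.match == "exact alias"]


eligible = default_ranking_rows(rows)
background = [row for row in rows if row not in eligible]

print("ranked:", [row.name for row in eligible])
print("background only:", [row.name for row in background])
```

Keeping the excluded rows in a separate list mirrors the page, where the backfilled score stays visible with a "Background only" badge.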

MultiNRC

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 52.1%
Percentile: 60%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

60% percentile inside its fair comparison set

Raw benchmark value: 52.1%

EnigmaEval

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 64%
Percentile: 80%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

80% percentile inside its fair comparison set

Raw benchmark value: 64%

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 36%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

0% percentile inside its fair comparison set

Raw benchmark value: 36%
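The page does not document how a rank becomes a percentile. Many LiveBench rows in this profile are, however, consistent with the share of the comparison set ranked below the model, computed as (N - rank) / (N - 1) with N = 109. The sketch below checks that reading. Both the formula and the set size are inferences from the displayed numbers, and not every row fits (ties or a different set size could explain the exceptions).

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Percent of the comparison set ranked below this model (assumed formula)."""
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie in 1..set_size and set_size must be >= 2")
    return 100.0 * (set_size - rank) / (set_size - 1)


# Assumed set size: inferred from rank #109 mapping to 0%.
ASSUMED_SET_SIZE = 109

# (rank, displayed percentile) pairs taken from LiveBench rows on this page:
# Mathematics, Theory of mind, AMPS Hard, Olympiad, Integrals with game.
displayed = [(109, 0.0), (107, 1.9), (106, 2.8), (105, 3.7), (91, 16.7)]

for rank, shown in displayed:
    computed = round(percentile_from_rank(rank, ASSUMED_SET_SIZE), 1)
    print(f"rank #{rank}: computed {computed}%, displayed {shown}%")
```

Read this way, a 0% percentile at rank #109 means last place in a set of 109, not a score of zero.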

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.4%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

0% percentile inside its fair comparison set

Raw benchmark value: 17.4%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 62%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

2.8% percentile inside its fair comparison set

Raw benchmark value: 62%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #91 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 9.1%
Percentile: 16.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

16.7% percentile inside its fair comparison set

Raw benchmark value: 9.1%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.4%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

2.8% percentile inside its fair comparison set

Raw benchmark value: 32.4%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #105 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40.6%
Percentile: 3.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

3.7% percentile inside its fair comparison set

Raw benchmark value: 40.6%

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 25%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

1.9% percentile inside its fair comparison set

Raw benchmark value: 25%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0.6%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

0.9% percentile inside its fair comparison set

Raw benchmark value: 0.6%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 44%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

1.9% percentile inside its fair comparison set

Raw benchmark value: 44%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

0.9% percentile inside its fair comparison set

Raw benchmark value: 0%

Professional reasoning · 34 benchmarks · 51%

GDPval-AA

AA · Professional reasoning · Rubric

Agentic performance on economically valuable work tasks.

Rank #37 · Source label: GPT-5.4 nano (Non-Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 714
Percentile: 21.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.

21.7% percentile inside its fair comparison set

Raw benchmark value: 714

APEX-Agents-AA

AA · Professional reasoning · Objective

Long-horizon agentic task completion.

Rank #10 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 24.9%
Percentile: 62.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `apexAgents`.

62.5% percentile inside its fair comparison set

Raw benchmark value: 24.9%

Vals Index

VALS-AI · Professional reasoning · Combined

Weighted model performance across economically relevant Vals tasks.

Rank #20 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 47
Percentile: 26.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: OpenAI.

26.9% percentile inside its fair comparison set

Raw benchmark value: 47 (CI 43 - 50)

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #65 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.9%
Percentile: 28.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.

28.9% percentile inside its fair comparison set

Raw benchmark value: 77.9% (CI 77.1% - 78.8%)

Finance Agent v2

VALS-AI · Professional reasoning · Objective

Core financial analyst tasks for agentic models.

Rank #18 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 38.2%
Percentile: 32%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: OpenAI.

32% percentile inside its fair comparison set

Raw benchmark value: 38.2% (CI 35.9% - 40.5%)

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #71 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 67.4%
Percentile: 23.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.

23.1% percentile inside its fair comparison set

Raw benchmark value: 67.4% (CI 65.6% - 69.2%)

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #25 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 41%
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

52.9% percentile inside its fair comparison set

Raw benchmark value: 41% (CI 36.6% - 45.5%)

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #26 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.1%
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.

50% percentile inside its fair comparison set

Raw benchmark value: 77.1% (CI 73.4% - 80.8%)

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #76 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 72.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #94. Votes: 3595. Organization: openai. License: Proprietary.

72.7% percentile inside its fair comparison set

Raw benchmark value: 1,437 (CI 1,427 - 1,448)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #97 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #117. Votes: 7669. Organization: openai. License: Proprietary.

69.8% percentile inside its fair comparison set

Raw benchmark value: 1,402 (CI 1,394 - 1,410)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 66.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #133. Votes: 7919. Organization: openai. License: Proprietary.

66.6% percentile inside its fair comparison set

Raw benchmark value: 1,351 (CI 1,343 - 1,359)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #114 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 62.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #137. Votes: 2965. Organization: openai. License: Proprietary.

62.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 (CI 1,384 - 1,408)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 71.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #116. Votes: 6425. Organization: openai. License: Proprietary.

71.2% percentile inside its fair comparison set

Raw benchmark value: 1,419 (CI 1,410 - 1,427)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #59 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,439
Percentile: 81.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #72. Votes: 2092. Organization: openai. License: Proprietary.

81.2% percentile inside its fair comparison set

Raw benchmark value: 1,439 (CI 1,425 - 1,453)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #100 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,420
Percentile: 66.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #122. Votes: 2883. Organization: openai. License: Proprietary.

66.4% percentile inside its fair comparison set

Raw benchmark value: 1,420 (CI 1,408 - 1,432)

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #80 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,450
Percentile: 75.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #99. Votes: 15267. Organization: openai. License: Proprietary.

75.7% percentile inside its fair comparison set

Raw benchmark value: 1,450 (CI 1,444 - 1,456)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #115 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,362
Percentile: 64.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #140. Votes: 9207. Organization: openai. License: Proprietary.

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,362 (CI 1,354 - 1,369)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #96 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #118. Votes: 3595. Organization: openai. License: Proprietary.

65.5% percentile inside its fair comparison set

Raw benchmark value: 1,396 (CI 1,385 - 1,406)

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #112 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,362
Percentile: 65.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_business_and_management_and_financial_operations. Source rank: #134. Votes: 7669. Organization: openai. License: Proprietary.

65.1% percentile inside its fair comparison set

Raw benchmark value: 1,362 (CI 1,354 - 1,369)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #125 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 61.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #151. Votes: 7919. Organization: openai. License: Proprietary.

61.6% percentile inside its fair comparison set

Raw benchmark value: 1,324 · CI 1,316 - 1,331

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #123 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 59.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #149. Votes: 2965. Organization: openai. License: Proprietary.

59.1% percentile inside its fair comparison set

Raw benchmark value: 1,364 · CI 1,352 - 1,376

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #121 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,377
Percentile: 62.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #143. Votes: 6425. Organization: openai. License: Proprietary.

62.8% percentile inside its fair comparison set

Raw benchmark value: 1,377 · CI 1,369 - 1,385

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #78 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,416
Percentile: 75%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #94. Votes: 2092. Organization: openai. License: Proprietary.

75% percentile inside its fair comparison set

Raw benchmark value: 1,416 · CI 1,402 - 1,430

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #118 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #142. Votes: 2883. Organization: openai. License: Proprietary.

60.3% percentile inside its fair comparison set

Raw benchmark value: 1,376 · CI 1,364 - 1,388

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #107 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,401
Percentile: 67.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #129. Votes: 15267. Organization: openai. License: Proprietary.

67.4% percentile inside its fair comparison set

Raw benchmark value: 1,401 · CI 1,395 - 1,407

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #125 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 61.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #152. Votes: 9207. Organization: openai. License: Proprietary.

61.7% percentile inside its fair comparison set

Raw benchmark value: 1,337 · CI 1,330 - 1,344

SAGE

VALS-AI · Professional reasoning · Objective

Student Assessment with Generative Evaluation.

Rank #32 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 38.1%
Percentile: 31.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.

31.1% percentile inside its fair comparison set

Raw benchmark value: 38.1% · CI 32% - 44.1%

PRBench Legal

SL · Professional reasoning · Rubric

Applied legal reasoning on professional-domain tasks.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 49%
Percentile: 83.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

83.3% percentile inside its fair comparison set

Raw benchmark value: 49%

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #104 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 39.2%
Percentile: 4.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

4.6% percentile inside its fair comparison set

Raw benchmark value: 39.2%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.4%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

0% percentile inside its fair comparison set

Raw benchmark value: 32.4%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #102 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 1.5%
Percentile: 6.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

6.5% percentile inside its fair comparison set

Raw benchmark value: 1.5%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.9%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

0.9% percentile inside its fair comparison set

Raw benchmark value: 17.9%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #103 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 90.2%
Percentile: 5.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

5.6% percentile inside its fair comparison set

Raw benchmark value: 90.2%
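
The five LiveBench rows above are consistent with a simple rank-based percentile, (N - rank) / (N - 1), if the comparison set holds N = 109 models. That set size is an inference from rank #109 mapping to 0%; the page does not state it, and rows from other sources (backfilled ones in particular) may be computed differently. A minimal sketch under that assumption, with `rank_percentile` and `N_MODELS` as illustrative names:

```python
# Rank-to-percentile sketch for the LiveBench rows.
# ASSUMPTION: the comparison set holds 109 models. This is inferred from the
# page (rank #109 is shown as 0%), not stated by it.
N_MODELS = 109


def rank_percentile(rank: int, n_models: int = N_MODELS) -> float:
    """Percentage of the comparison set ranked below `rank` (rank 1 is best)."""
    return round(100 * (n_models - rank) / (n_models - 1), 1)


# (benchmark, rank shown on the page, percentile shown on the page)
livebench_rows = [
    ("Data analysis", 104, 4.6),
    ("Overall", 109, 0.0),
    ("Consecutive events", 102, 6.5),
    ("Table join", 108, 0.9),
    ("Table reformat", 103, 5.6),
]

for name, rank, shown in livebench_rows:
    computed = rank_percentile(rank)
    print(f"{name}: rank #{rank} -> {computed}% (page shows {shown}%)")
```

Read this way, the 90.2% raw score on Table reformat still lands near the bottom of the set because the percentile depends only on rank, not on the raw value.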

Poker Agent

VALS-AI · Professional reasoning · Objective

Agent profit in poker-style strategic play.

Rank #5 · Source label: openai/gpt-5-2025-08-07

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 1,103.2 score
Percentile: 93.8%
Last updated: archived
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

93.8% percentile inside its fair comparison set

Raw benchmark value: 1,103.2 score · CI 1,103.2 score - 1,103.2 score

Search / tool use · 3 benchmarks · 33.1%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #199 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 25.7%
Percentile: 35.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

35.9% percentile inside its fair comparison set

Raw benchmark value: 25.7%

Search Arena

AR · Search / tool use · Human

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #20 · Source label: gpt-5-search

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,180
Percentile: 43.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-search`. Category: overall. Source rank: #19. Votes: 20928. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

43.3% percentile inside its fair comparison set

Raw benchmark value: 1,180 · CI 1,173 - 1,188

Search Arena · No Style Control

AR · Search / tool use · Human

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #27 · Source label: gpt-5-search

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,132
Percentile: 20%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-search`. Category: overall. Source rank: #27. Votes: 20928. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

20% percentile inside its fair comparison set

Raw benchmark value: 1,132 · CI 1,127 - 1,138

Long context · 2 benchmarks · 47.7%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #191 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 20%
Percentile: 39.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

39.7% percentile inside its fair comparison set

Raw benchmark value: 20%

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #40 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 61.2%
Percentile: 55.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: OpenAI.

55.7% percentile inside its fair comparison set

Raw benchmark value: 61.2% · CI 59.3% - 63.1%

Vision understanding · 24 benchmarks · 40.5%

MMMU-Pro

AA · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #129 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 31.8%
Percentile: 5.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `mmmuPro`.

5.2% percentile inside its fair comparison set

Raw benchmark value: 31.8%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #46 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,202
Percentile: 58.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #57. Votes: 15657. Organization: openai. License: Proprietary.

58.7% percentile inside its fair comparison set

Raw benchmark value: 1,202 · CI 1,195 - 1,210

Vision Arena · Captioning

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #14 · Source label: gpt-5-chat

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,200
Percentile: 57.7%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: captioning. Source rank: #13. Votes: 399. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

57.7% percentile inside its fair comparison set

Raw benchmark value: 1,200 · CI 1,170 - 1,230

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #50 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,159
Percentile: 10.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing_vision. Source rank: #63. Votes: 841. Organization: openai. License: Proprietary.

10.9% percentile inside its fair comparison set

Raw benchmark value: 1,159 · CI 1,136 - 1,181

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #42 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,226
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: diagram. Source rank: #54. Votes: 4154. Organization: openai. License: Proprietary.

41.4% percentile inside its fair comparison set

Raw benchmark value: 1,226 · CI 1,215 - 1,237

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #47 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,201
Percentile: 57.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #58. Votes: 6564. Organization: openai. License: Proprietary.

57.8% percentile inside its fair comparison set

Raw benchmark value: 1,201 · CI 1,190 - 1,211

Vision Arena · Entity Recognition

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7 · Source label: gpt-5-high

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,257
Percentile: 87.5%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: entity_recognition. Source rank: #6. Votes: 434. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,257 · CI 1,224 - 1,289

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,230
Percentile: 36.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: homework. Source rank: #57. Votes: 2407. Organization: openai. License: Proprietary.

36.8% percentile inside its fair comparison set

Raw benchmark value: 1,230 · CI 1,217 - 1,244

Vision Arena · Humor

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #45 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,141
Percentile: 10.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: humor. Source rank: #57. Votes: 501. Organization: openai. License: Proprietary.

10.2% percentile inside its fair comparison set

Raw benchmark value: 1,141 · CI 1,112 - 1,169

Vision Arena · OCR

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #41 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,217
Percentile: 42.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: ocr. Source rank: #53. Votes: 11180. Organization: openai. License: Proprietary.

42.9% percentile inside its fair comparison set

Raw benchmark value: 1,217 · CI 1,210 - 1,225
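
The confidence intervals on the Vision Arena cards above vary a lot in width, and the wide ones belong to the categories with few votes. The sketch below computes each interval's half-width from numbers copied off those cards and lists them by vote count. It is only an illustration of how to read the CI column; the page does not describe how Arena derives these intervals, and `half_width` is a hypothetical helper name.

```python
# (category, votes, ci_low, ci_high) copied from the Vision Arena cards above.
# The captioning row is a backfilled proxy row; it is included only because
# its vote count and interval are listed the same way.
rows = [
    ("captioning", 399, 1170, 1230),
    ("humor", 501, 1112, 1169),
    ("diagram", 4154, 1215, 1237),
    ("ocr", 11180, 1210, 1225),
]


def half_width(ci_low: float, ci_high: float) -> float:
    """Half the width of a confidence interval, in rating points."""
    return (ci_high - ci_low) / 2


half_widths = {cat: half_width(lo, hi) for cat, _votes, lo, hi in rows}

# List categories from fewest to most votes.
for cat, votes, lo, hi in sorted(rows, key=lambda r: r[1]):
    print(f"{cat}: {votes} votes, CI half-width {half_widths[cat]} points")
```

For these four rows the half-width shrinks as votes grow (30 points at 399 votes, 7.5 points at 11,180), so a low-vote category score deserves less weight than its headline number suggests.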

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #47 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,198
Percentile: 57.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #60. Votes: 15657. Organization: openai. License: Proprietary.

57.8% percentile inside its fair comparison set

Raw benchmark value: 1,198 · CI 1,191 - 1,205

Vision Arena · Captioning · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #15 · Source label: gpt-5-chat

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,215
Percentile: 53.8%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-chat`. Category: captioning. Source rank: #14. Votes: 399. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

53.8% percentile inside its fair comparison set

Raw benchmark value: 1,215 · CI 1,185 - 1,245

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #51 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,159
Percentile: 9.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing_vision. Source rank: #64. Votes: 841. Organization: openai. License: Proprietary.

9.1% percentile inside its fair comparison set

Raw benchmark value: 1,159 · CI 1,137 - 1,181

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,210
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: diagram. Source rank: #56. Votes: 4154. Organization: openai. License: Proprietary.

38.6% percentile inside its fair comparison set

Raw benchmark value: 1,210 · CI 1,199 - 1,221

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #51 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,197
Percentile: 54.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #64. Votes: 6564. Organization: openai. License: Proprietary.

54.1% percentile inside its fair comparison set

Raw benchmark value: 1,197 · CI 1,187 - 1,207

Vision Arena · Entity Recognition · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #12 · Source label: gpt-5-high

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,261
Percentile: 71.9%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Arena leaderboard dataset row `gpt-5-high`. Category: entity_recognition. Source rank: #11. Votes: 434. Organization: openai. License: Proprietary. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

71.9% percentile inside its fair comparison set

Raw benchmark value: 1,261 · CI 1,233 - 1,290

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #46 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,223
Percentile: 33.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: homework. Source rank: #58. Votes: 2407. Organization: openai. License: Proprietary.

33.8% percentile inside its fair comparison set

Raw benchmark value: 1,223 · CI 1,209 - 1,236

Vision Arena · Humor · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #45 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,148
Percentile: 10.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: humor. Source rank: #58. Votes: 501. Organization: openai. License: Proprietary.

10.2% percentile inside its fair comparison set

Raw benchmark value: 1,148 · CI 1,120 - 1,177

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,208
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: ocr. Source rank: #56. Votes: 11180. Organization: openai. License: Proprietary.

38.6% percentile inside its fair comparison set

Raw benchmark value: 1,208 · CI 1,201 - 1,216

MMMU Pro

VALS-AI · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #38 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 73.6%
Percentile: 36.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

36.2% percentile inside its fair comparison set

Raw benchmark value: 73.6% · CI 71.5% - 75.7%

VTB

SL · Vision understanding · Rubric

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 17%
Percentile: 63.6%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

63.6% percentile inside its fair comparison set

Raw benchmark value: 17%

VISTA

SL · Vision understanding · Rubric

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 79%
Percentile: 92.9%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

92.9% percentile inside its fair comparison set

Raw benchmark value: 79%

Vision Arena · Creative Writing

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,092
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5-nano-high`. Category: creative_writing. Source rank: #34. Votes: 206. Organization: openai. License: Proprietary.

0% percentile inside its fair comparison set

Raw benchmark value: 1,092 · CI 1,050 - 1,134

Vision Arena · Creative Writing · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #32 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,117
Percentile: 3.1%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5-nano-high`. Category: creative_writing. Source rank: #33. Votes: 206. Organization: openai. License: Proprietary.

3.1% percentile inside its fair comparison set

Raw benchmark value: 1,117 · CI 1,077 - 1,158

Document understanding · 3 benchmarks · 49.2%

Vals Multimodal Index

VALS-AI · Document understanding · Combined

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #15 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 48
Percentile: 26.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_multimodal_index; provider: OpenAI.

26.3% percentile inside its fair comparison set

Raw benchmark value: 48 · CI 44 - 51

MortgageTax

VALS-AI · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #44 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 59.1%
Percentile: 28.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mortgage_tax; provider: OpenAI.

28.3% percentile inside its fair comparison set

Raw benchmark value: 59.1% · CI 57.2% - 61%

Multimodal mix

OC · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: OpenCompass
Raw value: 75.4%
Percentile: 92.9%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

92.9% percentile inside its fair comparison set

Raw benchmark value: 75.4%
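
The percentage shown next to each category name appears to be the plain mean of the percentiles on that category's cards, with background-only (backfilled) rows included. That reading is an observation from the numbers on this page, not a documented rule. A short sketch that reproduces three of the category figures above, with `category_score` as an illustrative name:

```python
from statistics import mean

# Percentiles listed on the cards of each category above, in page order.
# ASSUMPTION: backfilled "Background only" rows are averaged in too; this is
# what reproduces the figures displayed next to the category names.
category_percentiles = {
    "Search / tool use": [35.9, 43.3, 20.0],
    "Long context": [39.7, 55.7],
    "Document understanding": [26.3, 28.3, 92.9],
}

# Figures displayed next to the category names on the page.
shown_on_page = {
    "Search / tool use": 33.1,
    "Long context": 47.7,
    "Document understanding": 49.2,
}


def category_score(percentiles: list[float]) -> float:
    """Unweighted mean of card percentiles, rounded to one decimal place."""
    return round(mean(percentiles), 1)


category_scores = {
    name: category_score(values) for name, values in category_percentiles.items()
}

for name, score in category_scores.items():
    print(f"{name}: computed {score}%, page shows {shown_on_page[name]}%")
```

Under this reading, a single high backfilled row can lift a small category noticeably: Document understanding shows 49.2% even though its two headline-eligible rows sit at 26.3% and 28.3%.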

Safety · 1 benchmark · 61.5%

MASK

SL · Safety · Rubric

Whether a model stays honest instead of covertly optimizing against the user.

Rank #8 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 79.3%
Percentile: 61.5%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

61.5% percentile inside its fair comparison set

Raw benchmark value: 79.3%

Embeddings / retrieval · 1 benchmark · 100%

Retrieval

MTEB · Embeddings / retrieval · Retrieval

It is one of the few direct signals for retrieval stacks, where embedding quality matters more than chat style.

Rank #4 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: MTEB
Raw value: 58.8 ndcg
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Embedding endpoint score. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

100% percentile inside its fair comparison set

Raw benchmark value: 58.8 ndcg
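
Backfilled rows such as the three above are shown for context but kept out of the default ranking. A minimal Python sketch of how a reader of this page might parse a drilldown block and apply that rule, assuming the plain `Key: value` layout shown here. The function names are hypothetical and not part of any registry API.

```python
def parse_drilldown(block: str) -> dict[str, str]:
    """Parse 'Key: value' lines from a raw row drilldown into a dict."""
    row = {}
    for line in block.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not 'Key: value' pairs
            row[key.strip()] = value.strip()
    return row


def is_headline_eligible(row: dict[str, str]) -> bool:
    """Treat only rows marked 'headline eligible' as ranking inputs."""
    return row.get("Eligibility", "").lower() == "headline eligible"


# Drilldown text copied from the MTEB Retrieval row above.
mteb_block = """Source: MTEB
Raw value: 58.8 ndcg
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking."""

row = parse_drilldown(mteb_block)
print(row["Source"], is_headline_eligible(row))
```

Under this reading, the 100% percentile on the Retrieval row would not feed a default ranking, because its eligibility text is not `headline eligible`.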

Multilingual · 16 benchmarks · 59.4%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #99 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,427
Percentile: 66.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #121. Votes: 2022. Organization: openai. License: Proprietary.

66.8% percentile inside its fair comparison set

Raw benchmark value: 1,427 · CI 1,413 - 1,442

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #77 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,428
Percentile: 64.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: french. Source rank: #94. Votes: 1260. Organization: openai. License: Proprietary.

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,428 · CI 1,409 - 1,448

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,380
Percentile: 60.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: german. Source rank: #114. Votes: 543. Organization: openai. License: Proprietary.

60.8% percentile inside its fair comparison set

Raw benchmark value: 1,380 · CI 1,353 - 1,407

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #49 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,380
Percentile: 76.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: japanese. Source rank: #67. Votes: 305. Organization: openai. License: Proprietary.

76.4% percentile inside its fair comparison set

Raw benchmark value: 1,380 · CI 1,344 - 1,417

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #105 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: korean. Source rank: #128. Votes: 604. Organization: openai. License: Proprietary.

50% percentile inside its fair comparison set

Raw benchmark value: 1,324 · CI 1,298 - 1,350

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: russian. Source rank: #115. Votes: 4149. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,385 - 1,405

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #88 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 59.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: spanish. Source rank: #109. Votes: 1165. Organization: openai. License: Proprietary.

59.3% percentile inside its fair comparison set

Raw benchmark value: 1,391 · CI 1,372 - 1,411

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard, without style control.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 63.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #133. Votes: 2022. Organization: openai. License: Proprietary.

63.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 · CI 1,382 - 1,411

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard, without style control.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,395
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: french. Source rank: #113. Votes: 1260. Organization: openai. License: Proprietary.

56.9% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,376 - 1,415

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard, without style control.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,352
Percentile: 54%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: german. Source rank: #132. Votes: 543. Organization: openai. License: Proprietary.

54% percentile inside its fair comparison set

Raw benchmark value: 1,352 · CI 1,325 - 1,380

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard, without style control.

Rank #67 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: japanese. Source rank: #85. Votes: 305. Organization: openai. License: Proprietary.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,347 · CI 1,311 - 1,384

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard, without style control.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,303
Percentile: 48.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: korean. Source rank: #131. Votes: 604. Organization: openai. License: Proprietary.

48.1% percentile inside its fair comparison set

Raw benchmark value: 1,303 · CI 1,277 - 1,329

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard, without style control.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 63%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: russian. Source rank: #131. Votes: 4149. Organization: openai. License: Proprietary.

63% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,358 - 1,378

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard, without style control.

Rank #104 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 51.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: spanish. Source rank: #125. Votes: 1165. Organization: openai. License: Proprietary.

51.9% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,349 - 1,387

Vision Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard.

Rank #40 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,237
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #51. Votes: 922. Organization: openai. License: Proprietary.

49.4% percentile inside its fair comparison set

Raw benchmark value: 1,237 · CI 1,213 - 1,262

Vision Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard, without style control.

Rank #39 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,234
Percentile: 50.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #51. Votes: 922. Organization: openai. License: Proprietary.

50.6% percentile inside its fair comparison set

Raw benchmark value: 1,234 · CI 1,209 - 1,259
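
The Multilingual category score of 59.4% in the section header appears to be the unweighted mean of the 16 row percentiles above. A minimal Python sketch that reproduces it, assuming simple averaging (the page does not state its aggregation rule, and the variable names are illustrative):

```python
from statistics import mean

# Percentiles copied from the 16 Multilingual rows above, in page order.
multilingual_percentiles = [
    66.8, 64.8, 60.8, 76.4, 50.0, 67.8, 59.3,  # Text Arena, style-controlled
    63.1, 56.9, 54.0, 67.5, 48.1, 63.0, 51.9,  # Text Arena, no style control
    49.4, 50.6,                                # Vision Arena (Chinese)
]

# Assumption: the category score is an unweighted mean of row percentiles.
category_score = round(mean(multilingual_percentiles), 1)
print(f"{category_score}%")
```

The single-row categories are consistent with the same reading: Safety shows 61.5% and Embeddings / retrieval shows 100%, each equal to the percentile of its one benchmark.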

Source links and registry checks

official · OpenAI models docs · Jun 20, 2026
official · Terminal-Bench · Jun 20, 2026
official · Arena (36 links) · Jun 20, 2026
official · Artificial Analysis · Jun 20, 2026
official · LiveBench · Jun 20, 2026


Source-linked scores by benchmark

Each row keeps the benchmark source, source type, raw metric, and percentile inside its fair comparison set.

Thin verified coverage: this model currently reads as thin verified coverage across the resolved source data.

Chat / text · 38 benchmarks · 43.3%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #261 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 8
Percentile: 34.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.

34.2% percentile inside its fair comparison set

Raw benchmark value: 8

AA-Omniscience accuracy

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #244 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 11%
Percentile: 18.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

18.5% percentile inside its fair comparison set

Raw benchmark value: 11%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #147 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 15.7%
Percentile: 51%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

51% percentile inside its fair comparison set

Raw benchmark value: 15.7%

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #238 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 32.5%
Percentile: 24.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `ifbench`.

24.8% percentile inside its fair comparison set

Raw benchmark value: 32.5%

Blended price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #116 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.5 /1M tokens
Percentile: 58.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

58.3% percentile inside its fair comparison set

Raw benchmark value: $0.5 /1M tokens

Input price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $0.2 /1M input tokens
Percentile: 66.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

66.3% percentile inside its fair comparison set

Raw benchmark value: $0.2 /1M input tokens

Output price

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: $1.3 /1M output tokens
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

52.9% percentile inside its fair comparison set

Raw benchmark value: $1.3 /1M output tokens
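
The blended price is consistent with the input and output prices if the `3To1` suffix in the field name `price1mBlended0To3To1` is read as a 3:1 input-to-output token weighting. That reading is an assumption, not something this page states. A quick check in Python:

```python
# Prices copied from the Input price and Output price rows (USD per 1M tokens).
input_price = 0.2
output_price = 1.3

# Assumption: the blend weights 3 parts input tokens to 1 part output tokens.
INPUT_WEIGHT, OUTPUT_WEIGHT = 3, 1

blended = (INPUT_WEIGHT * input_price + OUTPUT_WEIGHT * output_price) / (
    INPUT_WEIGHT + OUTPUT_WEIGHT
)

# Compare with the $0.5 /1M tokens shown on the Blended price row.
print(f"${blended:.3f} per 1M tokens, about ${round(blended, 1)} at one decimal")
```

The displayed prices are themselves rounded, so the match is approximate at best.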

Output Speed

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #62 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 138.1 tokens/s
Percentile: 71%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

71% percentile inside its fair comparison set

Raw benchmark value: 138.1 tokens/s

Time to first token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #206 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 99.69s
Percentile: 2.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

2.4% percentile inside its fair comparison set

Raw benchmark value: 99.69s

Time to first answer token

AA · Chat / text · Speed / cost

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #205 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 99.69s
Percentile: 2.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

2.9% percentile inside its fair comparison set

Raw benchmark value: 99.69s
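
Taken together, the output speed and time-to-first-token rows give a rough end-to-end latency estimate. The sketch below assumes total time is roughly time to first token plus output tokens divided by output speed, and that both medians (reported under the `GPT-5 nano (high)` source label) apply to the same request. The 500-token response length and the function name are arbitrary illustrations.

```python
def estimate_response_seconds(
    ttft_s: float, tokens_per_s: float, output_tokens: int
) -> float:
    """Rough latency: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


# Values copied from the Output Speed and Time to first token rows above.
estimate = estimate_response_seconds(
    ttft_s=99.69, tokens_per_s=138.1, output_tokens=500
)
print(f"{estimate:.1f} s")
```

Under these assumptions the wait for the first token dominates, which matches the low percentile on the two time-to-first-token rows despite the comparatively high output speed.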

Openness Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #182 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6
Percentile: 7.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

7.5% percentile inside its fair comparison set

Raw benchmark value: 6

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #93 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 71.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #115. Votes: 38610. Organization: openai. License: Proprietary.

71.7% percentile inside its fair comparison set

Raw benchmark value: 1,403 · CI 1,399 - 1,407

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #131 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,333
Percentile: 59.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing. Source rank: #158. Votes: 6159. Organization: openai. License: Proprietary.

59.8% percentile inside its fair comparison set

Raw benchmark value: 1,333 · CI 1,324 - 1,341

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #95 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,415
Percentile: 71.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #115. Votes: 18566. Organization: openai. License: Proprietary.

71.1% percentile inside its fair comparison set

Raw benchmark value: 1,415 · CI 1,409 - 1,421

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,391
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: exclude_ties. Source rank: #112. Votes: 29611. Organization: openai. License: Proprietary.

72% percentile inside its fair comparison set

Raw benchmark value: 1,391 · CI 1,385 - 1,396

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #92 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,422
Percentile: 72%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts. Source rank: #112. Votes: 25109. Organization: openai. License: Proprietary.

72% percentile inside its fair comparison set

Raw benchmark value: 1,422 · CI 1,417 - 1,428

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,431
Percentile: 71.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts_english. Source rank: #114. Votes: 12774. Organization: openai. License: Proprietary.

71.3% percentile inside its fair comparison set

Raw benchmark value: 1,431 · CI 1,424 - 1,437

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #101 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,386
Percentile: 69.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: instruction_following. Source rank: #124. Votes: 12936. Organization: openai. License: Proprietary.

69.2% percentile inside its fair comparison set

Raw benchmark value: 1,386 · CI 1,379 - 1,392

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #99 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,405
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: longer_query. Source rank: #123. Votes: 16161. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,405 · CI 1,399 - 1,411

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #83 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,414
Percentile: 74.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: multi_turn. Source rank: #102. Votes: 7363. Organization: openai. License: Proprietary.

74.6% percentile inside its fair comparison set

Raw benchmark value: 1,414 · CI 1,406 - 1,422

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #113 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,374
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #136. Votes: 38610. Organization: openai. License: Proprietary.

65.5% percentile inside its fair comparison set

Raw benchmark value: 1,374 · CI 1,369 - 1,378

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #138 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,311
Percentile: 57.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing. Source rank: #166. Votes: 6159. Organization: openai. License: Proprietary.

57.6% percentile inside its fair comparison set

Raw benchmark value: 1,311 · CI 1,303 - 1,320

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #119 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 63.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #141. Votes: 18566. Organization: openai. License: Proprietary.

63.7% percentile inside its fair comparison set

Raw benchmark value: 1,383 · CI 1,377 - 1,388

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,347
Percentile: 66.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: exclude_ties. Source rank: #132. Votes: 29611. Organization: openai. License: Proprietary.

66.5% percentile inside its fair comparison set

Raw benchmark value: 1,347 · CI 1,342 - 1,353

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,381
Percentile: 67.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts. Source rank: #130. Votes: 25109. Organization: openai. License: Proprietary.

67.1% percentile inside its fair comparison set

Raw benchmark value: 1,381 · CI 1,376 - 1,387

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,388
Percentile: 67%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: hard_prompts_english. Source rank: #130. Votes: 12774. Organization: openai. License: Proprietary.

67% percentile inside its fair comparison set

Raw benchmark value: 1,388 · CI 1,382 - 1,394

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,360
Percentile: 66.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: instruction_following. Source rank: #132. Votes: 12936. Organization: openai. License: Proprietary.

66.8% percentile inside its fair comparison set

Raw benchmark value: 1,360 · CI 1,354 - 1,366

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #112 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,368
Percentile: 63.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: longer_query. Source rank: #137. Votes: 16161. Organization: openai. License: Proprietary.

63.5% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,362 - 1,374

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,383
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: multi_turn. Source rank: #128. Votes: 7363. Organization: openai. License: Proprietary.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,383 · CI 1,375 - 1,390

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 16.5%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

1.9% percentile inside its fair comparison set

Raw benchmark value: 16.5%

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 28.7%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

0.9% percentile inside its fair comparison set

Raw benchmark value: 28.7%

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #86 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20.8%
Percentile: 21.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

21.3% percentile inside its fair comparison set

Raw benchmark value: 20.8%

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 16.4%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

2.8% percentile inside its fair comparison set

Raw benchmark value: 16.4%

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #100 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.6%
Percentile: 8.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

8.3% percentile inside its fair comparison set

Raw benchmark value: 17.6%

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 11.2%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

0% percentile inside its fair comparison set

Raw benchmark value: 11.2%

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 25.7%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 25.7%

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 14.4%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 14.4%

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #106 · Source label: gpt-5.4-nano-medium

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 38%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

1.9% percentile inside its fair comparison set

Raw benchmark value: 38%
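Most of the LiveBench rows above pair a rank with a percentile in a way that is consistent with a linear mapping over 109 ranked models: rank #109 maps to 0%, #108 to 0.9%, #107 to 1.9%, #100 to 8.3%, and #86 to 21.3%. The sketch below reproduces those pairs. Both the formula and the set size of 109 are inferred from these rows rather than taken from any registry documentation, the helper name is hypothetical, and the Typos row (#106 at 1.9%) does not fit, so treat this as an approximation.

```python
def percentile_from_rank(rank: int, set_size: int) -> float:
    """Share of the comparison set ranked below `rank`, as a percentage.

    Assumed formula, inferred from the rows above and not documented:
    (set_size - rank) / (set_size - 1) * 100, rounded to one decimal.
    """
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie in 1..set_size and set_size must be >= 2")
    return round((set_size - rank) / (set_size - 1) * 100, 1)


# Rank -> displayed percentile, copied from the LiveBench rows above.
observed = {109: 0.0, 108: 0.9, 107: 1.9, 100: 8.3, 86: 21.3}
for rank, shown in observed.items():
    assert percentile_from_rank(rank, set_size=109) == shown
```

Under this reading, a 0% percentile means the model is last in its comparison set, not that it scored zero on the benchmark.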

Coding · 22 benchmarks · 39.1%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #164 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 6.8%
Percentile: 46%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

46% percentile inside its fair comparison set

Raw benchmark value: 6.8%

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #183 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 29.1%
Percentile: 50.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `scicode`.

50.5% percentile inside its fair comparison set

Raw benchmark value: 29.1%

Coding Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #18 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 56
Percentile: 77.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `codingIndex`.

77.3% percentile inside its fair comparison set

Raw benchmark value: 56

Agentic Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #20 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 28
Percentile: 58.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `agenticIndex`.

58.7% percentile inside its fair comparison set

Raw benchmark value: 28

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,394
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,394 · CI 1,382 - 1,407

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,394
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,394 · CI 1,382 - 1,407

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #36 · Source label: gpt-5-medium

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 53.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

53.4% percentile inside its fair comparison set

Raw benchmark value: 1,402 · CI 1,388 - 1,416
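The three Arena rows above are backfilled from `gpt-5-medium` and marked as visible for context but excluded from default ranking, while the Artificial Analysis rows before them are headline eligible. A minimal sketch of keeping the two groups apart follows. The `BenchmarkRow` record and its field names are illustrative assumptions, not the registry's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRow:
    # Field names are hypothetical; values are copied from the rows above.
    name: str
    source_label: str
    percentile: float
    headline_eligible: bool


rows = [
    BenchmarkRow("Coding Index", "GPT-5.4 nano (xhigh)", 77.3, True),
    BenchmarkRow("Agentic Index", "GPT-5.4 nano (xhigh)", 58.7, True),
    BenchmarkRow("Code Arena", "gpt-5-medium", 53.4, False),
    BenchmarkRow("WebDev Arena", "gpt-5-medium", 53.4, False),
]


def split_by_eligibility(rows):
    """Return (headline rows, background-only rows), preserving input order."""
    headline = [r for r in rows if r.headline_eligible]
    background = [r for r in rows if not r.headline_eligible]
    return headline, background


headline, background = split_by_eligibility(rows)
print("headline:", [r.name for r in headline])
print("background only:", [r.name for r in background])
```

Separating the groups before any averaging keeps scores measured on a different model from leaking into a figure that is meant to describe this one.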

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #24 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 84%
Percentile: 74.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: OpenAI.

74.4% percentile inside its fair comparison set

Raw benchmark value: 84% · CI 82% - 86.1%

SWE-bench Verified

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #37 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 69.8%
Percentile: 33.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: swebench; provider: OpenAI.

33.3% percentile inside its fair comparison set

Raw benchmark value: 69.8% · CI 65.8% - 73.8%

Terminal-Bench 2.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #26 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 41.6%
Percentile: 7.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: terminal-bench-2-1; provider: OpenAI.

7.4% percentile inside its fair comparison set

Raw benchmark value: 41.6% · CI 35.7% - 47.4%

Vibe Code Bench v1.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #23 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 26.1%
Percentile: 55.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vibe-code; provider: OpenAI.

55.1% percentile inside its fair comparison set

Raw benchmark value: 26.1% · CI 16.2% - 36%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #76 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,460
Percentile: 76.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: coding. Source rank: #96. Votes: 10783. Organization: openai. License: Proprietary.

76.6% percentile inside its fair comparison set

Raw benchmark value: 1,460 · CI 1,453 - 1,467

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #104 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,403
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: coding. Source rank: #125. Votes: 10783. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,403 · CI 1,396 - 1,410

IOI

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #21 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 15.3%
Percentile: 54.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: ioi; provider: OpenAI.

54.5% percentile inside its fair comparison set

Raw benchmark value: 15.3% · CI 2.6% - 27.9%
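The Vals AI coding rows above report a confidence interval next to each score, and the intervals differ a lot in width. As a rough sketch, the width can be computed per row to see which scores are least certain. The helper name `ci_width` is hypothetical, and the page does not say which confidence level the intervals use.

```python
def ci_width(low: float, high: float) -> float:
    """Width of a confidence interval in percentage points."""
    if high < low:
        raise ValueError("upper bound must not be below lower bound")
    return round(high - low, 1)


# (lower bound, upper bound) in percent, copied from the Vals AI rows above.
vals_rows = {
    "LiveCodeBench": (82.0, 86.1),
    "SWE-bench Verified": (65.8, 73.8),
    "Terminal-Bench 2.1": (35.7, 47.4),
    "Vibe Code Bench v1.1": (16.2, 36.0),
    "IOI": (2.6, 27.9),
}

widths = {name: ci_width(low, high) for name, (low, high) in vals_rows.items()}
widest = max(widths, key=widths.get)
print(widest, widths[widest])
```

A wide interval such as the IOI one means the percentile for that row is a weaker signal than the percentile for a tightly bounded row such as LiveCodeBench.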

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #89 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 23.3%
Percentile: 18.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

18.5% percentile inside its fair comparison set

Raw benchmark value: 23.3%

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #99 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 61.9%
Percentile: 9.3%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

9.3% percentile inside its fair comparison set

Raw benchmark value: 61.9%

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #85 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 20%
Percentile: 22.2%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

22.2% percentile inside its fair comparison set

Raw benchmark value: 20%

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #92 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 10%
Percentile: 16.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

16.8% percentile inside its fair comparison set

Raw benchmark value: 10%

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #90 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 35%
Percentile: 17.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

17.6% percentile inside its fair comparison set

Raw benchmark value: 35%

Coding generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 61.5%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

2.8% percentile inside its fair comparison set

Raw benchmark value: 61.5%

Coding completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #101 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 54.3%
Percentile: 7.4%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

7.4% percentile inside its fair comparison set

Raw benchmark value: 54.3%

Terminal-Bench 2.0

TERMINAL-BENCH · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #30 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Terminal-Bench
Raw value: 11.5%
Percentile: 3.3%
Last updated: archived
Eligibility: headline eligible

3.3% percentile inside its fair comparison set

Raw benchmark value: 11.5%

Reasoning / math / science · 21 benchmarks · 29.3%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #305 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 4.1%
Percentile: 17.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `hle`.

17.8% percentile inside its fair comparison set

Raw benchmark value: 4.1%

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #278 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 42.8%
Percentile: 25.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gpqa`.

25.9% percentile inside its fair comparison set

Raw benchmark value: 42.8%

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: GPT-5 nano (high)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 0%
Percentile: 65.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `critpt`.

65.2% percentile inside its fair comparison set

Raw benchmark value: 0%

ProofBench

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #26 · Source label: openai/gpt-5-nano-2025-08-07

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 12%
Percentile: 28.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: proof_bench; provider: OpenAI.

28.6% percentile inside its fair comparison set

Raw benchmark value: 12% · CI 5.6% - 18.4%

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #50 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.5%
Percentile: 44.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: OpenAI.

44.9% percentile inside its fair comparison set

Raw benchmark value: 77.5% · CI 72.8% - 82.3%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #71 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.2%
Percentile: 21.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: OpenAI.

21.3% percentile inside its fair comparison set

Raw benchmark value: 77.2% · CI 76.3% - 78%

Text Arena · Math

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #53 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,432
Percentile: 83.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: math. Source rank: #66. Votes: 2079. Organization: openai. License: Proprietary.

83.4% percentile inside its fair comparison set

Raw benchmark value: 1,432 · CI 1,419 - 1,446

Text Arena · Math · No Style Control

AR · Reasoning / math / science · Human

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #78 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,413
Percentile: 75.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: math. Source rank: #94. Votes: 2079. Organization: openai. License: Proprietary.

75.5% percentile inside its fair comparison set

Raw benchmark value: 1,413 · CI 1,400 - 1,427

TutorBench

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 55.3%
Percentile: 80%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Manually verified from the official Scale Labs TutorBench leaderboard. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

80% percentile inside its fair comparison set

Raw benchmark value: 55.3%

MultiNRC

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 52.1%
Percentile: 60%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

60% percentile inside its fair comparison set

Raw benchmark value: 52.1%

EnigmaEval

SL · Reasoning / math / science · Rubric

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 64%
Percentile: 80%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

80% percentile inside its fair comparison set

Raw benchmark value: 64%

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 36%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

0% percentile inside its fair comparison set

Raw benchmark value: 36%

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.4%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

0% percentile inside its fair comparison set

Raw benchmark value: 17.4%

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 62%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

2.8% percentile inside its fair comparison set

Raw benchmark value: 62%

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #91 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 9.1%
Percentile: 16.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

16.7% percentile inside its fair comparison set

Raw benchmark value: 9.1%

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #106 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.4%
Percentile: 2.8%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

2.8% percentile inside its fair comparison set

Raw benchmark value: 32.4%

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #105 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 40.6%
Percentile: 3.7%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

3.7% percentile inside its fair comparison set

Raw benchmark value: 40.6%

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 25%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

1.9% percentile inside its fair comparison set

Raw benchmark value: 25%

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0.6%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

0.9% percentile inside its fair comparison set

Raw benchmark value: 0.6%

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #107 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 44%
Percentile: 1.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

1.9% percentile inside its fair comparison set

Raw benchmark value: 44%

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 0%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

0.9% percentile inside its fair comparison set

Raw benchmark value: 0%

Professional reasoning · 34 benchmarks · 51%

GDPval-AA

AA · Professional reasoning · Rubric

Agentic performance on economically valuable work tasks.

Rank #37 · Source label: GPT-5.4 nano (Non-Reasoning)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 714
Percentile: 21.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.

21.7% percentile inside its fair comparison set

Raw benchmark value: 714

APEX-Agents-AA

AA · Professional reasoning · Objective

Long-horizon agentic task completion.

Rank #10 · Source label: GPT-5.4 nano (xhigh)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 24.9%
Percentile: 62.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `apexAgents`.

62.5% percentile inside its fair comparison set

Raw benchmark value: 24.9%

Vals Index

VALS-AI · Professional reasoning · Combined

Weighted model performance across economically relevant Vals tasks.

Rank #20 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 47
Percentile: 26.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: OpenAI.

26.9% percentile inside its fair comparison set

Raw benchmark value: 47 (CI 43 - 50)

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #65 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.9%
Percentile: 28.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: OpenAI.

28.9% percentile inside its fair comparison set

Raw benchmark value: 77.9% (CI 77.1% - 78.8%)

Finance Agent v2

VALS-AI · Professional reasoning · Objective

Core financial analyst tasks for agentic models.

Rank #18 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 38.2%
Percentile: 32%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: OpenAI.

32% percentile inside its fair comparison set

Raw benchmark value: 38.2% (CI 35.9% - 40.5%)

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #71 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 67.4%
Percentile: 23.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: OpenAI.

23.1% percentile inside its fair comparison set

Raw benchmark value: 67.4% (CI 65.6% - 69.2%)

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #25 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 41%
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

52.9% percentile inside its fair comparison set

Raw benchmark value: 41% (CI 36.6% - 45.5%)

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #26 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 77.1%
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: OpenAI.

50% percentile inside its fair comparison set

Raw benchmark value: 77.1% (CI 73.4% - 80.8%)

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #76 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,437
Percentile: 72.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #94. Votes: 3595. Organization: openai. License: Proprietary.

72.7% percentile inside its fair comparison set

Raw benchmark value: 1,437 (CI 1,427 - 1,448)

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #97 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,402
Percentile: 69.8%
Last updated: recent
Eligibility: headline eligible

69.8% percentile inside its fair comparison set

Raw benchmark value: 1,402 (CI 1,394 - 1,410)

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,351
Percentile: 66.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #133. Votes: 7919. Organization: openai. License: Proprietary.

66.6% percentile inside its fair comparison set

Raw benchmark value: 1,351 (CI 1,343 - 1,359)

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #114 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 62.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #137. Votes: 2965. Organization: openai. License: Proprietary.

62.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 (CI 1,384 - 1,408)

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,419
Percentile: 71.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #116. Votes: 6425. Organization: openai. License: Proprietary.

71.2% percentile inside its fair comparison set

Raw benchmark value: 1,419 (CI 1,410 - 1,427)

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #59 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,439
Percentile: 81.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #72. Votes: 2092. Organization: openai. License: Proprietary.

81.2% percentile inside its fair comparison set

Raw benchmark value: 1,439 (CI 1,425 - 1,453)

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #100 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,420
Percentile: 66.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #122. Votes: 2883. Organization: openai. License: Proprietary.

66.4% percentile inside its fair comparison set

Raw benchmark value: 1,420 (CI 1,408 - 1,432)

Text Arena · Industry Software And It Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #80 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,450
Percentile: 75.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #99. Votes: 15267. Organization: openai. License: Proprietary.

75.7% percentile inside its fair comparison set

Raw benchmark value: 1,450 (CI 1,444 - 1,456)

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #115 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,362
Percentile: 64.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #140. Votes: 9207. Organization: openai. License: Proprietary.

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,362 (CI 1,354 - 1,369)

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #96 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,396
Percentile: 65.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: expert. Source rank: #118. Votes: 3595. Organization: openai. License: Proprietary.

65.5% percentile inside its fair comparison set

Raw benchmark value: 1,396 (CI 1,385 - 1,406)
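
One way to read the two Expert rows together is to check whether their confidence intervals overlap. The sketch below uses only the values shown on this page (1,437 with CI 1,427 - 1,448 under style control, 1,396 with CI 1,385 - 1,406 without it). The `Interval` type and its `overlaps` method are illustrative names, not part of any Arena tooling, and the page does not state the confidence level behind these intervals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high] on the Arena score scale (hypothetical helper)."""
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap when each one starts before the other ends.
        return self.low <= other.high and other.low <= self.high


# Values copied from the two Text Arena · Expert rows on this page.
expert_style_control = Interval(1427, 1448)
expert_no_style_control = Interval(1385, 1406)

# Difference between the two raw scores.
gap = 1437 - 1396
intervals_overlap = expert_style_control.overlaps(expert_no_style_control)

print(f"score gap: {gap}, intervals overlap: {intervals_overlap}")
```

Here the intervals do not overlap, which suggests the 41-point gap between the two rows is not just sampling noise, though that reading depends on how the source computes its intervals.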

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #112 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,362
Percentile: 65.1%
Last updated: recent
Eligibility: headline eligible

65.1% percentile inside its fair comparison set

Raw benchmark value: 1,362 (CI 1,354 - 1,369)

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #125 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,324
Percentile: 61.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_entertainment_and_sports_and_media. Source rank: #151. Votes: 7919. Organization: openai. License: Proprietary.

61.6% percentile inside its fair comparison set

Raw benchmark value: 1,324 (CI 1,316 - 1,331)

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #123 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,364
Percentile: 59.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_legal_and_government. Source rank: #149. Votes: 2965. Organization: openai. License: Proprietary.

59.1% percentile inside its fair comparison set

Raw benchmark value: 1,364 (CI 1,352 - 1,376)

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #121 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,377
Percentile: 62.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_life_and_physical_and_social_science. Source rank: #143. Votes: 6425. Organization: openai. License: Proprietary.

62.8% percentile inside its fair comparison set

Raw benchmark value: 1,377 (CI 1,369 - 1,385)

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #78 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,416
Percentile: 75%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_mathematical. Source rank: #94. Votes: 2092. Organization: openai. License: Proprietary.

75% percentile inside its fair comparison set

Raw benchmark value: 1,416 (CI 1,402 - 1,430)

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #118 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,376
Percentile: 60.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_medicine_and_healthcare. Source rank: #142. Votes: 2883. Organization: openai. License: Proprietary.

60.3% percentile inside its fair comparison set

Raw benchmark value: 1,376 (CI 1,364 - 1,388)

Text Arena · Industry Software And It Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #107 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,401
Percentile: 67.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_software_and_it_services. Source rank: #129. Votes: 15267. Organization: openai. License: Proprietary.

67.4% percentile inside its fair comparison set

Raw benchmark value: 1,401 (CI 1,395 - 1,407)

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #125 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,337
Percentile: 61.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: industry_writing_and_literature_and_language. Source rank: #152. Votes: 9207. Organization: openai. License: Proprietary.

61.7% percentile inside its fair comparison set

Raw benchmark value: 1,337 (CI 1,330 - 1,344)

SAGE

VALS-AI · Professional reasoning · Objective

Student Assessment with Generative Evaluation.

Rank #32 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 38.1%
Percentile: 31.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: OpenAI.

31.1% percentile inside its fair comparison set

Raw benchmark value: 38.1% (CI 32% - 44.1%)

PRBench Legal

SL · Professional reasoning · Rubric

Applied legal reasoning on professional-domain tasks.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Scale Labs
Raw value: 49%
Percentile: 83.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

83.3% percentile inside its fair comparison set

Raw benchmark value: 49%
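
The eligibility field separates rows that count toward default ranking from rows that are shown only for context. A minimal sketch of that split follows, using three rows from this section. The `Row` type and its `headline_eligible` flag are hypothetical names chosen for illustration, and the page does not document how its section scores are aggregated, so nothing here reproduces that calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One benchmark row as displayed on this page (hypothetical record shape)."""
    benchmark: str
    percentile: float
    headline_eligible: bool


# Percentiles and eligibility copied from the cards in this section.
rows = [
    Row("LegalBench", 28.9, True),
    Row("SAGE", 31.1, True),
    Row("PRBench Legal", 83.3, False),  # backfilled from GPT-5, background only
]

# Rows that would take part in default ranking versus context-only rows.
ranked = [r.benchmark for r in rows if r.headline_eligible]
context_only = [r.benchmark for r in rows if not r.headline_eligible]

print("ranked:", ranked)
print("context only:", context_only)
```

Keeping the backfilled row out of the ranked list matters here because its 83.3% percentile comes from a different model's result and would otherwise dominate the two verified rows.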

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #104 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 39.2%
Percentile: 4.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

4.6% percentile inside its fair comparison set

Raw benchmark value: 39.2%

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #109 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 32.4%
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

0% percentile inside its fair comparison set

Raw benchmark value: 32.4%

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #102 · Source label: gpt-5-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 1.5%
Percentile: 6.5%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

6.5% percentile inside its fair comparison set

Raw benchmark value: 1.5%

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #108 · Source label: gpt-5.4-nano

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 17.9%
Percentile: 0.9%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

0.9% percentile inside its fair comparison set

Raw benchmark value: 17.9%

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #103 · Source label: gpt-5-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: LiveBench
Raw value: 90.2%
Percentile: 5.6%
Last updated: archived
Eligibility: headline eligible

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

5.6% percentile inside its fair comparison set

Raw benchmark value: 90.2%

Poker Agent

VALS-AI · Professional reasoning · Objective

Agent profit in poker-style strategic play.

Rank #5 · Source label: openai/gpt-5-2025-08-07

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 1,103.2 score
Percentile: 93.8%
Last updated: archived
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

93.8% percentile inside its fair comparison set

Raw benchmark value: 1,103.2 score (CI 1,103.2 score - 1,103.2 score)

Search / tool use · 3 benchmarks · 33.1%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #199 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 25.7%
Percentile: 35.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `tau2`.

35.9% percentile inside its fair comparison set

Raw benchmark value: 25.7%

Search Arena

AR · Search / tool use · Human

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #20 · Source label: gpt-5-search

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,180
Percentile: 43.3%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

43.3% percentile inside its fair comparison set

Raw benchmark value: 1,180 (CI 1,173 - 1,188)

Search Arena · No Style Control

AR · Search / tool use · Human

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #27 · Source label: gpt-5-search

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,132
Percentile: 20%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

20% percentile inside its fair comparison set

Raw benchmark value: 1,132 (CI 1,127 - 1,138)

Long context · 2 benchmarks · 47.7%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #191 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 20%
Percentile: 39.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `lcr`.

39.7% percentile inside its fair comparison set

Raw benchmark value: 20%

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #40 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Vals AI
Raw value: 61.2%
Percentile: 55.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: OpenAI.

55.7% percentile inside its fair comparison set

Raw benchmark value: 61.2% (CI 59.3% - 63.1%)

Vision understanding · 24 benchmarks · 40.5%

MMMU-Pro

AA · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #129 · Source label: GPT-5 nano (minimal)

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Artificial Analysis
Raw value: 31.8%
Percentile: 5.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Artificial Analysis public leaderboard field `mmmuPro`.

5.2% percentile inside its fair comparison set

Raw benchmark value: 31.8%

Vision Arena

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #46 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,202
Percentile: 58.7%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #57. Votes: 15657. Organization: openai. License: Proprietary.

58.7% percentile inside its fair comparison set

Raw benchmark value: 1,202 (CI 1,195 - 1,210)

Vision Arena · Captioning

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #14 · Source label: gpt-5-chat

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,200
Percentile: 57.7%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

57.7% percentile inside its fair comparison set

Raw benchmark value: 1,200 (CI 1,170 - 1,230)

Vision Arena · Creative Writing Vision

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #50 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,159
Percentile: 10.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing_vision. Source rank: #63. Votes: 841. Organization: openai. License: Proprietary.

10.9% percentile inside its fair comparison set

Raw benchmark value: 1,159 (CI 1,136 - 1,181)

Vision Arena · Diagram

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #42 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,226
Percentile: 41.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: diagram. Source rank: #54. Votes: 4154. Organization: openai. License: Proprietary.

41.4% percentile inside its fair comparison set

Raw benchmark value: 1,226 (CI 1,215 - 1,237)

Vision Arena · English

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #47 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,201
Percentile: 57.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #58. Votes: 6564. Organization: openai. License: Proprietary.

57.8% percentile inside its fair comparison set

Raw benchmark value: 1,201 (CI 1,190 - 1,211)

Vision Arena · Entity Recognition

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7 · Source label: gpt-5-high

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,257
Percentile: 87.5%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

87.5% percentile inside its fair comparison set

Raw benchmark value: 1,257 (CI 1,224 - 1,289)

Vision Arena · Homework

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,230
Percentile: 36.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: homework. Source rank: #57. Votes: 2407. Organization: openai. License: Proprietary.

36.8% percentile inside its fair comparison set

Raw benchmark value: 1,230 (CI 1,217 - 1,244)

Vision Arena · Humor

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #45 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,141
Percentile: 10.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: humor. Source rank: #57. Votes: 501. Organization: openai. License: Proprietary.

10.2% percentile inside its fair comparison set

Raw benchmark value: 1,141 (CI 1,112 - 1,169)

Vision Arena · Ocr

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #41 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,217
Percentile: 42.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: ocr. Source rank: #53. Votes: 11180. Organization: openai. License: Proprietary.

42.9% percentile inside its fair comparison set

Raw benchmark value: 1,217 (CI 1,210 - 1,225)

Vision Arena · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #47 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,198
Percentile: 57.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: overall. Source rank: #60. Votes: 15657. Organization: openai. License: Proprietary.

57.8% percentile inside its fair comparison set

Raw benchmark value: 1,198 (CI 1,191 - 1,205)

Vision Arena · Captioning · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #15 · Source label: gpt-5-chat

backfilled · proxy backfilled · Background only

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,215
Percentile: 53.8%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

53.8% percentile inside its fair comparison set

Raw benchmark value: 1,215 (CI 1,185 - 1,245)

Vision Arena · Creative Writing Vision · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #51 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,159
Percentile: 9.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: creative_writing_vision. Source rank: #64. Votes: 841. Organization: openai. License: Proprietary.

9.1% percentile inside its fair comparison set

Raw benchmark value: 1,159 (CI 1,137 - 1,181)

Vision Arena · Diagram · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,210
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: diagram. Source rank: #56. Votes: 4154. Organization: openai. License: Proprietary.

38.6% percentile inside its fair comparison set

Raw benchmark value: 1,210 (CI 1,199 - 1,221)

Vision Arena · English · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #51 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Raw row drilldown: source row, percentile, last updated, eligibility

Source: Arena
Raw value: 1,197
Percentile: 54.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: english. Source rank: #64. Votes: 6564. Organization: openai. License: Proprietary.

54.1% percentile inside its fair comparison set

Raw benchmark value: 1,197 · CI 1,187 - 1,207

Vision Arena · Entity Recognition · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #12 · Source label: gpt-5-high

backfilled · proxy backfilled · Background only

Source: Arena
Raw value: 1,261
Percentile: 71.9%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

71.9% percentile inside its fair comparison set

Raw benchmark value: 1,261 · CI 1,233 - 1,290

Vision Arena · Homework · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #46 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,223
Percentile: 33.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: homework. Source rank: #58. Votes: 2407. Organization: openai. License: Proprietary.

33.8% percentile inside its fair comparison set

Raw benchmark value: 1,223 · CI 1,209 - 1,236

Vision Arena · Humor · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #45 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,148
Percentile: 10.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: humor. Source rank: #58. Votes: 501. Organization: openai. License: Proprietary.

10.2% percentile inside its fair comparison set

Raw benchmark value: 1,148 · CI 1,120 - 1,177

Vision Arena · OCR · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #44 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,208
Percentile: 38.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: ocr. Source rank: #56. Votes: 11180. Organization: openai. License: Proprietary.

38.6% percentile inside its fair comparison set

Raw benchmark value: 1,208 · CI 1,201 - 1,216

MMMU Pro

VALS-AI · Vision understanding · Objective

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #38 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Source: Vals AI
Raw value: 73.6%
Percentile: 36.2%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.

36.2% percentile inside its fair comparison set

Raw benchmark value: 73.6% · CI 71.5% - 75.7%

VTB

SL · Vision understanding · Rubric

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #7 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Source: Scale Labs
Raw value: 17%
Percentile: 63.6%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

63.6% percentile inside its fair comparison set

Raw benchmark value: 17%

VISTA

SL · Vision understanding · Rubric

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Source: Scale Labs
Raw value: 79%
Percentile: 92.9%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

92.9% percentile inside its fair comparison set

Raw benchmark value: 79%

Vision Arena · Creative Writing

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #33 · Source label: gpt-5-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,092
Percentile: 0%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5-nano-high`. Category: creative_writing. Source rank: #34. Votes: 206. Organization: openai. License: Proprietary.

0% percentile inside its fair comparison set

Raw benchmark value: 1,092 · CI 1,050 - 1,134

Vision Arena · Creative Writing · No Style Control

AR · Vision understanding · Human

It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.

Rank #32 · Source label: gpt-5-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,117
Percentile: 3.1%
Last updated: archived
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5-nano-high`. Category: creative_writing. Source rank: #33. Votes: 206. Organization: openai. License: Proprietary.

3.1% percentile inside its fair comparison set

Raw benchmark value: 1,117 · CI 1,077 - 1,158

Document understanding · 3 benchmarks · 49.2%

Vals Multimodal Index

VALS-AI · Document understanding · Combined

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #15 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Source: Vals AI
Raw value: 48
Percentile: 26.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_multimodal_index; provider: OpenAI.

26.3% percentile inside its fair comparison set

Raw benchmark value: 48 · CI 44 - 51

MortgageTax

VALS-AI · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #44 · Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Source: Vals AI
Raw value: 59.1%
Percentile: 28.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mortgage_tax; provider: OpenAI.

28.3% percentile inside its fair comparison set

Raw benchmark value: 59.1% · CI 57.2% - 61%

Multimodal mix

OC · Document understanding · Objective

It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Rank #5 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Source: OpenCompass
Raw value: 75.4%
Percentile: 92.9%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

92.9% percentile inside its fair comparison set

Raw benchmark value: 75.4%
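The 49.2% shown in the Document understanding section header is consistent with an unweighted mean of the three percentiles listed in that section (26.3%, 28.3%, 92.9%). The page does not state its aggregation rule, so the sketch below is only a check under that assumption; the helper name `mean_percentile` and the eligibility flags are illustrative. The headline-only mean is computed for contrast and is not a figure the page reports.

```python
# Percentiles copied from the three Document understanding cards.
# The boolean marks rows labelled "headline eligible" on the page.
rows = [
    ("Vals Multimodal Index", 26.3, True),
    ("MortgageTax", 28.3, True),
    ("Multimodal mix", 92.9, False),  # backfilled from GPT-5, excluded from default ranking
]


def mean_percentile(rows, headline_only=False):
    """Unweighted mean of percentiles, rounded to one decimal (assumed rule)."""
    values = [p for _, p, eligible in rows if eligible or not headline_only]
    return round(sum(values) / len(values), 1)


all_rows = mean_percentile(rows)
headline = mean_percentile(rows, headline_only=True)
print(all_rows)   # mean over all three rows
print(headline)   # mean over headline-eligible rows only
```

Under this assumption the all-rows mean reproduces the section figure, which suggests the backfilled row is counted in the section percentage even though it is excluded from default ranking.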

Safety · 1 benchmark · 61.5%

MASK

SL · Safety · Rubric

Whether a model stays honest instead of covertly optimizing against the user.

Rank #8 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Source: Scale Labs
Raw value: 79.3%
Percentile: 61.5%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

61.5% percentile inside its fair comparison set

Raw benchmark value: 79.3%

Embeddings / retrieval · 1 benchmark · 100%

Retrieval

MTEB · Embeddings / retrieval · Retrieval

It is one of the few direct signals for retrieval stacks, where embedding quality matters more than chat style.

Rank #4 · Source label: gpt-5

backfilled · proxy backfilled · Background only

Source: MTEB
Raw value: 58.8 ndcg
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.

Embedding endpoint score. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

100% percentile inside its fair comparison set

Raw benchmark value: 58.8 ndcg

Multilingual · 16 benchmarks · 59.4%

Text Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #99 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,427
Percentile: 66.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #121. Votes: 2022. Organization: openai. License: Proprietary.

66.8% percentile inside its fair comparison set

Raw benchmark value: 1,427 · CI 1,413 - 1,442

Text Arena · French

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #77 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,428
Percentile: 64.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: french. Source rank: #94. Votes: 1260. Organization: openai. License: Proprietary.

64.8% percentile inside its fair comparison set

Raw benchmark value: 1,428 · CI 1,409 - 1,448

Text Arena · German

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,380
Percentile: 60.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: german. Source rank: #114. Votes: 543. Organization: openai. License: Proprietary.

60.8% percentile inside its fair comparison set

Raw benchmark value: 1,380 · CI 1,353 - 1,407

Text Arena · Japanese

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #49 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,380
Percentile: 76.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: japanese. Source rank: #67. Votes: 305. Organization: openai. License: Proprietary.

76.4% percentile inside its fair comparison set

Raw benchmark value: 1,380 · CI 1,344 - 1,417

Text Arena · Korean

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #105 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,324
Percentile: 50%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: korean. Source rank: #128. Votes: 604. Organization: openai. License: Proprietary.

50% percentile inside its fair comparison set

Raw benchmark value: 1,324 · CI 1,298 - 1,350

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,395
Percentile: 67.8%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: russian. Source rank: #115. Votes: 4149. Organization: openai. License: Proprietary.

67.8% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,385 - 1,405

Text Arena · Spanish

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #88 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,391
Percentile: 59.3%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: spanish. Source rank: #109. Votes: 1165. Organization: openai. License: Proprietary.

59.3% percentile inside its fair comparison set

Raw benchmark value: 1,391 · CI 1,372 - 1,411

Text Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Chinese leaderboard.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,396
Percentile: 63.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #133. Votes: 2022. Organization: openai. License: Proprietary.

63.1% percentile inside its fair comparison set

Raw benchmark value: 1,396 · CI 1,382 - 1,411

Text Arena · French · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena French leaderboard.

Rank #94 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,395
Percentile: 56.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: french. Source rank: #113. Votes: 1260. Organization: openai. License: Proprietary.

56.9% percentile inside its fair comparison set

Raw benchmark value: 1,395 · CI 1,376 - 1,415

Text Arena · German · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena German leaderboard.

Rank #110 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,352
Percentile: 54%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: german. Source rank: #132. Votes: 543. Organization: openai. License: Proprietary.

54% percentile inside its fair comparison set

Raw benchmark value: 1,352 · CI 1,325 - 1,380

Text Arena · Japanese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Japanese leaderboard.

Rank #67 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,347
Percentile: 67.5%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: japanese. Source rank: #85. Votes: 305. Organization: openai. License: Proprietary.

67.5% percentile inside its fair comparison set

Raw benchmark value: 1,347 · CI 1,311 - 1,384

Text Arena · Korean · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Korean leaderboard.

Rank #109 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,303
Percentile: 48.1%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: korean. Source rank: #131. Votes: 604. Organization: openai. License: Proprietary.

48.1% percentile inside its fair comparison set

Raw benchmark value: 1,303 · CI 1,277 - 1,329

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Russian leaderboard.

Rank #108 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,368
Percentile: 63%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: russian. Source rank: #131. Votes: 4149. Organization: openai. License: Proprietary.

63% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,358 - 1,378

Text Arena · Spanish · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena Spanish leaderboard.

Rank #104 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,368
Percentile: 51.9%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: spanish. Source rank: #125. Votes: 1165. Organization: openai. License: Proprietary.

51.9% percentile inside its fair comparison set

Raw benchmark value: 1,368 · CI 1,349 - 1,387

Vision Arena · Chinese

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard.

Rank #40 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,237
Percentile: 49.4%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #51. Votes: 922. Organization: openai. License: Proprietary.

49.4% percentile inside its fair comparison set

Raw benchmark value: 1,237 · CI 1,213 - 1,262

Vision Arena · Chinese · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Vision Arena Chinese leaderboard.

Rank #39 · Source label: gpt-5.4-nano-high

verified runtime · exact alias

Source: Arena
Raw value: 1,234
Percentile: 50.6%
Last updated: recent
Eligibility: headline eligible

Parsed from Arena leaderboard dataset row `gpt-5.4-nano-high`. Category: chinese. Source rank: #51. Votes: 922. Organization: openai. License: Proprietary.

50.6% percentile inside its fair comparison set

Raw benchmark value: 1,234 · CI 1,209 - 1,259
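Several leaderboards above appear twice, once in a default variant and once as "No Style Control", each with its own CI. One rough way to read such pairs is to check whether the two intervals overlap. The sketch below does this for the Text Arena Chinese and Vision Arena Chinese rows; the helper name `intervals_overlap` and the dictionary keys are illustrative, and interval overlap is only an informal heuristic, not a significance test (the page does not state the confidence level behind "CI").

```python
def intervals_overlap(a, b):
    """Return True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CI bounds copied from the cards above: (low, high).
text_chinese = {
    "default": (1413, 1442),           # Text Arena · Chinese
    "no_style_control": (1382, 1411),  # Text Arena · Chinese · No Style Control
}
vision_chinese = {
    "default": (1213, 1262),           # Vision Arena · Chinese
    "no_style_control": (1209, 1259),  # Vision Arena · Chinese · No Style Control
}

text_overlap = intervals_overlap(
    text_chinese["default"], text_chinese["no_style_control"]
)
vision_overlap = intervals_overlap(
    vision_chinese["default"], vision_chinese["no_style_control"]
)
print(text_overlap, vision_overlap)
```

For these rows the Text Arena intervals do not overlap while the Vision Arena intervals do, so the two variants differ more clearly on the text leaderboard than on the vision one.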