MMMU-Pro
AA · Vision understanding · Objective
It is useful when the model must read charts, UI, screenshots, or visual scenes rather than text alone.
Rank #74 · Source label: GPT-4.1
verified runtime · exact alias · Background only
- Source: Artificial Analysis
- Raw value: 61.2%
- Percentile: 45.9%
- Last updated: recent
- Eligibility: historical_model
Parsed from Artificial Analysis public leaderboard field `mmmuPro`.
45.9% percentile inside its fair comparison set · Raw benchmark value: 61.2%
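The page does not define "percentile inside its fair comparison set". Every rank/percentile pair shown here happens to be consistent with percentile = (N - rank) / (N - 1) for some whole-number set size N, so rank 1 maps to 100% and last place to 0%. The sketch below treats that formula as an assumption and recovers the smallest N that reproduces a displayed one-decimal percentile. `percentile_from_rank` and `infer_set_size` are hypothetical helpers, and N is inferred here, not reported by the source.

```python
from typing import Optional


def percentile_from_rank(rank: int, set_size: int) -> float:
    """Assumed percentile formula: (N - rank) / (N - 1), returned as a fraction.

    Rank 1 maps to 1.0 (top of the set) and rank N maps to 0.0 (bottom).
    """
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("need set_size >= 2 and 1 <= rank <= set_size")
    return (set_size - rank) / (set_size - 1)


def infer_set_size(rank: int, shown_pct: float, max_size: int = 10_000) -> Optional[int]:
    """Smallest set size N whose assumed percentile rounds to the value shown.

    The page shows percentiles to one decimal place, so any N whose exact
    value lies within 0.05 percentage points of `shown_pct` is consistent.
    The assumed percentile grows with N, so the first match is the smallest.
    """
    for n in range(max(rank, 2), max_size + 1):
        if abs(100 * percentile_from_rank(rank, n) - shown_pct) <= 0.05:
            return n
    return None


# Rank and percentile pairs copied from cards on this page.
examples = [
    ("MMMU-Pro (Artificial Analysis)", 74, 45.9),
    ("Vision Arena", 41, 63.3),
    ("Vision Arena · Captioning", 10, 65.4),
    ("MMMU Pro (Vals AI)", 42, 29.3),
]
for name, rank, pct in examples:
    n = infer_set_size(rank, pct)
    print(f"{name}: rank #{rank}, {pct}% -> consistent with N = {n}")
```

If the assumption holds, the same model can sit at a similar rank on two leaderboards yet receive very different percentiles, simply because the comparison sets differ in size.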
Vision Arena
AR · Vision understanding · Human
Rank #41 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,214
- Percentile: 63.3%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: overall. Source rank: #51. Votes: 42090. Organization: openai. License: Proprietary.
63.3% percentile inside its fair comparison set · Raw benchmark value: 1,214 (CI 1,208 - 1,221)
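Arena scores are reported on an Elo-like scale. Assuming the conventional Elo logistic link (base 10, scale 400), which this page does not itself state, a rating gap can be read as an expected head-to-head preference rate. In the sketch below, `expected_preference` is a hypothetical helper and the 1,314-point opponent is purely illustrative.

```python
def expected_preference(rating_a: float, rating_b: float,
                        base: float = 10.0, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under an Elo-style logistic model.

    base=10 and scale=400 are the conventional Elo constants; treat them as an
    assumption about how these Arena scores are scaled.
    """
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))


overall = 1214             # Vision Arena overall score from the card above
hypothetical_rival = 1314  # illustrative opponent rated 100 points higher

print(f"vs rival: {expected_preference(overall, hypothetical_rival):.3f}")
# Spread between the two ends of the overall CI (1,208 to 1,221).
print(f"CI endpoints: {expected_preference(1221, 1208):.3f}")
```

Under that assumption, a 100-point gap corresponds to roughly a 36% preference rate, while the 13-point spread of the overall CI amounts to only about a 52/48 split.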
Vision Arena · Captioning
AR · Vision understanding · Human
Rank #10 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,208
- Percentile: 65.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: captioning. Source rank: #11. Votes: 440. Organization: openai. License: Proprietary.
65.4% percentile inside its fair comparison set · Raw benchmark value: 1,208 (CI 1,178 - 1,238)
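Each Arena card closes with a provenance note: the quoted dataset row followed by fields in a fixed `Key: value.` pattern. If you want those notes as structured records, a small regex parser is enough. This is a sketch that assumes the note format stays exactly as shown on this page; it does not read the Arena dataset itself, `parse_arena_provenance` is a hypothetical helper, and the Vals AI note (which uses `slug: ...; provider: ...`) is not covered.

```python
import re

# The dataset row id sits in backticks and may itself contain dots.
ROW_RE = re.compile(r"dataset row `([^`]+)`")
# "Key: value." segments, e.g. "Source rank: #11." or "Votes: 440."
FIELD_RE = re.compile(r"([A-Z][A-Za-z ]*?): ([^.]+)\.")


def parse_arena_provenance(line: str) -> dict:
    """Turn an Arena provenance note (format as shown on this page) into a dict."""
    row = ROW_RE.search(line)
    fields = {
        key.strip().lower().replace(" ", "_"): value.strip()
        for key, value in FIELD_RE.findall(line)
    }
    record = {"row": row.group(1) if row else None, **fields}
    # Normalise the numeric fields.
    if "source_rank" in record:
        record["source_rank"] = int(record["source_rank"].lstrip("#"))
    if "votes" in record:
        record["votes"] = int(record["votes"].replace(",", ""))
    return record


note = ("Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. "
        "Category: captioning. Source rank: #11. Votes: 440. "
        "Organization: openai. License: Proprietary.")
print(parse_arena_provenance(note))
```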
Vision Arena · Creative Writing Vision
AR · Vision understanding · Human
Rank #31 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,227
- Percentile: 45.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: creative_writing_vision. Source rank: #38. Votes: 1280. Organization: openai. License: Proprietary.
45.5% percentile inside its fair comparison set · Raw benchmark value: 1,227 (CI 1,208 - 1,246)
Vision Arena · Diagram
AR · Vision understanding · Human
Rank #40 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,233
- Percentile: 44.3%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: diagram. Source rank: #52. Votes: 2824. Organization: openai. License: Proprietary.
44.3% percentile inside its fair comparison set · Raw benchmark value: 1,233 (CI 1,220 - 1,246)
Vision Arena · English
AR · Vision understanding · Human
Rank #41 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,211
- Percentile: 63.3%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: english. Source rank: #53. Votes: 20042. Organization: openai. License: Proprietary.
63.3% percentile inside its fair comparison set · Raw benchmark value: 1,211 (CI 1,202 - 1,220)
Vision Arena · Entity Recognition
AR · Vision understanding · Human
Rank #25 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,184
- Percentile: 25%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: entity_recognition. Source rank: #27. Votes: 403. Organization: openai. License: Proprietary.
25% percentile inside its fair comparison set · Raw benchmark value: 1,184 (CI 1,154 - 1,214)
Vision Arena · Homework
AR · Vision understanding · Human
Rank #36 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,248
- Percentile: 48.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: homework. Source rank: #47. Votes: 1535. Organization: openai. License: Proprietary.
48.5% percentile inside its fair comparison set · Raw benchmark value: 1,248 (CI 1,232 - 1,263)
Vision Arena · Humor
AR · Vision understanding · Human
Rank #37 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,181
- Percentile: 26.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: humor. Source rank: #49. Votes: 1408. Organization: openai. License: Proprietary.
26.5% percentile inside its fair comparison set · Raw benchmark value: 1,181 (CI 1,161 - 1,200)
Vision Arena · OCR
AR · Vision understanding · Human
Rank #39 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,226
- Percentile: 45.7%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: ocr. Source rank: #50. Votes: 11601. Organization: openai. License: Proprietary.
45.7% percentile inside its fair comparison set · Raw benchmark value: 1,226 (CI 1,217 - 1,235)
Vision Arena · No Style Control
AR · Vision understanding · Human
Rank #42 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,210
- Percentile: 62.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: overall. Source rank: #54. Votes: 42090. Organization: openai. License: Proprietary.
62.4% percentile inside its fair comparison set · Raw benchmark value: 1,210 (CI 1,204 - 1,217)
Vision Arena · Captioning · No Style Control
AR · Vision understanding · Human
Rank #18 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,204
- Percentile: 34.6%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: captioning. Source rank: #17. Votes: 440. Organization: openai. License: Proprietary.
34.6% percentile inside its fair comparison set · Raw benchmark value: 1,204 (CI 1,175 - 1,233)
Vision Arena · Creative Writing Vision · No Style Control
AR · Vision understanding · Human
Rank #39 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,222
- Percentile: 30.9%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: creative_writing_vision. Source rank: #49. Votes: 1280. Organization: openai. License: Proprietary.
30.9% percentile inside its fair comparison set · Raw benchmark value: 1,222 (CI 1,203 - 1,241)
Vision Arena · Diagram · No Style Control
AR · Vision understanding · Human
Rank #42 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,217
- Percentile: 41.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: diagram. Source rank: #54. Votes: 2824. Organization: openai. License: Proprietary.
41.4% percentile inside its fair comparison set · Raw benchmark value: 1,217 (CI 1,204 - 1,229)
Vision Arena · English · No Style Control
AR · Vision understanding · Human
Rank #45 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,212
- Percentile: 59.6%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: english. Source rank: #58. Votes: 20042. Organization: openai. License: Proprietary.
59.6% percentile inside its fair comparison set · Raw benchmark value: 1,212 (CI 1,203 - 1,221)
Vision Arena · Entity Recognition · No Style Control
AR · Vision understanding · Human
Rank #24 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,205
- Percentile: 28.1%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: entity_recognition. Source rank: #26. Votes: 403. Organization: openai. License: Proprietary.
28.1% percentile inside its fair comparison set · Raw benchmark value: 1,205 (CI 1,175 - 1,235)
Vision Arena · Homework · No Style Control
AR · Vision understanding · Human
Rank #40 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,246
- Percentile: 42.6%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: homework. Source rank: #49. Votes: 1535. Organization: openai. License: Proprietary.
42.6% percentile inside its fair comparison set · Raw benchmark value: 1,246 (CI 1,230 - 1,261)
Vision Arena · Humor · No Style Control
AR · Vision understanding · Human
Rank #37 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,195
- Percentile: 26.5%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: humor. Source rank: #48. Votes: 1408. Organization: openai. License: Proprietary.
26.5% percentile inside its fair comparison set · Raw benchmark value: 1,195 (CI 1,175 - 1,214)
Vision Arena · OCR · No Style Control
AR · Vision understanding · Human
Rank #42 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,218
- Percentile: 41.4%
- Last updated: recent
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: ocr. Source rank: #53. Votes: 11601. Organization: openai. License: Proprietary.
41.4% percentile inside its fair comparison set · Raw benchmark value: 1,218 (CI 1,209 - 1,227)
MMMU Pro
VALS-AI · Vision understanding · Objective
Rank #42 · Source label: openai/gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Vals AI
- Raw value: 72.4%
- Percentile: 29.3%
- Last updated: recent
- Eligibility: historical_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmmu; provider: OpenAI.
29.3% percentile inside its fair comparison set · Raw benchmark value: 72.4% (CI 70.3% - 74.5%)
Vision Arena · Creative Writing
AR · Vision understanding · Human
Rank #10 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,234
- Percentile: 71.9%
- Last updated: archived
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: creative_writing. Source rank: #11. Votes: 1432. Organization: openai. License: Proprietary.
71.9% percentile inside its fair comparison set · Raw benchmark value: 1,234 (CI 1,217 - 1,250)
Vision Arena · Creative Writing · No Style Control
AR · Vision understanding · Human
Rank #15 · Source label: gpt-4.1-2025-04-14
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,223
- Percentile: 56.3%
- Last updated: archived
- Eligibility: historical_model
Parsed from Arena leaderboard dataset row `gpt-4.1-2025-04-14`. Category: creative_writing. Source rank: #16. Votes: 1432. Organization: openai. License: Proprietary.
56.3% percentile inside its fair comparison set · Raw benchmark value: 1,223 (CI 1,207 - 1,239)
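Most Vision Arena categories on this page appear twice: once in the default view and once under "No Style Control". The sketch below lines up those pairs using values copied from the cards, reports the default-minus-no-style difference, and checks whether the two confidence intervals overlap. Reading the default rows as style-controlled is an assumption based only on the "No Style Control" label, overlap is a coarse heuristic rather than a significance test, the creative_writing rows are marked archived, and the `Score`/`PAIRS` names are illustrative.

```python
from typing import NamedTuple


class Score(NamedTuple):
    value: int  # Arena score
    low: int    # lower CI bound
    high: int   # upper CI bound


# (default view, No Style Control) for gpt-4.1-2025-04-14, copied from the cards above.
PAIRS = {
    "overall": (Score(1214, 1208, 1221), Score(1210, 1204, 1217)),
    "captioning": (Score(1208, 1178, 1238), Score(1204, 1175, 1233)),
    "creative_writing_vision": (Score(1227, 1208, 1246), Score(1222, 1203, 1241)),
    "diagram": (Score(1233, 1220, 1246), Score(1217, 1204, 1229)),
    "english": (Score(1211, 1202, 1220), Score(1212, 1203, 1221)),
    "entity_recognition": (Score(1184, 1154, 1214), Score(1205, 1175, 1235)),
    "homework": (Score(1248, 1232, 1263), Score(1246, 1230, 1261)),
    "humor": (Score(1181, 1161, 1200), Score(1195, 1175, 1214)),
    "ocr": (Score(1226, 1217, 1235), Score(1218, 1209, 1227)),
    "creative_writing": (Score(1234, 1217, 1250), Score(1223, 1207, 1239)),
}


def intervals_overlap(a: Score, b: Score) -> bool:
    """True when the closed intervals [a.low, a.high] and [b.low, b.high] intersect."""
    return a.low <= b.high and b.low <= a.high


for category, (default, no_style) in PAIRS.items():
    delta = default.value - no_style.value
    overlap = intervals_overlap(default, no_style)
    print(f"{category:<24} delta {delta:+d}  CIs overlap: {overlap}")
```

With these values every pair of intervals overlaps; the largest differences are entity recognition (-21, higher without style control) and diagram (+16, higher in the default view).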