Qwen3 Max
Closest option
- Qwen3 Max has direct evidence on part of this preset, but not enough to clear the exact-match floor.
The current evidence supports a shortlist, not a single winner.
Direct matches stay strict; strong models with indirect data still surface below. Open a row for its scores, source links, and caveats.
Qwen3.5 27B is strongest on Reasoning / math / science and Long context for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted results from Artificial Analysis and Arena.
Qwen3.5 122B A10B is strongest on Reasoning / math / science and Long context for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted results from Artificial Analysis and Arena.
DeepSeek Chat is strongest on Reasoning / math / science and Coding for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Long context, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted results from Artificial Analysis and Arena.
Gemma 4 26B A4B is strongest on Reasoning / math / science and Long context for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted results from Artificial Analysis and Arena.
Qwen3.5 35B A3B is strongest on Long context and Reasoning / math / science for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted results from Artificial Analysis and Arena.
No source link clears the minimum source requirement.
Known current model
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3 Max has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3.5 397B A17B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
DeepSeek · 100% visible · 100% direct · 0% indirect
DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3.5 Flash has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Mistral · 100% visible · 100% direct · 0% indirect
Magistral Small 1.2 has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3 32B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Meta · 100% visible · 100% direct · 0% indirect
Llama 4 Maverick has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Meta · 100% visible · 100% direct · 0% indirect
Llama 4 Scout has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3.6 Max Preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Mistral · 100% visible · 100% direct · 0% indirect
Ministral 3 14B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Mistral · 100% visible · 100% direct · 0% indirect
Ministral 3 3B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Mistral · 100% visible · 100% direct · 0% indirect
Ministral 3 8B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
The product keeps parser and mapping ambiguity visible instead of silently guessing.
80 benchmark rows were added, 4 removed, and 16276 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:10Z -> 2026-06-24T03:37:55Z.
28 benchmark rows were added, 0 removed, and 5949 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:17Z -> 2026-06-24T03:38:09Z.
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:24Z -> 2026-06-24T03:38:25Z.
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:34Z -> 2026-06-24T03:38:36Z.
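The added / removed / changed counts above can be illustrated with a small snapshot comparison. This is a minimal sketch, not the product's actual pipeline: the row key format, the `(value, evaluation_date)` tuple shape, and the `diff_snapshots` helper are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical row shape: row key -> (value, evaluation_date).
Row = Tuple[float, str]


def diff_snapshots(previous: Dict[str, Row], current: Dict[str, Row]) -> Dict[str, int]:
    """Count rows added, removed, or changed between two snapshots."""
    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    # A row counts as changed if its value or its evaluation date differs.
    changed = {k for k in current.keys() & previous.keys() if current[k] != previous[k]}
    return {"added": len(added), "removed": len(removed), "changed": len(changed)}


# Illustrative data only; not taken from any real source.
previous = {
    "model-a/bench-1": (71.2, "2026-06-18"),
    "model-a/bench-2": (55.0, "2026-06-18"),
    "model-b/bench-1": (64.9, "2026-06-19"),
}
current = {
    "model-a/bench-1": (71.2, "2026-06-22"),  # same value, new evaluation date
    "model-b/bench-1": (64.9, "2026-06-19"),  # unchanged
    "model-c/bench-1": (80.1, "2026-06-23"),  # new row
}

summary = diff_snapshots(previous, current)
print(summary)  # {'added': 1, 'removed': 1, 'changed': 1}
```

Comparing whole tuples means a re-dated row with an identical score still counts as changed, which matches the "changed value or evaluation date" wording in the entries above.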
Added comparison-table homepage, same-test normalization, per-cell source links, source pages, and custom-ranking preview.
Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.
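One way to read the "unlike metrics are never averaged" rule is that scores are only combined within the same metric. The sketch below illustrates that idea under stated assumptions: the `Score` type, the metric identifiers, and `average_by_metric` are hypothetical and do not describe the product's real normalization logic.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple


class Score(NamedTuple):
    model: str
    metric: str  # hypothetical metric identifier, e.g. "accuracy_pct" or "elo"
    value: float


def average_by_metric(scores: List[Score]) -> Dict[str, Dict[str, float]]:
    """Average each model's scores per metric; never mix different metrics."""
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for s in scores:
        grouped[s.model][s.metric].append(s.value)
    return {
        model: {metric: mean(values) for metric, values in metrics.items()}
        for model, metrics in grouped.items()
    }


# Illustrative data only.
scores = [
    Score("model-a", "accuracy_pct", 70.0),
    Score("model-a", "accuracy_pct", 80.0),
    Score("model-a", "elo", 1300.0),
]

result = average_by_metric(scores)
print(result)  # {'model-a': {'accuracy_pct': 75.0, 'elo': 1300.0}}
```

Keeping one average per metric avoids blending a percentage with a rating-style score into a single number that would have no clear meaning.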
Stable model and creator IDs are now the preferred external identity keys when available.
If this date looks stale, you may be seeing an older build or cached deploy.
best open model for long-context research · Research assistant
compare gpt-5, claude opus, gemini pro · Everyday chatbot
gpt-5 vs claude opus · Everyday chatbot
benchmark controversy for livebench coding · Coding copilot
what changed this week · Everyday chatbot
open model gpt-5 · Open-weight shortlist