Gemini 3.1 Flash-Lite Preview
Closest option
- Gemini 3.1 Flash-Lite Preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
The current evidence supports a shortlist, not a single winner.
Direct matches stay strict; strong models with indirect data still surface below. Open a row for its scores, source links, and caveats.
DeepSeek Chat is strongest on Reasoning / math / science and Chat / text for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted the Artificial Analysis and Arena data.
Qwen3.5 27B is strongest on Reasoning / math / science and Coding for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Chat / text, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted the Artificial Analysis and Arena data.
Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted the Artificial Analysis and Arena data.
Gemma 4 26B A4B is strongest on Reasoning / math / science and Chat / text for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted the Artificial Analysis and Arena data.
Qwen3.5 35B A3B is strongest on Reasoning / math / science and Chat / text for this preset.
The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
Recent data-parser or model-matching changes shifted the Artificial Analysis and Arena data.
No source link clears the minimum source requirement.
Known current model
Google · 100% visible · 100% direct · 0% indirect
Gemini 3.1 Flash-Lite Preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Anthropic · 100% visible · 100% direct · 0% indirect
Claude Haiku 4.5 has direct evidence on part of this preset, but not enough to clear the exact-match floor.
OpenAI · 100% visible · 100% direct · 0% indirect
GPT-5.4 nano has direct evidence on part of this preset, but not enough to clear the exact-match floor.
xAI · 100% visible · 100% direct · 0% indirect
Grok 4.1 Fast has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Google · 100% visible · 100% direct · 0% indirect
Gemini 2.5 Flash-Lite has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Unknown · 100% visible · 100% direct · 0% indirect
mercury-2 has direct evidence on part of this preset, but not enough to clear the exact-match floor.
DeepSeek · 100% visible · 100% direct · 0% indirect
DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3.5 Flash has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
Qwen3 235B A22B has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Tencent · 100% visible · 100% direct · 0% indirect
hunyuan-hy3-preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Unknown · 100% visible · 100% direct · 0% indirect
minimax-m2.1-preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Qwen · 100% visible · 100% direct · 0% indirect
qwen3.6-max-preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.
The product keeps parser and mapping ambiguity visible instead of silently guessing.
80 benchmark rows were added, 4 removed, and 16276 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:10Z -> 2026-06-24T03:37:55Z.
28 benchmark rows were added, 0 removed, and 5949 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:17Z -> 2026-06-24T03:38:09Z.
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:24Z -> 2026-06-24T03:38:25Z.
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:34Z -> 2026-06-24T03:38:36Z.
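The added / removed / changed counts reported above can be thought of as a diff between two saved snapshots. The following is a minimal sketch of that idea, not the product's actual implementation: the row key `(model_id, benchmark)`, the row payload `(value, evaluation_date)`, and the function name `diff_snapshots` are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical row shape (an assumption, not taken from the product):
#   key   = (model_id, benchmark)
#   value = (score, evaluation_date)
RowKey = Tuple[str, str]
RowValue = Tuple[float, str]
Snapshot = Dict[RowKey, RowValue]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, int]:
    """Count rows added, removed, and changed between two snapshots."""
    prev_keys = set(previous)
    curr_keys = set(current)

    added = curr_keys - prev_keys      # present only in the new snapshot
    removed = prev_keys - curr_keys    # present only in the old snapshot
    # A row counts as changed when its score or evaluation date differs.
    changed = {k for k in prev_keys & curr_keys if previous[k] != current[k]}

    return {"added": len(added), "removed": len(removed), "changed": len(changed)}


if __name__ == "__main__":
    old = {
        ("model-a", "bench-1"): (1.0, "2026-06-20"),
        ("model-b", "bench-1"): (2.0, "2026-06-20"),
        ("model-c", "bench-1"): (3.0, "2026-06-20"),
    }
    new = {
        ("model-a", "bench-1"): (1.0, "2026-06-20"),  # unchanged
        ("model-b", "bench-1"): (2.5, "2026-06-20"),  # value changed
        ("model-d", "bench-1"): (4.0, "2026-06-23"),  # newly added
    }
    print(diff_snapshots(old, new))
```

Keying rows on a stable identifier matters here: if a parser or mapping change renames a model, the same row would show up as one removal plus one addition rather than a change.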
Added comparison-table homepage, same-test normalization, per-cell source links, source pages, and custom-ranking preview.
Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.
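One way to picture the rule that unlike metrics are never averaged by default is a guard that refuses to combine scores unless they come from the same test and use the same metric. This is a rough sketch under assumptions: the `Cell` type, its field names, and the example metric labels are hypothetical and are not drawn from the product's documentation.

```python
from statistics import mean
from typing import List, NamedTuple


class Cell(NamedTuple):
    """One score cell; all field names here are illustrative assumptions."""
    model: str
    test: str     # benchmark identifier
    metric: str   # hypothetical labels such as "accuracy" or "elo"
    value: float


def average_same_test(cells: List[Cell]) -> float:
    """Average scores only when every cell shares one test and one metric."""
    kinds = {(c.test, c.metric) for c in cells}
    if len(kinds) != 1:
        # Mixed or empty input: averaging would not be meaningful.
        raise ValueError(f"refusing to average across unlike metrics: {sorted(kinds)}")
    return mean(c.value for c in cells)


if __name__ == "__main__":
    comparable = [
        Cell("model-a", "bench-1", "accuracy", 80.0),
        Cell("model-b", "bench-1", "accuracy", 90.0),
    ]
    print(average_same_test(comparable))

    mixed = comparable + [Cell("model-a", "bench-2", "elo", 1200.0)]
    try:
        average_same_test(mixed)
    except ValueError as err:
        print(err)
```

Raising an error instead of returning a blended number keeps the raw values visible, in line with the stated preference for showing ambiguity over silently guessing.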
Stable model and creator IDs are now the preferred external identity keys when available.
If this date looks stale, you may be seeing an older build or cached deploy.
best open model for long-context research · Research assistant
compare gpt-5, claude opus, gemini pro · Everyday chatbot
gpt-5 vs claude opus · Everyday chatbot
benchmark controversy for livebench coding · Coding copilot
what changed this week · Everyday chatbot
open model gpt-5 · Open-weight shortlist