UAB
Find
Live · updated continuously

Describe the job.

Set filters, get ranked options with verified scores and source dates attached.
Recommendation · Open-weight shortlist

Best evidence-backed choices for the Open-weight shortlist preset, open models only: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, and DeepSeek Chat.

Qwen3.5 27B · Qwen · budget
Verified & stable

The current evidence supports a shortlist, not a single winner.

Direct sources: 16
Coverage: 75%
Data version: Jun 20, 2026
Evidence behind the call
Data version Jun 20, 2026 · 16 direct source links · No excluded sources · All public sources
What the data can support: 5 ready, 1 partial, 1 missing for the current task result.
Minimum source requirement: enough direct data · enough direct data · enough direct data · enough direct data
Cost / latency: Latency is unavailable in the current verified source data.
Current top picks: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, DeepSeek Chat
Answer type: Top picks, not one winner
Coverage: 75% visible · 75% checked
Preset: Open-weight shortlist
Sources: All public sources
Latest strong source data: Jun 24, 2026
Where sources differ: 4 source rows behind this answer
1. Qwen3.5 27B · Artificial Analysis · GPQA · 84.242 · #34 · 91pctl · used for answer
2. Qwen3.5 27B · Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl · used for answer
3. Qwen3.5 27B · Arena · Text Arena · Math · No Style Control · 1,433.473 · #47 · 85pctl · used for answer
4. Qwen3.5 27B · Arena · Text Arena · Math · 1,429.258 · #55 · 83pctl · used for answer
Ranked options

Direct matches first, then weaker or missing-data cases

Direct matches stay strict; strong models with indirect data still surface below. Open a row for its scores, source links, and caveats.

Primary group · Direct-match leaders
#1 · Qwen3.5 27B · Qwen · budget · 74.3 · 75% visible
enough direct data · recent · strong · clear
budget price band from registry metadata
Latency is unavailable in the current verified source data.
Visible tradeoffs: 21.5% score spread · 100% recent data · exact alias
Fit score: 74.3
Strongest source data: reasoning / math / science · chat / text

Qwen3.5 27B is strongest on Reasoning / math / science and Chat / text for this preset.

  • Last updated: Latest visible source row is 0 days old.
  • Where sources differ: 22-point cross-domain spread; warning threshold is 30.
  • Data checks: No open review item was matched to this recommendation.
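The spread check in the bullets above can be sketched in a few lines. This is only an illustration: the page does not say how the cross-domain spread is computed, so the definition used here (highest domain score minus lowest) and the strict comparison against the threshold are assumptions, and the per-domain scores are hypothetical numbers chosen to reproduce a 22-point spread.

```python
from typing import Mapping

WARNING_THRESHOLD = 30.0  # warning threshold quoted on this page


def cross_domain_spread(domain_scores: Mapping[str, float]) -> float:
    """Assumed definition: highest domain score minus lowest domain score."""
    if not domain_scores:
        raise ValueError("need at least one domain score")
    values = list(domain_scores.values())
    return max(values) - min(values)


def spread_warning(domain_scores: Mapping[str, float],
                   threshold: float = WARNING_THRESHOLD) -> bool:
    """True when the spread exceeds the threshold (strict '>' is an assumption)."""
    return cross_domain_spread(domain_scores) > threshold


# Hypothetical per-domain scores, used only to illustrate the arithmetic.
example = {"reasoning_math_science": 82.0, "chat_text": 75.0, "coding": 60.0}
print(cross_domain_spread(example))  # 22.0
print(spread_warning(example))       # False
```

Under these assumptions a 22-point spread stays below the warning threshold, while a 31-point spread would be flagged.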

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved Artificial Analysis and Arena.

Verified rows: 36
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0
Headline lane: Jun 24, 2026
Background data: No extra context
Formula: recommendation-fit-v2.0.0
Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
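"Source-balanced before averaging" can be read as averaging within each source first and only then across sources, so that a source contributing many rows does not dominate the result. The sketch below shows that reading; it is not the recommendation-fit-v2.0.0 formula, which this page does not spell out. The function name `source_balanced_mean` and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple


def source_balanced_mean(rows: Iterable[Tuple[str, float]]) -> float:
    """Average within each source first, then average the per-source means."""
    by_source = defaultdict(list)
    for source, score in rows:
        by_source[source].append(score)
    if not by_source:
        raise ValueError("no rows to average")
    return mean(mean(scores) for scores in by_source.values())


# Hypothetical normalized scores (0-100); not taken from the page.
rows = [
    ("Artificial Analysis", 90.0),
    ("Arena", 80.0),
    ("Arena", 84.0),
    ("Arena", 82.0),
]
naive = mean(score for _, score in rows)  # 84.0: Arena counts three times
balanced = source_balanced_mean(rows)     # 86.0: each source counts once
print(naive, balanced)
```

The two-point gap between the naive and balanced means comes entirely from how many rows each source contributes, which is the bias that balancing is meant to remove.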
#2 · Qwen3.5 122B A10B · Qwen · mid · 71.7 · 75% visible
enough direct data · recent · strong · clear
mid-priced price band from registry metadata
Latency is unavailable in the current verified source data.
Visible tradeoffs: 26% score spread · 100% recent data · exact alias
Fit score: 71.7
Strongest source data: reasoning / math / science · chat / text

Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.

  • Last updated: Latest visible source row is 0 days old.
  • Where sources differ: 26-point cross-domain spread; warning threshold is 30.
  • Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved Artificial Analysis and Arena.

Verified rows: 38
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0
Headline lane: Jun 24, 2026
Background data: No extra context
Formula: recommendation-fit-v2.0.0
Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
#3 · Gemma 4 26B A4B · Google · mid · 67.0 · 75% visible
enough direct data · recent · mixed · clear
mid-priced price band from registry metadata
Latency is unavailable in the current verified source data.
Visible tradeoffs: 30.5% score spread · 100% recent data · exact alias
Fit score: 67.0
Strongest source data: reasoning / math / science · chat / text

Gemma 4 26B A4B is strongest on Reasoning / math / science and Chat / text for this preset.

  • Last updated: Latest visible source row is 0 days old.
  • Where sources differ: 31-point cross-domain spread; warning threshold is 30.
  • Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved Artificial Analysis and Arena.

Verified rows: 38
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0
Headline lane: Jun 24, 2026
Background data: No extra context
Formula: recommendation-fit-v2.0.0
Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
#4 · DeepSeek Chat · DeepSeek · budget · 63.2 · 75% visible
enough direct data · recent · strong · clear
budget price band from registry metadata
Latency is unavailable in the current verified source data.
Visible tradeoffs: 3.2% score spread · 57.3% recent data · exact alias
Fit score: 63.2
Strongest source data: coding · reasoning / math / science

DeepSeek Chat is strongest on Coding and Reasoning / math / science for this preset.

  • Last updated: Latest visible source row is 0 days old.
  • Where sources differ: 3-point cross-domain spread; warning threshold is 30.
  • Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Chat / text, source verification state, and any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved Artificial Analysis and Arena.

Verified rows: 61
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0
Headline lane: Jun 24, 2026
Background data: No extra context
Formula: recommendation-fit-v2.0.0
Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
#5 · DeepSeek Reasoner · DeepSeek · budget · 62.0 · 75% visible
Source status unavailable
budget price band
Latency unavailable
Visible tradeoffs: 12.5% score spread · 100% recent data · exact alias
Fit score: 62.0
Strongest source data: reasoning / math / science · chat / text

DeepSeek Reasoner is strongest on Reasoning / math / science and Chat / text for this preset.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved Artificial Analysis, Vals AI, and Arena.

Verified rows: 34
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0
Headline lane: Jun 24, 2026
Background data: No extra context
Formula: recommendation-fit-v2.0.0

No source link clears the minimum source requirement.

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
Evidence & limits

Read the argument before you commit

Why these options made the list: top reasons behind the current answer
  • Current shortlist: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, and DeepSeek Chat.
  • Qwen3.5 27B is the strongest exact-match option still visible.
  • Qwen3.5 27B currently leads the fit score at 74.3, but the evidence is still too mixed for a single headline winner.
What to pressure test: where the current answer is still fragile
  • No single winner: The current public evidence is only strong enough to support a shortlist, not one winner.
  • Strongest alternative · Qwen3.5 122B A10B: Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.
  • Evidence risk: The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.
What would flip the answer: the assumptions the result rests on
  • If you tighten benchmark spread: Qwen3.5 27B still holds if you care more about aligned evidence than upside.
  • If you tighten recency: Qwen3.5 27B remains viable because the visible evidence is still fairly fresh.
  • If cost and speed matter more: Qwen3.5 27B
Why this is not a clean win: limitations to keep in mind
  • The current evidence supports a shortlist, not a single winner.
  • Qwen3.5 122B A10B remains close enough that a different scoring recipe can still flip the public answer.
Source links: 6 rows behind this answer
Data limits: where the public data is thin
  • cost: partial · Only registry price bands are ready; exact price calculations are not shown.
  • latency: missing · Latency is not available in the current verified shortlist data.
Not chosen: 6 well-known models left off

Llama 4 Maverick

Closest option

  • Llama 4 Maverick has direct evidence on part of this preset, but not enough to clear the exact-match floor.

DeepSeek V3.2 Exp

Closest option

  • Missing benchmark coverage in Embeddings / retrieval.
  • DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.

Qwen3.5 397B A17B

Closest option

  • Missing benchmark coverage in Embeddings / retrieval.
  • Qwen3.5 397B A17B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

Qwen3 Max

Closest option

  • Missing benchmark coverage in Embeddings / retrieval.
  • Qwen3 Max has direct evidence on part of this preset, but not enough to clear the exact-match floor.

Qwen3 32B

Closest option

  • Missing benchmark coverage in Embeddings / retrieval.
  • Qwen3 32B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

Claude Fable 5

Known current model

  • Current generated catalog does not have enough matching source links for this task preset.
Needs more source data: 12 tracked models with thin public data

Llama 4 Maverick

Meta · 100% visible · 100% direct · 0% indirect

Llama 4 Maverick has direct evidence on part of this preset, but not enough to clear the exact-match floor.

chat / text · reasoning / math / science

DeepSeek V3.2 Exp

DeepSeek · 75% visible · 75% direct · 0% indirect

DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · chat / text

Qwen3.5 397B A17B

Qwen · 75% visible · 75% direct · 0% indirect

Qwen3.5 397B A17B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · coding

Qwen3 Max

Qwen · 75% visible · 75% direct · 0% indirect

Qwen3 Max has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · chat / text

Qwen3 32B

Qwen · 75% visible · 75% direct · 0% indirect

Qwen3 32B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · coding

Qwen3.5 Flash

Qwen · 75% visible · 75% direct · 0% indirect

Qwen3.5 Flash has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · chat / text

qwen3.6-max-preview

Qwen · 75% visible · 75% direct · 0% indirect

qwen3.6-max-preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · chat / text

Qwen3 30B A3B

Qwen · 75% visible · 75% direct · 0% indirect

Qwen3 30B A3B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · coding

qwen3.5-max-preview

Qwen · 75% visible · 75% direct · 0% indirect

qwen3.5-max-preview has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
chat / text · reasoning / math / science

Magistral Small 1.2

Mistral · 75% visible · 75% direct · 0% indirect

Magistral Small 1.2 has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
chat / text · reasoning / math / science

Llama 4 Scout

Meta · 75% visible · 75% direct · 0% indirect

Llama 4 Scout has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
chat / text · reasoning / math / science

Ministral 3 14B

Mistral · 75% visible · 75% direct · 0% indirect

Ministral 3 14B has direct evidence on part of this preset, but not enough to clear the exact-match floor.

  • Missing benchmark coverage in Embeddings / retrieval.
reasoning / math / science · chat / text

Shareable claims with evidence

The product should generate public claims worth checking, not just filter state.


What changed this week

alert
8 review items still need manual judgment

The product keeps parser and mapping ambiguity visible instead of silently guessing.

models
Arena moved via real benchmark movement

80 benchmark rows were added, 4 removed, and 16,276 existing rows changed value or evaluation date.

Source-data window: 2026-06-20T23:37:10Z -> 2026-06-24T03:37:55Z

models
Artificial Analysis moved via real benchmark movement

28 benchmark rows were added, 0 removed, and 5,949 existing rows changed value or evaluation date.

Source-data window: 2026-06-20T23:37:17Z -> 2026-06-24T03:38:09Z

sources
LLMBase moved via source updated leaderboard

The saved raw source snapshot changed relative to the previous run.

Source-data window: 2026-06-20T23:37:24Z -> 2026-06-24T03:38:25Z

sources
Terminal-Bench moved via source updated leaderboard

The saved raw source snapshot changed relative to the previous run.

Source-data window: 2026-06-20T23:37:34Z -> 2026-06-24T03:38:36Z
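
The added, removed, and changed row counts reported for Arena and Artificial Analysis above can be pictured as a keyed diff between two saved snapshots. The sketch below is a minimal illustration under assumptions: the row key format, the `(value, evaluation_date)` row shape, and the names `Snapshot`, `SnapshotDiff`, and `diff_snapshots` are hypothetical and not taken from the product.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical row shape: key -> (value, evaluation_date).
Snapshot = Dict[str, Tuple[float, str]]


@dataclass(frozen=True)
class SnapshotDiff:
    added: int
    removed: int
    changed: int


def diff_snapshots(previous: Snapshot, current: Snapshot) -> SnapshotDiff:
    """Count rows added, removed, or changed in value or evaluation date."""
    prev_keys, curr_keys = set(previous), set(current)
    added = len(curr_keys - prev_keys)
    removed = len(prev_keys - curr_keys)
    changed = sum(1 for k in prev_keys & curr_keys if previous[k] != current[k])
    return SnapshotDiff(added, removed, changed)


# Hypothetical rows, used only to exercise the three cases.
previous = {
    "model-a|gpqa": (84.0, "2026-06-20"),
    "model-b|gpqa": (70.0, "2026-06-20"),
    "model-c|gpqa": (65.0, "2026-06-20"),
}
current = {
    "model-a|gpqa": (84.2, "2026-06-24"),  # value and date changed
    "model-b|gpqa": (70.0, "2026-06-20"),  # unchanged
    "model-d|gpqa": (61.0, "2026-06-24"),  # new row
}
print(diff_snapshots(previous, current))
# SnapshotDiff(added=1, removed=1, changed=1)
```

A diff like this counts a row as changed whether the movement came from the benchmark itself or from a parser or model-matching change, so separating those two causes needs extra information that this sketch does not model.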

product
Initial comparison-table release

Added comparison-table homepage, same-test normalization, per-cell source links, source pages, and custom-ranking preview.

Source-data window: 2026-04-16

models
Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.

Source-data window: 2026-04-16

models
Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.

Source-data window: 2026-04-15
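
The ID rule above amounts to a preference order when building an identity key: use the source's stable ID when one exists, otherwise fall back to a name-based alias. The sketch below illustrates that order only. The helper names, the `id:` and `alias:` prefixes, the normalization rule, and the ID `abc-123` are all hypothetical; the page does not describe the actual key format.

```python
import re
from typing import Optional


def normalize_alias(name: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def identity_key(display_name: str, stable_id: Optional[str] = None) -> str:
    """Prefer the source's stable ID; fall back to a normalized alias."""
    if stable_id:
        return f"id:{stable_id}"
    return f"alias:{normalize_alias(display_name)}"


print(identity_key("Qwen3.5 27B", stable_id="abc-123"))  # id:abc-123
print(identity_key("Qwen3.5 27B"))                       # alias:qwen3-5-27b
```

Keying on a stable ID means a renamed or restyled display name does not split one model into two entries, which is the kind of mapping ambiguity the review items on this page track.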

Data version

Current snapshot.

Data version: Jun 20, 2026 · Model list checked: 9 providers · 1,081 tracked models · Page refreshed: Jul 5, 2026

If this date looks stale, you may be seeing an older build or cached deploy.

Quick routes

Jump straight to a page.

Resolve a recommendation into a public report · best open model for long-context research · Research assistant
Send a shortlist into compare mode · compare gpt-5, claude opus, gemini pro · Everyday chatbot
Open a head-to-head debate page · gpt-5 vs claude opus · Everyday chatbot
Open a source-difference report · benchmark controversy for livebench coding · Coding copilot
Open the latest public movement · what changed this week · Everyday chatbot
Jump straight to an entity page · open model gpt-5 · Open-weight shortlist