Describe the job.

Set filters and get ranked options, each with verified scores and source dates attached.

Refine

Describe the job, then tune.

Results update live as you type. The button applies preset and filter changes inferred from your wording.

Query

Live update · Preset · Open-weight shortlist · All public sources

Current question: Open-weight shortlist with open models only

Use case

Only models with downloadable or open weights, filtered for practical capability rather than release-page hype.

Sources to include

All public sources can include official company results while independent results catch up.

Access model

Primary filters

Current scoring recipe: coverage, recency, and included sources

This preset weights four domains (chat / text, coding, reasoning / math / science, and embeddings / retrieval) with a 45% coverage floor and a 180-day recency window. Official company results can contribute when they are clearly labeled. Copied, historical, and demo data stay out unless you explicitly allow them.
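The recipe above can be sketched in a few lines. This is a minimal illustration, not the site's actual formula: it assumes coverage means the share of preset domains with at least one eligible row, that all four domains count equally, and that recency is measured back from the data version date. The names `SourceRow`, `eligible_rows`, `coverage`, and `clears_floor`, and the `kind` labels, are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Domain names taken from the preset description; the identifiers are assumed.
PRESET_DOMAINS = ("chat_text", "coding", "reasoning_math_science", "embeddings_retrieval")
COVERAGE_FLOOR = 0.45       # 45% coverage floor stated by the preset
RECENCY_WINDOW_DAYS = 180   # 180-day recency window stated by the preset
EXCLUDED_KINDS = {"copied", "historical", "demo"}  # kept out unless explicitly allowed


@dataclass(frozen=True)
class SourceRow:
    domain: str
    observed_on: date
    kind: str = "independent"  # hypothetical label, e.g. "independent" or "official"


def eligible_rows(rows, as_of, allow_excluded=False):
    """Keep rows inside the recency window and, by default, drop excluded kinds."""
    cutoff = as_of - timedelta(days=RECENCY_WINDOW_DAYS)
    return [
        r for r in rows
        if r.observed_on >= cutoff and (allow_excluded or r.kind not in EXCLUDED_KINDS)
    ]


def coverage(rows, as_of, allow_excluded=False):
    """Share of preset domains that have at least one eligible row (assumed definition)."""
    covered = {r.domain for r in eligible_rows(rows, as_of, allow_excluded)}
    return len(covered & set(PRESET_DOMAINS)) / len(PRESET_DOMAINS)


def clears_floor(rows, as_of):
    """True when coverage meets the preset's coverage floor."""
    return coverage(rows, as_of) >= COVERAGE_FLOOR
```

Under these assumptions, a model with recent rows in three of the four domains has a coverage of 0.75 and clears the 45% floor.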

Recommendation · Open-weight shortlist

Best evidence-backed choices for Open-weight shortlist with open models only: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, and DeepSeek Chat.

Qwen3.5 27B · Qwen · budget

Verified & stable

The current evidence supports a shortlist, not a single winner.

Direct sources: 16
Coverage: 75%
Data version: Jun 20, 2026


Fit: 74

Evidence behind the call

Data version Jun 20, 2026 · 16 direct source links · No excluded sources · All public sources

What the data can support: 5 ready, 1 partial, 1 missing for the current task result.

Minimum source requirement: enough direct data · enough direct data · enough direct data · enough direct data

Cost / latency: Latency is unavailable in the current verified source data.

Current top picks: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, DeepSeek Chat

Answer type: Top picks, not one winner

Coverage: 75% visible · 75% checked

Preset: Open-weight shortlist

Sources: All public sources

Latest strong source data: Jun 24, 2026

Where sources differ: 4 source rows behind this answer

Qwen3.5 27B · Artificial Analysis · GPQA · 84.242 · #34 · 91pctl · used for answer
Qwen3.5 27B · Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl · used for answer
Qwen3.5 27B · Arena · Text Arena · Math · No Style Control · 1,433.473 · #47 · 85pctl · used for answer
Qwen3.5 27B · Arena · Text Arena · Math · 1,429.258 · #55 · 83pctl · used for answer


Ranked options

Direct matches first, then weaker or missing-data cases

Direct matches stay strict; strong models with indirect data still surface below. Open a row for its scores, source links, and caveats.

Primary group · Direct-match leaders

#1 · Qwen3.5 27B · Qwen · budget · 74.3 · 75% visible

enough direct data · recent · strong · clear
budget price band from registry metadata
Latency is unavailable in the current verified source data.

Visible tradeoffs: 21.5% score spread · 100% recent data · exact alias

Fit score: 74.3

Strongest source data: reasoning math science · chat text

Qwen3.5 27B is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 22-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.
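The spread check reported above can be illustrated as follows. This is a sketch under stated assumptions: the spread is taken to be the gap between the highest and lowest per-domain score, and the warning is assumed to fire only when the spread is strictly above the threshold of 30. The function names and the example scores are hypothetical; only the threshold comes from the page.

```python
WARNING_THRESHOLD = 30.0  # "warning threshold is 30" as stated on the page


def cross_domain_spread(domain_scores):
    """Gap between the best and worst per-domain score (assumed definition)."""
    if not domain_scores:
        raise ValueError("at least one domain score is required")
    values = list(domain_scores.values())
    return max(values) - min(values)


def spread_warning(domain_scores, threshold=WARNING_THRESHOLD):
    """True when the spread is strictly above the threshold (strictness is assumed)."""
    return cross_domain_spread(domain_scores) > threshold


# Hypothetical per-domain scores, chosen only to illustrate both outcomes.
aligned = {"reasoning_math_science": 85.0, "chat_text": 80.0, "coding": 63.5}
split = {"reasoning_math_science": 90.0, "chat_text": 82.0, "coding": 59.0}
```

With these made-up inputs, `aligned` has a spread of 21.5 and raises no warning, while `split` has a spread of 31.0 and does.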

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent changes to the data parser or model matching moved results from Artificial Analysis and Arena.

Verified rows: 36
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links:
Artificial Analysis · GPQA · 84.242 · #34 · 91pctl
Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl
Arena · Text Arena · Math · No Style Control · 1,433.473 · #47 · 85pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
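"Source-balanced before averaging" can be read as: average within each source first, then average across sources, so a source that publishes many rows does not dominate. The sketch below shows that reading only. It is not confirmed to be what recommendation-fit-v2.0.0 does, and using percentiles as the normalized score is an assumption; `source_balanced_mean` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def source_balanced_mean(rows):
    """Average per source first, then average the per-source means.

    rows: iterable of (source_name, normalized_score) pairs.
    """
    by_source = defaultdict(list)
    for source, score in rows:
        by_source[source].append(score)
    if not by_source:
        raise ValueError("at least one row is required")
    return mean(mean(scores) for scores in by_source.values())


# Percentiles from the three source links listed for Qwen3.5 27B.
rows = [
    ("Artificial Analysis", 91.0),  # GPQA
    ("Artificial Analysis", 86.0),  # Intelligence Index
    ("Arena", 85.0),                # Text Arena · Math · No Style Control
]
balanced = source_balanced_mean(rows)
plain = mean(score for _, score in rows)
```

Here the balanced mean is 86.75 (Artificial Analysis averages 88.5, Arena 85.0), while the plain mean of the three rows is about 87.33, because the plain mean counts Artificial Analysis twice.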


#2 · Qwen3.5 122B A10B · Qwen · mid · 71.7 · 75% visible

enough direct data · recent · strong · clear
mid-priced price band from registry metadata
Latency is unavailable in the current verified source data.

Visible tradeoffs: 26% score spread · 100% recent data · exact alias

Fit score: 71.7

Strongest source data: reasoning math science · chat text

Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 26-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent changes to the data parser or model matching moved results from Artificial Analysis and Arena.

Verified rows: 38
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links:
Artificial Analysis · GPQA · 82.727 · #48 · 87pctl
Artificial Analysis · Humanity's Last Exam · 14.829 · #58 · 85pctl
Artificial Analysis · Intelligence Index · 28.117 · #64 · 84pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#3 · Gemma 4 26B A4B · Google · mid · 67.0 · 75% visible

enough direct data · recent · mixed · clear
mid-priced price band from registry metadata
Latency is unavailable in the current verified source data.

Visible tradeoffs: 30.5% score spread · 100% recent data · exact alias

Fit score: 67.0

Strongest source data: reasoning math science · chat text

Gemma 4 26B A4B is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 31-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent changes to the data parser or model matching moved results from Artificial Analysis and Arena.

Verified rows: 38
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links:
Arena · Text Arena · Math · No Style Control · 1,466.187 · #18 · 95pctl
Arena · Text Arena · Math · 1,466.774 · #23 · 93pctl
Arena · Text Arena · Instruction Following · 1,438.709 · #37 · 89pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#4 · DeepSeek Chat · DeepSeek · budget · 63.2 · 75% visible

enough direct data · recent · strong · clear
budget price band from registry metadata
Latency is unavailable in the current verified source data.

Visible tradeoffs: 3.2% score spread · 57.3% recent data · exact alias

Fit score: 63.2

Strongest source data: coding · reasoning math science

DeepSeek Chat is strongest on Coding and Reasoning / math / science for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 3-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.

The visible evidence mix still leans on weaker or split signals, especially around Chat / text, source verification state, and any backfilled or relay evidence still in play.

Recent changes to the data parser or model matching moved results from Artificial Analysis and Arena.

Verified rows: 61
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links:
Arena · Text Arena · Math · No Style Control · 1,436.691 · #43 · 87pctl
Arena · Text Arena · Instruction Following · No Style Control · 1,412.692 · #48 · 86pctl
Arena · Text Arena · Longer Query · No Style Control · 1,429.116 · #45 · 86pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#5 · DeepSeek Reasoner · DeepSeek · budget · 62.0 · 75% visible

Source status unavailable
budget price band
Latency unavailable

Visible tradeoffs: 12.5% score spread · 100% recent data · exact alias

Fit score: 62.0

Strongest source data: reasoning math science · chat text

DeepSeek Reasoner is strongest on Reasoning / math / science and Chat / text for this preset.

The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

Recent changes to the data parser or model matching moved results from Artificial Analysis, Vals AI, and Arena.

Verified rows: 34
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

No source link clears the minimum source requirement.

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


Evidence & limits

Read the argument before you commit

Why these options made the list: top reasons behind the current answer

Current shortlist: Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B, and DeepSeek Chat.
Qwen3.5 27B is the strongest exact-match option still visible.
Qwen3.5 27B currently leads the fit score at 74.3, but the evidence is still too mixed for a single headline winner.

What to pressure test: where the current answer is still fragile

No single winner: The current public evidence is only strong enough to support a shortlist, not one winner.
Strongest alternative · Qwen3.5 122B A10B: Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.
Evidence risk: The visible evidence mix still leans on weaker or split signals, especially around Coding, source verification state, and any backfilled or relay evidence still in play.

What would flip the answer: the assumptions the result rests on

If you tighten benchmark spread: Qwen3.5 27B still holds if you care more about aligned evidence than upside.
If you tighten recency: Qwen3.5 27B remains viable because the visible evidence is still fairly fresh.
If cost and speed matter more: Qwen3.5 27B

Why this is not a clean win: limitations to keep in mind

The current evidence supports a shortlist, not a single winner.
Qwen3.5 122B A10B remains close enough that a different scoring recipe can still flip the public answer.

Source links: 6 rows behind this answer

Artificial Analysis · GPQA · 84.242 · #34 · 91pctl
Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl
Arena · Text Arena · Math · No Style Control · 1,433.473 · #47 · 85pctl
Arena · Text Arena · Math · 1,429.258 · #55 · 83pctl
Artificial Analysis · GPQA · 82.727 · #48 · 87pctl
Artificial Analysis · Humanity's Last Exam · 14.829 · #58 · 85pctl

Data limits: where the public data is thin

cost: partial · Only registry price bands are ready; exact price calculations are not shown.
latency: missing · Latency is not available in the current verified shortlist data.

Not chosen: 6 well-known models left off

Llama 4 Maverick

Closest option

Llama 4 Maverick has direct evidence on part of this preset, but not enough to clear the exact-match floor.


DeepSeek V3.2 Exp

Closest option

Missing benchmark coverage in Embeddings / retrieval.
DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.


Qwen3.5 397B A17B

Closest option

Missing benchmark coverage in Embeddings / retrieval.
Qwen3.5 397B A17B has direct evidence on part of this preset, but not enough to clear the exact-match floor.


Qwen3 Max

Closest option

Missing benchmark coverage in Embeddings / retrieval.
Qwen3 Max has direct evidence on part of this preset, but not enough to clear the exact-match floor.


Qwen3 32B

Closest option

Missing benchmark coverage in Embeddings / retrieval.
Qwen3 32B has direct evidence on part of this preset, but not enough to clear the exact-match floor.


Claude Fable 5

Known current model

Current generated catalog does not have enough matching source links for this task preset.


Needs more source data: 12 tracked models with thin public data

Llama 4 Maverick

Meta · 100% visible · 100% direct · 0% indirect

Llama 4 Maverick has direct evidence on part of this preset, but not enough to clear the exact-match floor.

chat text · reasoning math science


DeepSeek V3.2 Exp

DeepSeek · 75% visible · 75% direct · 0% indirect

DeepSeek V3.2 Exp has direct evidence on part of this preset, but not enough to clear the exact-match floor.

Missing benchmark coverage in Embeddings / retrieval.

reasoning math science · chat text
