Describe the job.

Set filters and get ranked options with verified scores and source dates attached.

Refine

Describe the job, then tune.

Results update live as you type. A separate button applies the preset and filter changes inferred from your wording.

Query

Live update · Preset: Cheap but strong · Sources: All public sources

Current question: Cheap but strong with all public sources

Use case

Looks for the best capability you can get without drifting into premium or frontier pricing.

Sources to include

The All public sources option can include official company-reported results while independent results catch up.


Current scoring recipe: coverage, recency, and included sources

This preset weights three domains (chat/text, coding, and reasoning/math/science), applies a 50% coverage floor, and uses a 120-day recency window. Official company results can contribute when they are clearly labeled. Copied, historical, and demo data stay out unless you explicitly allow them.
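To make the recipe concrete, here is a minimal Python sketch of how a coverage floor and a recency window could gate a weighted fit score. The domain keys, the equal weights, the row fields, and the rule that coverage means "share of weighted domains with usable rows" are all assumptions for illustration; this is not the published recommendation-fit-v2.0.0 formula.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical domain keys and equal weights; the real weights are not published.
PRESET_WEIGHTS = {"chat_text": 1.0, "coding": 1.0, "reasoning_math_science": 1.0}
COVERAGE_FLOOR = 0.50      # 50% coverage floor from the preset description
RECENCY_WINDOW_DAYS = 120  # 120-day recency window from the preset description
EXCLUDED_BY_DEFAULT = {"copied", "historical", "demo"}


@dataclass
class Row:
    domain: str
    score: float               # assumed already normalized to a 0-100 scale
    measured_on: date
    kind: str = "independent"  # e.g. "independent", "official", "copied"


def preset_fit(rows, today, allow_kinds=frozenset()):
    """Weighted mean of per-domain averages, or None if coverage is below the floor."""
    cutoff = today - timedelta(days=RECENCY_WINDOW_DAYS)
    by_domain = {}
    for r in rows:
        if r.domain not in PRESET_WEIGHTS or r.measured_on < cutoff:
            continue  # outside the preset or older than the recency window
        if r.kind in EXCLUDED_BY_DEFAULT and r.kind not in allow_kinds:
            continue  # copied/historical/demo rows need an explicit opt-in
        by_domain.setdefault(r.domain, []).append(r.score)

    covered_weight = sum(PRESET_WEIGHTS[d] for d in by_domain)
    if covered_weight / sum(PRESET_WEIGHTS.values()) < COVERAGE_FLOOR:
        return None
    total = sum(PRESET_WEIGHTS[d] * sum(s) / len(s) for d, s in by_domain.items())
    return total / covered_weight
```

In this sketch a row older than 120 days simply drops out, which can push a model below the floor even when it has plenty of older evidence; official rows pass through because only copied, historical, and demo rows are opt-in.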

Recommendation · Cheap but strong

Best evidence-backed choices for Cheap but strong with all public sources: DeepSeek Chat, Qwen3.5 27B, Qwen3.5 122B A10B, and Gemma 4 26B A4B.

DeepSeek Chat (DeepSeek · budget)

Verified & stable

The current evidence supports a shortlist, not a single winner.

Direct sources: 16
Coverage: 100%
Data version: Jun 20, 2026

Fit score (rounded): 77

Evidence behind the call

Data version: Jun 20, 2026 · 16 direct source links · No excluded sources · All public sources

What the data can support: 5 ready, 1 partial, 1 missing for the current task.

Minimum source requirement: enough direct data for each of the four top picks.

Cost / latency: Latency is unavailable in the current verified source data.

Current top picks: DeepSeek Chat, Qwen3.5 27B, Qwen3.5 122B A10B, Gemma 4 26B A4B

Answer type: Top picks, not one winner

Coverage: 100% visible · 100% checked

Preset: Cheap but strong

Sources: All public sources

Latest strong source data: Jun 24, 2026

Where sources differ: 4 source rows behind this answer

DeepSeek Chat · Arena · Text Arena · Math · No Style Control · 1,436.691 · #43 · 87pctl (used for answer)
DeepSeek Chat · Arena · Text Arena · Instruction Following · No Style Control · 1,412.692 · #48 · 86pctl (used for answer)
DeepSeek Chat · Arena · Text Arena · Longer Query · No Style Control · 1,429.116 · #45 · 86pctl (used for answer)
DeepSeek Chat · Arena · Text Arena · Hard Prompts · No Style Control · 1,433.584 · #53 · 84pctl (used for answer)

Ranked options

Direct matches first, then weaker or missing-data cases

Direct matches stay strict; strong models with indirect data still surface below. Open a row for its scores, source links, and caveats.

Primary group · Direct-match leaders

#1 DeepSeek Chat (DeepSeek · budget) · fit 76.9 · 100% visible

enough direct data · recent · strong · clear · budget price band (from registry metadata) · Latency is unavailable in the current verified source data.

Visible tradeoffs: 17.7-point score spread · 100% recent data · exact alias

Fit score: 76.9

Strongest source data: DeepSeek Chat is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 18-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.
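The spread line reads as a max-minus-min check against a fixed threshold. The sketch below assumes per-domain scores on a 0-100 scale and treats "over the threshold" as strictly greater than 30; both are assumptions, and the input scores in the usage test are hypothetical. The card values are consistent with rounding half-up (17.7 to 18 here, 30.5 to 31 on the Gemma 4 card), which the sketch reproduces.

```python
import math

SPREAD_WARNING_THRESHOLD = 30.0  # points, as shown on the card


def cross_domain_spread(domain_scores: dict) -> float:
    """Highest minus lowest per-domain score (0-100 scale assumed)."""
    values = domain_scores.values()
    return max(values) - min(values)


def spread_report(domain_scores: dict, threshold: float = SPREAD_WARNING_THRESHOLD):
    """Return (spread rounded half-up to whole points, whether it exceeds the threshold)."""
    spread = cross_domain_spread(domain_scores)
    # math.floor(x + 0.5) rounds halves up; Python's round() would send 30.5 to 30.
    return math.floor(spread + 0.5), spread > threshold
```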

The evidence still leans on weaker or split signals for Coding, for source verification state, and for any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved the Artificial Analysis and Arena data.

Verified rows: 38
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links: Arena · Text Arena · Math · No Style Control · 1,436.691 · #43 · 87pctl; Arena · Text Arena · Instruction Following · No Style Control · 1,412.692 · #48 · 86pctl; Arena · Text Arena · Longer Query · No Style Control · 1,429.116 · #45 · 86pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging
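"Source-balanced before averaging" suggests that each source gets equal weight no matter how many rows it contributes. A minimal sketch of that idea, with hypothetical source names and scores; the product's actual weighting may differ.

```python
from collections import defaultdict
from statistics import mean


def source_balanced_mean(rows):
    """rows: iterable of (source, normalized_score) pairs.

    Average within each source first, then average the per-source means,
    so a source that contributes many rows does not outweigh one with few.
    """
    per_source = defaultdict(list)
    for source, score in rows:
        per_source[source].append(score)
    return mean(mean(scores) for scores in per_source.values())


rows = [("source_a", 80), ("source_a", 82), ("source_a", 84), ("source_a", 86), ("source_b", 70)]
naive = mean(score for _, score in rows)  # 80.4: source_a dominates with four rows
balanced = source_balanced_mean(rows)     # 76.5: each source counts once
```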


#2 Qwen3.5 27B (Qwen · budget) · fit 75.5 · 100% visible

enough direct data · recent · strong · clear · budget price band (from registry metadata) · Latency is unavailable in the current verified source data.

Visible tradeoffs: 24.1-point score spread · 100% recent data · exact alias

Fit score: 75.5

Strongest source data: Qwen3.5 27B is strongest on Reasoning / math / science and Coding for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 24-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.

The evidence still leans on weaker or split signals for Chat / text, for source verification state, and for any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved the Artificial Analysis and Arena data.

Verified rows: 42
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links: Artificial Analysis · GPQA · 84.242 · #34 · 91pctl; Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl; Arena · Text Arena · Math · No Style Control · 1,433.473 · #47 · 85pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#3 Qwen3.5 122B A10B (Qwen · mid) · fit 73.8 · 100% visible

enough direct data · recent · strong · clear · mid price band (from registry metadata) · Latency is unavailable in the current verified source data.

Visible tradeoffs: 26-point score spread · 100% recent data · exact alias

Fit score: 73.8

Strongest source data: Qwen3.5 122B A10B is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 26-point cross-domain spread; warning threshold is 30.
Data checks: No open review item was matched to this recommendation.

The evidence still leans on weaker or split signals for Coding, for source verification state, and for any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved the Artificial Analysis and Arena data.

Verified rows: 44
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links: Artificial Analysis · GPQA · 82.727 · #48 · 87pctl; Artificial Analysis · Humanity's Last Exam · 14.829 · #58 · 85pctl; Artificial Analysis · Intelligence Index · 28.117 · #64 · 84pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#4 Gemma 4 26B A4B (Google · mid) · fit 70.8 · 100% visible

enough direct data · recent · mixed · clear · mid price band (from registry metadata) · Latency is unavailable in the current verified source data.

Visible tradeoffs: 30.5-point score spread · 100% recent data · exact alias

Fit score: 70.8

Strongest source data: Gemma 4 26B A4B is strongest on Reasoning / math / science and Chat / text for this preset.

Last updated: Latest visible source row is 0 days old.
Where sources differ: 31-point cross-domain spread, above the warning threshold of 30.
Data checks: No open review item was matched to this recommendation.

The evidence still leans on weaker or split signals for Coding, for source verification state, and for any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved the Artificial Analysis and Arena data.

Verified rows: 44
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

Source links: Arena · Text Arena · Math · No Style Control · 1,466.187 · #18 · 95pctl; Arena · Text Arena · Math · 1,466.774 · #23 · 93pctl; Arena · Text Arena · Instruction Following · 1,438.709 · #37 · 89pctl

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


#5 Qwen3.5 35B A3B (Qwen · budget) · fit 64.6 · 100% visible

Source status unavailable · budget price band · Latency unavailable

Visible tradeoffs: 33.2-point score spread · 100% recent data · exact alias

Fit score: 64.6

Strongest source data: Qwen3.5 35B A3B is strongest on Reasoning / math / science and Chat / text for this preset.

The evidence still leans on weaker or split signals for Coding, for source verification state, and for any backfilled or relay evidence still in play.

Recent data-parser or model-matching changes moved the Artificial Analysis and Arena data.

Verified rows: 42
Hand-checked rows: 0
Copied rows: 0
Backfilled rows: 0

Headline lane: Jun 24, 2026
Background data: No extra context

Formula: recommendation-fit-v2.0.0.

No source link clears the minimum source requirement.

Measured by: public third-party sources
Source basis: exact alias
Compared fairly because: same test setup, version groups, source-balanced before averaging


Evidence & limits

Read the argument before you commit

Why these options made the list: top reasons behind the current answer

Current shortlist: DeepSeek Chat, Qwen3.5 27B, Qwen3.5 122B A10B, and Gemma 4 26B A4B.
DeepSeek Chat is the strongest exact-match option still visible.
DeepSeek Chat currently leads the fit score at 76.9, only 1.4 points ahead of Qwen3.5 27B (75.5), so the evidence is still too mixed for a single headline winner.
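One way to read the "shortlist, not a single winner" call is as a margin rule over eligible candidates. The sketch below is hypothetical: WINNER_MARGIN and SHORTLIST_MARGIN were picked only so that the fit scores and source-requirement results shown on the cards reproduce the current shortlist, and the product's real rule is not published.

```python
from dataclasses import dataclass

WINNER_MARGIN = 5.0      # hypothetical: lead needed for a single winner
SHORTLIST_MARGIN = 10.0  # hypothetical: max distance from the leader to stay listed


@dataclass
class Candidate:
    name: str
    fit: float
    meets_source_requirement: bool


def decide(candidates):
    """Return ("single winner" | "shortlist" | "no answer", names)."""
    eligible = sorted(
        (c for c in candidates if c.meets_source_requirement),
        key=lambda c: c.fit,
        reverse=True,
    )
    if not eligible:
        return "no answer", []
    leader = eligible[0]
    if len(eligible) == 1 or leader.fit - eligible[1].fit > WINNER_MARGIN:
        return "single winner", [leader.name]
    return "shortlist", [c.name for c in eligible if leader.fit - c.fit <= SHORTLIST_MARGIN]


cards = [
    Candidate("DeepSeek Chat", 76.9, True),
    Candidate("Qwen3.5 27B", 75.5, True),
    Candidate("Qwen3.5 122B A10B", 73.8, True),
    Candidate("Gemma 4 26B A4B", 70.8, True),
    Candidate("Qwen3.5 35B A3B", 64.6, False),  # no source link clears the requirement
]
```

With these inputs the 1.4-point lead is well under the winner margin, so `decide(cards)` returns the four-model shortlist.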

What to pressure-test: where the current answer is still fragile

No single winner: The current public evidence is only strong enough to support a shortlist, not one winner.
Strongest alternative: Qwen3.5 27B, which is strongest on Reasoning / math / science and Coding for this preset.
Evidence risk: the evidence still leans on weaker or split signals for Coding, for source verification state, and for any backfilled or relay evidence still in play.

What would flip the answer: the assumptions the result rests on

If you tighten benchmark spread: DeepSeek Chat still holds if you care more about aligned evidence than upside.
If you tighten recency: DeepSeek Chat remains viable because the visible evidence is still fairly fresh.
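If you require open-weight: DeepSeek Chat remains the pick.
If cost and speed matter more: DeepSeek Chat remains the pick, though latency data is missing, so this rests on the price band alone.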
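The "tighten benchmark spread" scenario can be checked directly against the card values. This sketch assumes that tightening means dropping options above a spread cap without rescoring the rest, and the 20-point cap is a hypothetical example, not a product setting.

```python
# (name, fit score, score spread) as displayed on the ranked-option cards.
OPTIONS = [
    ("DeepSeek Chat", 76.9, 17.7),
    ("Qwen3.5 27B", 75.5, 24.1),
    ("Qwen3.5 122B A10B", 73.8, 26.0),
    ("Gemma 4 26B A4B", 70.8, 30.5),
    ("Qwen3.5 35B A3B", 64.6, 33.2),
]


def survivors(options, max_spread):
    """Names of options whose spread is at or under the cap, best fit first."""
    kept = [o for o in options if o[2] <= max_spread]
    return [name for name, _fit, _spread in sorted(kept, key=lambda o: o[1], reverse=True)]
```

At a 20-point cap only DeepSeek Chat survives, which matches the claim that it "still holds" when aligned evidence matters more than upside; at the current threshold of 30, Gemma 4 26B A4B would also drop out.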

Why this is not a clean win: limitations to keep in mind

The current evidence supports a shortlist, not a single winner.
Qwen3.5 27B remains close enough that a different scoring recipe can still flip the public answer.

Source links: 6 rows behind this answer

DeepSeek Chat · Arena · Text Arena · Math · No Style Control · 1,436.691 · #43 · 87pctl
DeepSeek Chat · Arena · Text Arena · Instruction Following · No Style Control · 1,412.692 · #48 · 86pctl
DeepSeek Chat · Arena · Text Arena · Longer Query · No Style Control · 1,429.116 · #45 · 86pctl
DeepSeek Chat · Arena · Text Arena · Hard Prompts · No Style Control · 1,433.584 · #53 · 84pctl
Qwen3.5 27B · Artificial Analysis · GPQA · 84.242 · #34 · 91pctl
Qwen3.5 27B · Artificial Analysis · Intelligence Index · 29.311 · #57 · 86pctl

Data limits: where the public data is thin

cost: partial · Only registry price bands are ready; exact price calculations are not shown.
latency: missing · Latency is not available in the current verified shortlist data.

Not chosen: 6 well-known models left off

Gemini 3.1 Flash-Lite Preview (closest option): has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Claude Haiku 4.5 (closest option): has direct evidence on part of this preset, but not enough to clear the exact-match floor.
GPT-5.4 nano (closest option): has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Grok 4.1 Fast (closest option): has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Gemini 2.5 Flash-Lite (closest option): has direct evidence on part of this preset, but not enough to clear the exact-match floor.
Claude Fable 5 (known current model): the current generated catalog does not have enough matching source links for this task preset.

Needs more source data: 12 tracked models with thin public data

Each model below has direct evidence on part of this preset, but not enough to clear the exact-match floor. Domain tags follow each entry.

Gemini 3.1 Flash-Lite Preview (Google · 100% visible · 100% direct · 0% indirect): chat text · coding
Claude Haiku 4.5 (Anthropic · 100% visible · 100% direct · 0% indirect): chat text · coding
GPT-5.4 nano (OpenAI · 100% visible · 100% direct · 0% indirect): reasoning math science · coding
Grok 4.1 Fast (xAI · 100% visible · 100% direct · 0% indirect): chat text · reasoning math science
Gemini 2.5 Flash-Lite (Google · 100% visible · 100% direct · 0% indirect): chat text · reasoning math science
mercury-2 (Unknown · 100% visible · 100% direct · 0% indirect): chat text · reasoning math science
DeepSeek V3.2 Exp (DeepSeek · 100% visible · 100% direct · 0% indirect): reasoning math science · chat text
Qwen3.5 Flash (Qwen · 100% visible · 100% direct · 0% indirect): reasoning math science · chat text
Qwen3 235B A22B (Qwen · 100% visible · 100% direct · 0% indirect): reasoning math science · coding
hunyuan-hy3-preview (Tencent · 100% visible · 100% direct · 0% indirect): reasoning math science · chat text
minimax-m2.1-preview (Unknown · 100% visible · 100% direct · 0% indirect): chat text · coding
qwen3.6-max-preview (Qwen · 100% visible · 100% direct · 0% indirect): reasoning math science · chat text
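Each model in this list carries two of the preset's three domain tags. If those tags are the only domains with direct evidence (an assumption; the page does not define the exact-match floor), a check like the hypothetical sketch below would show which domain keeps each model off the shortlist.

```python
PRESET_DOMAINS = ("chat text", "coding", "reasoning math science")

# Domain tags copied from the list above (a subset of models).
EVIDENCE = {
    "Gemini 3.1 Flash-Lite Preview": {"chat text", "coding"},
    "GPT-5.4 nano": {"reasoning math science", "coding"},
    "Grok 4.1 Fast": {"chat text", "reasoning math science"},
}


def missing_domains(covered, required=PRESET_DOMAINS):
    """Preset domains with no direct evidence, in preset order."""
    return [d for d in required if d not in covered]


def clears_exact_match(covered, required=PRESET_DOMAINS):
    # Assumption: an exact match needs direct evidence in every preset domain.
    return not missing_domains(covered, required)


for name, tags in EVIDENCE.items():
    print(f"{name}: missing {', '.join(missing_domains(tags))}")
```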

8 review items still need manual judgment. The product keeps parser and mapping ambiguity visible instead of silently guessing.

models · Arena: real benchmark movement
80 benchmark rows were added, 4 removed, and 16,276 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:10Z -> 2026-06-24T03:37:55Z.

models · Artificial Analysis: real benchmark movement
28 benchmark rows were added, 0 removed, and 5,949 existing rows changed value or evaluation date. Window: 2026-06-20T23:37:17Z -> 2026-06-24T03:38:09Z.

sources · LLMBase: the source updated its leaderboard
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:24Z -> 2026-06-24T03:38:25Z.

sources · Terminal-Bench: the source updated its leaderboard
The saved raw source snapshot changed relative to the previous run. Window: 2026-06-20T23:37:34Z -> 2026-06-24T03:38:36Z.
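The two kinds of movement above can be illustrated with a small diff. This is a sketch under assumptions: rows are keyed by a hypothetical (model, benchmark) pair with a (value, evaluation date) payload, and "raw snapshot changed" is modeled as a SHA-256 digest mismatch. Neither is confirmed as the product's mechanism.

```python
import hashlib


def diff_rows(previous: dict, current: dict):
    """previous/current map a row key to (value, evaluation_date).

    Returns (added, removed, changed) counts; a row counts as changed when
    either its value or its evaluation date differs between runs.
    """
    added = current.keys() - previous.keys()
    removed = previous.keys() - current.keys()
    changed = {k for k in previous.keys() & current.keys() if previous[k] != current[k]}
    return len(added), len(removed), len(changed)


def snapshot_changed(previous_raw: bytes, current_raw: bytes) -> bool:
    """Detect any change in a saved raw snapshot by comparing SHA-256 digests."""
    return hashlib.sha256(previous_raw).hexdigest() != hashlib.sha256(current_raw).hexdigest()


prev = {("m1", "gpqa"): (80.0, "2026-06-01"), ("m2", "gpqa"): (70.0, "2026-06-01"), ("m3", "gpqa"): (60.0, "2026-06-01")}
curr = {("m1", "gpqa"): (80.0, "2026-06-20"), ("m2", "gpqa"): (71.0, "2026-06-01"), ("m4", "gpqa"): (65.0, "2026-06-20")}
```

Here m4 is added, m3 is removed, and both m1 (new evaluation date) and m2 (new value) count as changed, so `diff_rows(prev, curr)` reports one addition, one removal, and two changes.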

What changed this week


product

Initial comparison-table release

Added comparison-table homepage, same-test normalization, per-cell source links, source pages, and custom-ranking preview.

Source-data window: 2026-04-16


models

Methodology contract published

Documented comparability rules, raw-vs-normalized behavior, and why unlike metrics are never averaged by default.

Source-data window: 2026-04-16
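The methodology rule that unlike metrics are never averaged by default implies that any normalization is scoped to a single test. The sketch below uses min-max scaling purely as a stand-in (the contract's actual normalization may differ) and hypothetical model and test IDs; the point it shows is that an Arena rating and a GPQA accuracy are scaled separately and never pooled.

```python
from collections import defaultdict


def normalize_within_test(rows):
    """rows: iterable of (model, test_id, raw_value).

    Min-max scales raw values to 0-100 separately for each test_id, so values
    from different tests are never mixed in one scale or one average.
    """
    by_test = defaultdict(list)
    for model, test_id, raw in rows:
        by_test[test_id].append((model, raw))

    normalized = {}
    for test_id, items in by_test.items():
        lo = min(raw for _, raw in items)
        hi = max(raw for _, raw in items)
        span = hi - lo
        for model, raw in items:
            # If every model tied on this test, use the midpoint (an arbitrary choice).
            normalized[(model, test_id)] = 100.0 * (raw - lo) / span if span else 50.0
    return normalized


rows = [
    ("a", "arena_text", 1400.0), ("b", "arena_text", 1450.0), ("c", "arena_text", 1500.0),
    ("a", "gpqa", 60.0), ("b", "gpqa", 80.0),
]
scores = normalize_within_test(rows)
```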

models

Artificial Analysis ID rule adopted

Stable model and creator IDs are now the preferred external identity keys when available.

Source-data window: 2026-04-15
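A minimal sketch of "prefer stable IDs, fall back to aliases" as an identity key. The field names `model_id`, `creator_id`, and `alias` are hypothetical, and the alias normalization (lowercase, collapsed whitespace) is an assumed example of what an "exact alias" match might compare.

```python
def identity_key(record: dict) -> tuple:
    """Build a matching key for a source row (hypothetical field names).

    Stable model and creator IDs win when present; otherwise fall back to a
    normalized alias so that spacing and case differences still match.
    """
    model_id = record.get("model_id")
    if model_id:
        return ("stable", record.get("creator_id") or "", model_id)
    alias = " ".join(record["alias"].lower().split())
    return ("alias", alias)
```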

Watchlists

Followed items reopen from their canonical URL first. Bundle export still works, but the durable state is the href plus deterministic latest-delta links, not a rebuilt local compare preset.
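Deterministic links mean the same view state always serializes to the same href. The sketch below assumes a hypothetical `/compare` path and hypothetical query parameter names; it sorts keys and multi-valued lists so that order of selection does not change the URL.

```python
from urllib.parse import urlencode


def canonical_href(path: str, params: dict) -> str:
    """Serialize view state to a stable URL: sorted keys, sorted multi-values."""
    items = []
    for key in sorted(params):
        value = params[key]
        if isinstance(value, (list, tuple, set)):
            items.extend((key, v) for v in sorted(value))
        else:
            items.append((key, value))
    return f"{path}?{urlencode(items)}" if items else path
```

Because the output depends only on the state and not on insertion order, a followed item can be reopened from its href alone, which is the property the watchlist text relies on.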


No watchlists yet. Follow a recommendation card or compare set.

Saved compare views


Save a compare workspace to keep top options around.

Workspace bundle

Portable bundles stay link-native. Use them to preview a shared workspace, reopen the same compare URLs on another device, or import the snapshot without reconstructing intent from loose local fields.

Current workspace: 0 saved compare views · 0 watches · 0 pinned compare models

Preview or import a shared bundle

Data version

Current snapshot.

Data version: Jun 20, 2026 · Model list checked: 9 providers · 1,081 tracked models · Page refreshed: Jul 5, 2026

If this date looks stale, you may be seeing an older build or cached deploy.

Quick routes

Jump straight to a page.

Resolve a recommendation into a public report: "best open model for long-context research" (Research assistant)
Send a shortlist into compare mode: "compare gpt-5, claude opus, gemini pro" (Everyday chatbot)
Open a head-to-head debate page: "gpt-5 vs claude opus" (Everyday chatbot)
Open a source-difference report: "benchmark controversy for livebench coding" (Coding copilot)
Open the latest public movement: "what changed this week" (Everyday chatbot)
Jump straight to an entity page: "open model gpt-5" (Open-weight shortlist)
