Visible tradeoffs
This is a combined signal, so it bundles multiple inputs and should not be treated as one clean test.
source: Artificial Analysis
metric: Index (index)
judge: Combined
direction: higher better
group id: aa_intelligence_current
domain: Chat / text
What it measures vs what it misses
✓ Measures
Text-focused benchmark performance across several tasks.
✗ Misses
Multimodal quality unless separate tracks are selected.
Why this counts
It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not prove deeper reasoning, tool use, or enterprise workflow reliability.
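The same-test rule can be illustrated with a minimal sketch. It assumes a percentile means the share of models in the same group that score at or below the target model; the page does not state its exact formula, so this definition is an assumption. The function name, the row layout, the model names, and the second group id (`other_group_v1`) are hypothetical; only `aa_intelligence_current` comes from the card above.

```python
def percentile_within_group(rows, group_id, model):
    """Percentile of `model` among models sharing the same group id.

    rows: iterable of (group_id, model, score) tuples, higher score is better.
    Assumed definition: share of same-group scores at or below the target score.
    """
    # Only scores from the exact benchmark/version group are comparable.
    group_scores = {m: s for g, m, s in rows if g == group_id}
    if model not in group_scores:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    target = group_scores[model]
    at_or_below = sum(1 for s in group_scores.values() if s <= target)
    return 100.0 * at_or_below / len(group_scores)


# Hypothetical data: the same model appears in two groups with different scores.
rows = [
    ("aa_intelligence_current", "model_a", 50.0),
    ("aa_intelligence_current", "model_b", 40.0),
    ("aa_intelligence_current", "model_c", 30.0),
    ("aa_intelligence_current", "model_d", 20.0),
    ("other_group_v1", "model_a", 10.0),
    ("other_group_v1", "model_e", 90.0),
]

top_here = percentile_within_group(rows, "aa_intelligence_current", "model_a")
same_model_elsewhere = percentile_within_group(rows, "other_group_v1", "model_a")
```

Under these assumptions the same model lands at a different percentile in each group, which is why the value is not a universal score.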
Leaderboard · this benchmark version
#1 · Claude Fable 5
AA · Jun 24, 2026
Source label: Claude Fable 5 (Adaptive Reasoning, Max Effort, Opus 4.8 Fallback)
Verified against the public LLMBase Mistral Nemo page, which states that benchmark values are sourced from Artificial Analysis where available. Checked 2026-04-19. Verification: manual_public_page_verification.
Verified against the public Artificial Analysis model overview page for Magistral Medium 1.2. Checked 2026-04-19. Verification: manual_public_page_verification.
Verified against the public Artificial Analysis model overview page for Magistral Small 1.2. Checked 2026-04-19. Verification: manual_public_page_verification.