Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 52
Test details
Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source · Vals AI
metric · Accuracy (%)
judge · Objective
direction · higher better
group id · vals_medcode_current
domain · Professional reasoning
What it measures vs what it misses
✓ Measures
Medical billing support and coding tasks.
✗ Misses
Adjacent skills outside the benchmark task mix, latency, and cost.
Why this counts
Medical billing support and coding tasks.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
Adjacent skills outside the benchmark task mix, latency, and cost.
Fallback benchmark identity is visible for context but excluded from default ranking.
Identity
benchmark proxy (0.58)
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.