MedCode

Name: MedCode
Creator: Vals AI

MedCode result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 52

Test details

Visible tradeoffs

This is an objective signal, so it is mainly about measurable task performance rather than public taste.

source · Vals AI
metric · Accuracy (%)
judge · Objective
direction · higher better
group id · vals_medcode_current
domain · Professional reasoning

What it measures vs what it misses

✓ Measures

Medical billing support and coding tasks.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Why this counts

Medical billing support and coding tasks.

Same-test rule

This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses

Adjacent skills outside the benchmark task mix, latency, and cost.
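The percentile values shown in the leaderboard rows on this page are consistent with a simple rank-based formula over the 52 scores in this group: (group size - rank) / (group size - 1) × 100, rounded to one decimal. That formula is inferred from the displayed numbers, not documented by the source, and the function name below is hypothetical. A minimal sketch:

```python
def rank_percentile(rank: int, group_size: int) -> float:
    """Percentile of a 1-based rank inside one benchmark/version group.

    Assumed formula (inferred from the displayed values, not a documented method):
    (group_size - rank) / (group_size - 1) * 100, rounded to one decimal.
    """
    if group_size < 2:
        raise ValueError("need at least two scores to rank")
    if not 1 <= rank <= group_size:
        raise ValueError("rank out of range")
    return round((group_size - rank) / (group_size - 1) * 100, 1)


# Spot-check a few ranks against the 52-score group shown on this page.
for rank in (1, 3, 20, 52):
    print(rank, rank_percentile(rank, 52))
```

Because the value depends only on rank within the group, the same score could map to a different percentile in another benchmark version. Backfilled rows (#19 and #38) display the percentile of the row they are backfilled from, so the formula does not describe their own position.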

Leaderboard · this benchmark version

#1 · Gemini 3.1 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-pro-preview

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 100%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

59.1% (55.1% - 63%)

#2 · Claude Fable 5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-fable-5

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 98%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

56.1% (51.8% - 60.4%)

#3 · Gemini 3 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-flash-preview

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 96.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

55.9% (51.8% - 60.1%)

#4 · Gemini 3.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.5-flash

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 94.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

55.8% (51.7% - 60%)

#5 · Claude Opus 4.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-7

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 92.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

54.9% (50.5% - 59.2%)

#6 · Claude Opus 4.8

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-8

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 90.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

53.2% (49% - 57.5%)

#7 · GPT-5.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 88.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

52.7% (48.5% - 56.9%)

#8 · Gemini 3 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-pro-preview

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 86.3%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

52.2% (48.1% - 56.3%)

#9 · muse-spark

VALS-AI · Jun 17, 2026

Source label: meta/muse_spark

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 84.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Meta.

51.3% (46.9% - 55.7%)

#10 · Gemini 2.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-pro

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 82.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

50.6% (46.4% - 54.7%)

#11 · GPT-5.2

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 80.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.7% (45.3% - 54.2%)

#12 · GPT-5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-2025-08-07

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 78.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.6% (45.5% - 53.7%)

#13 · Claude Opus 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 76.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

49.2% (45.2% - 53.1%)

#14 · Claude Opus 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 74.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

49.1% (45% - 53.2%)

#15 · GPT-5.5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.5

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.1% (44.8% - 53.4%)

#16 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 70.6%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

47.6% (43.5% - 51.7%)

#17 · o3

VALS-AI · Jun 17, 2026

Source label: openai/o3-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 68.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

47.3% (43.1% - 51.5%)

#18 · Claude Opus 4.1

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 66.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

47.2% (43.2% - 51.3%)

#19 · Claude Opus 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 66.7%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.

47.2% (43.2% - 51.3%)

#20 · minimax-m3

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M3

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 62.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

46.3% (42.2% - 50.4%)

#21 · Claude Sonnet 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 60.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

44.1% (40.2% - 48.1%)

#22 · GPT-5.4 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-mini-2025-08-07

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 58.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

43% (39% - 47.1%)

#23 · glm-5.1

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 56.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

41.6% (37.4% - 45.8%)

#24 · GPT-5.4

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 54.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

41.3% (37.1% - 45.5%)

#25 · GPT-5.4 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

41% (36.6% - 45.5%)

#26 · GLM-5.2 (max)

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

40.8% (36.5% - 45%)

#27 · Gemini 2.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 49%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

40.5% (36.8% - 44.3%)

#28 · deepseek-v4-pro

VALS-AI · Jun 17, 2026

Source label: deepseek/deepseek-v4-pro

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 47.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: DeepSeek.

40.5% (36.3% - 44.6%)

#29 · kimi-k2.6

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.6

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 45.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Moonshot AI.

40.1% (36.1% - 44.1%)

#30 · kimi-k2.5-thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 43.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Moonshot AI.

39.3% (35.2% - 43.5%)

#31 · Qwen3.7 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.7-max

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 41.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

38.8% (34.4% - 43.1%)

#32 · Nemotron 3 Ultra 550B A55B (Reasoning)

VALS-AI · Jun 17, 2026

Source label: nvidia/nemotron-3-ultra-550b-a55b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 39.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Nvidia.

38.6% (34.7% - 42.5%)

#33 · Grok 4

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 37.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

38.1% (33.8% - 42.4%)

#34 · Grok 4.3

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 35.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

38.1% (34% - 42.1%)

#35 · Grok 4 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 33.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

37.4% (33.6% - 41.2%)

#36 · qwen3.6-plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 31.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

36.9% (32.9% - 40.8%)

#37 · Claude Sonnet 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 29.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

35% (31.2% - 38.8%)

#38 · Claude Sonnet 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 29.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.

35% (31.2% - 38.8%)

#39 · minimax-m2.7

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 25.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

34.4% (30.5% - 38.3%)

#40 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 23.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

34.2% (30.8% - 37.6%)

#41 · MiniMax-M2.1

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 21.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

34.1% (30.3% - 37.9%)

#42 · o4 mini

VALS-AI · Jun 17, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 19.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

33.8% (29.8% - 37.8%)

#43 · mistral-medium-3.5

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-3.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 17.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Mistral AI.

33.8% (31.5% - 36%)

#44 · Qwen3.5 Flash

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-flash

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 15.7%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

33% (29.5% - 36.5%)

#45 · glm-4.7

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 13.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

32.8% (28.9% - 36.7%)

#46 · Claude Haiku 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 11.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

32.7% (28.8% - 36.6%)

#47 · mimo-v2.5-pro

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5-pro

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 9.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Xiaomi.

32.5% (28.7% - 36.2%)

#48 · Grok 4.20

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 7.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

32.2% (28% - 36.3%)

#49 · mimo-v2.5

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 5.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Xiaomi.

31.9% (27.9% - 35.9%)

#50 · Qwen3 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3-max-2026-01-23

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 3.9%
Last updated: recent
Eligibility: preview_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

31.4% (27.7% - 35.1%)

#51 · Grok 4.1 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-1-fast-non-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

28.3% (24.6% - 32.1%)

#52 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 17, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 0%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Together AI.

23.3% (19.9% - 26.7%)
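
Each row above ends with a score followed by a lower and an upper bound. The page does not label those bounds; assuming they form an uncertainty range around the score, one rough way to read closely ranked rows is to check whether their ranges overlap. The `Row` type and `ranges_overlap` helper below are hypothetical names for illustration, and overlap is only a heuristic, not a significance test:

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard row: accuracy score and its displayed bounds, in percent."""
    name: str
    score: float
    low: float
    high: float


def ranges_overlap(a: Row, b: Row) -> bool:
    """True when the two displayed ranges share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Values copied from rows #1, #2 and #52 of this leaderboard.
top = Row("Gemini 3.1 Pro Preview", 59.1, 55.1, 63.0)
second = Row("Claude Fable 5", 56.1, 51.8, 60.4)
last = Row("llama-4-scout-17b-16e-instruct", 23.3, 19.9, 26.7)

print(ranges_overlap(top, second))  # True: 55.1-63 and 51.8-60.4 share 55.1-60.4
print(ranges_overlap(top, last))    # False: 26.7 is below 55.1
```

Under that assumption, the gap between #1 and #2 is within the displayed ranges, while the gap between #1 and #52 is not.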

Benchmarks · /benchmarks/vals-ai-medcode

MedCode

MedCode result as reported by Vals AI.

Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 52

Test details

Visible tradeoffsThis is an objective signal, so it is mainly about measurable task performance rather than public taste.

source

Vals AI

metric

Accuracy (%)

judge

Objective

direction

higher better

group id

vals_medcode_current

domain

Professional reasoning

What it measures vs what it misses

✓ Measures

Medical billing support and coding tasks.

✗ Misses

Adjacent skills outside the benchmark task mix, latency, and cost.

Leaderboard · this benchmark version

#1 · Gemini 3.1 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 100%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

59.1%55.1% - 63%

#2 · Claude Fable 5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-fable-5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 98%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

56.1%51.8% - 60.4%

#3 · Gemini 3 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-flash-preview

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 96.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

55.9%51.8% - 60.1%

#4 · Gemini 3.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.5-flash

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 94.1%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

55.8%51.7% - 60%

#5 · Claude Opus 4.7

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-7

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 92.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

54.9%50.5% - 59.2%

#6 · Claude Opus 4.8

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-8

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 90.2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

53.2%49% - 57.5%

#7 · GPT-5.1

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.1-2025-11-13

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 88.2%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

52.7%48.5% - 56.9%

#8 · Gemini 3 Pro Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3-pro-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 86.3%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

52.2%48.1% - 56.3%

#9 · muse-spark

VALS-AI · Jun 17, 2026

Source label: meta/muse_spark

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 84.3%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Meta.

51.3%46.9% - 55.7%

#10 · Gemini 2.5 Pro

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-pro

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 82.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

50.6%46.4% - 54.7%

#11 · GPT-5.2

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.2-2025-12-11

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 80.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.7%45.3% - 54.2%

#12 · GPT-5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-2025-08-07

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 78.4%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.6%45.5% - 53.7%

#13 · Claude Opus 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-5-20251101-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 76.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

49.2%45.2% - 53.1%

#14 · Claude Opus 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-6-thinking

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 74.5%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

49.1%45% - 53.2%

#15 · GPT-5.5

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.5

verified runtimeexact alias

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 72.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

49.1%44.8% - 53.4%

#16 · Gemini 3.1 Flash-Lite Preview

VALS-AI · Jun 17, 2026

Source label: google/gemini-3.1-flash-lite-preview

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 70.6%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

47.6%43.5% - 51.7%

#17 · o3

VALS-AI · Jun 17, 2026

Source label: openai/o3-2025-04-16

verified runtimeexact aliasBackground only

Raw row drilldownsource, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 68.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

47.3% (43.1% - 51.5%)

#18 · Claude Opus 4.1

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 66.7%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

47.2% (43.2% - 51.3%)

#19 · Claude Opus 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-opus-4-1-20250805-thinking

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 66.7%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Opus 4.1 via approved benchmark identity mapping map-claude-opus-4-to-4-1.

47.2% (43.2% - 51.3%)

#20 · minimax-m3

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M3

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 62.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

46.3% (42.2% - 50.4%)

#21 · Claude Sonnet 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-5-20250929-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 60.8%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

44.1% (40.2% - 48.1%)

#22 · GPT-5.4 mini

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5-mini-2025-08-07

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 58.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

43% (39% - 47.1%)

#23 · glm-5.1

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 56.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

41.6% (37.4% - 45.8%)

#24 · GPT-5.4

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-2026-03-05

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 54.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

41.3% (37.1% - 45.5%)

#25 · GPT-5.4 nano

VALS-AI · Jun 17, 2026

Source label: openai/gpt-5.4-nano-2026-03-17

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 52.9%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

41% (36.6% - 45.5%)

#26 · GLM-5.2 (max)

VALS-AI · Jun 17, 2026

Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

40.8% (36.5% - 45%)

#27 · Gemini 2.5 Flash

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-preview-09-2025

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 49%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

40.5% (36.8% - 44.3%)

#28 · deepseek-v4-pro

VALS-AI · Jun 17, 2026

Source label: deepseek/deepseek-v4-pro

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 47.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: DeepSeek.

40.5% (36.3% - 44.6%)

#29 · kimi-k2.6

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.6

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 45.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Moonshot AI.

40.1% (36.1% - 44.1%)

#30 · kimi-k2.5-thinking

VALS-AI · Jun 17, 2026

Source label: kimi/kimi-k2.5-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 43.1%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Moonshot AI.

39.3% (35.2% - 43.5%)

#31 · Qwen3.7 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.7-max

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 41.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

38.8% (34.4% - 43.1%)

#32 · Nemotron 3 Ultra 550B A55B (Reasoning)

VALS-AI · Jun 17, 2026

Source label: nvidia/nemotron-3-ultra-550b-a55b

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 39.2%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Nvidia.

38.6% (34.7% - 42.5%)

#33 · Grok 4

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-0709

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 37.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

38.1% (33.8% - 42.4%)

#34 · Grok 4.3

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.3

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 35.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

38.1% (34% - 42.1%)

#35 · Grok 4 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-fast-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 33.3%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

37.4% (33.6% - 41.2%)

#36 · qwen3.6-plus

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.6-plus

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 31.4%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

36.9% (32.9% - 40.8%)

#37 · Claude Sonnet 4

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 29.4%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

35% (31.2% - 38.8%)

#38 · Claude Sonnet 4.6

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-sonnet-4-20250514-thinking

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 29.4%
Last updated: recent
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic. Backfilled from Claude Sonnet 4 via approved benchmark identity mapping map-claude-sonnet-4-6-to-4.

35% (31.2% - 38.8%)

#39 · minimax-m2.7

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 25.5%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

34.4% (30.5% - 38.3%)

#40 · Gemini 2.5 Flash-Lite

VALS-AI · Jun 17, 2026

Source label: google/gemini-2.5-flash-lite-preview-09-2025-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 23.5%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Google.

34.2% (30.8% - 37.6%)

#41 · MiniMax-M2.1

VALS-AI · Jun 17, 2026

Source label: minimax/MiniMax-M2.1

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 21.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: MiniMax.

34.1% (30.3% - 37.9%)

#42 · o4 mini

VALS-AI · Jun 17, 2026

Source label: openai/o4-mini-2025-04-16

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 19.6%
Last updated: recent
Eligibility: historical_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: OpenAI.

33.8% (29.8% - 37.8%)

#43 · mistral-medium-3.5

VALS-AI · Jun 17, 2026

Source label: mistralai/mistral-medium-3.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 17.6%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Mistral AI.

33.8% (31.5% - 36%)

#44 · Qwen3.5 Flash

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3.5-flash

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 15.7%
Last updated: recent
Eligibility: preview_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

33% (29.5% - 36.5%)

#45 · glm-4.7

VALS-AI · Jun 17, 2026

Source label: zai/glm-4.7

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 13.7%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

32.8% (28.9% - 36.7%)

#46 · Claude Haiku 4.5

VALS-AI · Jun 17, 2026

Source label: anthropic/claude-haiku-4-5-20251001-thinking

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 11.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Anthropic.

32.7% (28.8% - 36.6%)

#47 · mimo-v2.5-pro

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5-pro

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 9.8%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Xiaomi.

32.5% (28.7% - 36.2%)

#48 · Grok 4.20

VALS-AI · Jun 17, 2026

Source label: grok/grok-4.20-0309-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 7.8%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

32.2% (28% - 36.3%)

#49 · mimo-v2.5

VALS-AI · Jun 17, 2026

Source label: xiaomi/mimo-v2.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 5.9%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Xiaomi.

31.9% (27.9% - 35.9%)

#50 · Qwen3 Max

VALS-AI · Jun 17, 2026

Source label: alibaba/qwen3-max-2026-01-23

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 3.9%
Last updated: recent
Eligibility: preview_model
Identity: exact (1.00)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.

31.4% (27.7% - 35.1%)

#51 · Grok 4.1 Fast

VALS-AI · Jun 17, 2026

Source label: grok/grok-4-1-fast-non-reasoning

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 2%
Last updated: recent
Eligibility: headline eligible
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: xAI.

28.3% (24.6% - 32.1%)

#52 · llama-4-scout-17b-16e-instruct

VALS-AI · Jun 17, 2026

Source label: together/meta-llama/Llama-4-Scout-17B-16E-Instruct

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://www.vals.ai/benchmarks/medcode
Percentile: 0%
Last updated: recent
Eligibility: benchmark_derived_model
Identity: provider alias (0.92)

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Together AI.

23.3% (19.9% - 26.7%)
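
Note on the Percentile field: the values shown in the rows of this leaderboard are consistent with a simple rank-based formula over the 52 scores in this benchmark version, where proxy-backfilled rows (#19, #38) carry the percentile of the row they were backfilled from. This is an inference from the displayed numbers, not a formula documented by the source. The helper name `rank_percentile` and the rounding to one decimal place are assumptions made for illustration. A minimal Python sketch:

```python
def rank_percentile(rank: int, total: int) -> float:
    """Hypothetical helper: percent of the other rows that rank below `rank`.

    Assumes rank 1 is the best score and `total` is the number of rows
    in the same benchmark/version group.
    """
    if total < 2 or not 1 <= rank <= total:
        raise ValueError("rank must be in 1..total and total must be at least 2")
    # (total - rank) rows sit below this one; (total - 1) is every other row.
    return round(100 * (total - rank) / (total - 1), 1)


print(rank_percentile(13, 52))  # 76.5, matching the #13 row
print(rank_percentile(52, 52))  # 0.0, matching the #52 row
```

Under this reading the percentile is only meaningful inside this benchmark version: adding or removing a row changes `total` and shifts every value.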
