MRCR v2

Name: MRCR v2
Creator: Provider official evals

Provider-official MRCR v2 long-context results manually verified from public provider pages.

Source · Provider official evals
Version · 2026-Q2 public provider cards
Scores · 2

Test details

Visible tradeoffs: This is an objective signal, so it is mainly about measurable task performance rather than public taste.

Source: Provider official evals
Metric: Accuracy (%)
Judge: Objective
Direction: higher is better
Group ID: provider_official_mrcr_v2_2026_q2
Domain: Long context

What it measures vs what it misses

✓ Measures

Needle-style long-context recall and sustained retrieval under long windows.

✗ Misses

Real workflow synthesis quality and multi-document judgment.

Why this counts: It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Same-test rule: This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses: It does not guarantee good synthesis quality once real documents, tools, and latency constraints are involved.
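The same-test rule can be illustrated with a small sketch. The page does not publish its percentile formula, so the rank-based formula below (share of other scores in the group that are strictly lower) is an assumption, and `within_group_percentiles` is a hypothetical helper name. For the two scores in this group, this formula gives the 100% and 0% values shown in the leaderboard rows.

```python
from typing import Dict


def within_group_percentiles(scores: Dict[str, float]) -> Dict[str, float]:
    """Map each model to a 0-100 percentile inside ONE benchmark/version group.

    Assumed formula (not confirmed by the source page):
    percentile = 100 * (number of strictly lower scores) / (group size - 1)
    """
    n = len(scores)
    if n == 0:
        return {}
    if n == 1:
        # A single-entry group has nothing to compare against; treat it as top.
        return {name: 100.0 for name in scores}
    result = {}
    for name, score in scores.items():
        below = sum(1 for other in scores.values() if other < score)
        result[name] = 100.0 * below / (n - 1)
    return result


# Scores taken from the provider_official_mrcr_v2_2026_q2 group on this page.
group = {"Gemini 3.1 Pro": 84.9, "Gemini 3 Pro Preview": 77.0}
percentiles = within_group_percentiles(group)
print(percentiles)
```

Because the calculation only sees scores from a single group, a 100% here says nothing about how the model would rank against results from another benchmark or version.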

Leaderboard · this benchmark version

#1 · Gemini 3.1 Pro

OFF · Apr 29, 2026

Official company result · manually verified

Raw row drilldown: source, percentile, eligibility

Source URL: https://deepmind.google/models/gemini/pro/
Percentile: 100%
Last updated: recent
Eligibility: headline eligible
Identity: exact (1.00)

Verified against the current Gemini Pro public model page. Page did not expose a visible page date on April 29, 2026. Uses the published MRCR v2 8-needle 128k average for Gemini 3.1 Pro. Checked 2026-04-29. Verification: manual_public_page_verification.

84.9%

#2 · Gemini 3 Pro Preview

OFF · Apr 29, 2026

Official company result · manually verified · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://deepmind.google/models/gemini/pro/
Percentile: 0%
Last updated: recent
Eligibility: preview_model
Identity: exact (1.00)

Verified against the current Gemini Pro public model page. Page did not expose a visible page date on April 29, 2026. Uses the published MRCR v2 8-needle 128k average for Gemini 3 Pro Preview. Checked 2026-04-29. Verification: manual_public_page_verification.

77%
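The two rows differ in their eligibility value: one is "headline eligible", the other is "preview_model" and is tagged "Background only". A minimal sketch of how a consumer of these rows might separate the two follows. The `Row` record and the `headline_rows` helper are hypothetical names introduced for illustration, and treating only "headline eligible" rows as headline results is an assumption based on the labels shown above, not a documented rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    model: str
    score: float       # accuracy in percent, higher is better
    eligibility: str   # value copied from the "Eligibility" field of a row


def headline_rows(rows: List[Row]) -> List[Row]:
    """Keep rows marked 'headline eligible', sorted by score, best first."""
    eligible = [r for r in rows if r.eligibility == "headline eligible"]
    return sorted(eligible, key=lambda r: r.score, reverse=True)


# Rows transcribed from the leaderboard on this page.
rows = [
    Row("Gemini 3.1 Pro", 84.9, "headline eligible"),
    Row("Gemini 3 Pro Preview", 77.0, "preview_model"),
]

headline = headline_rows(rows)
background = [r for r in rows if r not in headline]
print([r.model for r in headline])
print([r.model for r in background])
```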
