Benchmarks · /benchmarks/mteb-retrieval-en-v2

Retrieval

Name: Retrieval
Creator: MTEB

MTEB retrieval slice for embeddings.

Source · MTEB
Version · mteb snapshot 2026-06-24
Scores · 11

Test details

Visible tradeoffs: This is a retrieval signal, so it is best read as search-stack quality rather than broad model capability.

Source · MTEB
Metric · NDCG@10 (ndcg)
Judge · Retrieval
Direction · higher is better
Group id · mteb_retrieval_en_v2
Domain · Embeddings / retrieval
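For readers unfamiliar with the metric, the sketch below shows one common way to compute NDCG@10 for a single query. It assumes the linear-gain form of DCG with a log2 rank discount; evaluation tools differ in details, and the page does not say which variant or scaling produced the scores shown here. The function names `dcg_at_k` and `ndcg_at_k` are illustrative, not taken from MTEB.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(
        rel / math.log2(rank + 2)  # rank is 0-based, so position 1 divides by log2(2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: relevance labels in the order the system returned them.
    all_relevances: labels of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal


# Hypothetical query: two relevant documents exist, the system returns
# one at position 1 and the other at position 3.
example = ndcg_at_k([1, 0, 1], [1, 1, 0])
```

A benchmark-level figure is typically the mean of this per-query value over all queries and tasks; the leaderboard values here look like that mean expressed on a 0 to 100 scale, though the page does not confirm it.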

What it measures vs what it misses

✓ Measures

Embedding quality for retrieval tasks.

✗ Misses

Chat quality, generation, latency.

Why this counts: It is one of the few direct signals for retrieval stacks, where embedding quality matters more than chat style.

Same-test rule: This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses: It does not tell you whether the same model is strong at generation, ranking policy, or final answer quality.
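The page does not state how the percentile is computed. One reading that reproduces all eleven values displayed on this leaderboard is "share of the other rows in the same group that score at or below this row". The sketch below implements that reading; the formula and the function name `group_percentiles` are assumptions, not the site's documented method.

```python
def group_percentiles(scores):
    """Percentile of each row within one benchmark/version group.

    Assumed rule: 100 * (other rows with score <= this row) / (rows - 1).
    Tied rows therefore share the same percentile.
    """
    n = len(scores)
    if n < 2:
        return {name: 100.0 for name in scores}
    result = {}
    for name, score in scores.items():
        at_or_below = sum(
            1 for other, s in scores.items() if other != name and s <= score
        )
        result[name] = 100.0 * at_or_below / (n - 1)
    return result


# Scores as listed on this page (NDCG@10).
SCORES = {
    "GPT-5": 58.8,
    "GPT-5.4": 58.8,
    "GPT-5.4 mini": 58.8,
    "GPT-5.4 nano": 58.8,
    "Gemini 2.5 Pro": 57.9,
    "Qwen3 235B A22B": 56.2,
    "Grok 4": 52.1,
    "Mistral Medium 3": 51.3,
    "Llama 4 Maverick": 49.6,
    "BAAI bge-large-en-v1.5": 49.3,
    "deepseek-r1-0528": 48.4,
}

PERCENTILES = group_percentiles(SCORES)
```

Because the calculation only looks at rows inside one group, the same model can land at a very different percentile in another benchmark or snapshot, which is the point of the same-test rule.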

Leaderboard · this benchmark version

#1 · GPT-5

MTEB · Mar 19, 2026

verified runtime · exact direct

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 100%
Last updated: aging
Eligibility: headline eligible
Identity: exact (1.00)

Embedding endpoint score.

58.8 ndcg

#2 · GPT-5.4

MTEB · Mar 19, 2026

Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Embedding endpoint score. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-to-gpt-5.

58.8 ndcg

#3 · GPT-5.4 mini

MTEB · Mar 19, 2026

Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Embedding endpoint score. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-mini-to-gpt-5.

58.8 ndcg

#4 · GPT-5.4 nano

MTEB · Mar 19, 2026

Source label: gpt-5

backfilled · proxy backfilled · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 100%
Last updated: aging
Eligibility: Fallback benchmark identity is visible for context but excluded from default ranking.
Identity: benchmark proxy (0.58)

Embedding endpoint score. Backfilled from GPT-5 via approved benchmark identity mapping map-gpt-5-4-nano-to-gpt-5.

58.8 ndcg

#5 · Gemini 2.5 Pro

MTEB · Mar 19, 2026

verified runtime · exact direct

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 60%
Last updated: aging
Eligibility: headline eligible
Identity: exact (1.00)

57.9 ndcg

#6 · Qwen3 235B A22B

MTEB · Mar 19, 2026

Source label: qwen3-235b

verified runtime · exact alias

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 50%
Last updated: aging
Eligibility: headline eligible
Identity: provider alias (0.92)

Strong open retrieval score.

56.2 ndcg

#7 · Grok 4

MTEB · Mar 19, 2026

verified runtime · exact direct

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 40%
Last updated: aging
Eligibility: headline eligible
Identity: exact (1.00)

52.1 ndcg

#8 · Mistral Medium 3

MTEB · Mar 19, 2026

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 30%
Last updated: aging
Eligibility: historical_model
Identity: exact (1.00)

51.3 ndcg

#9 · Llama 4 Maverick

MTEB · Mar 19, 2026

verified runtime · exact direct

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 20%
Last updated: aging
Eligibility: headline eligible
Identity: exact (1.00)

49.6 ndcg

#10 · BAAI bge-large-en-v1.5

MTEB · May 13, 2026

Source label: BAAI/bge-large-en-v1.5

verified runtime · exact alias · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 10%
Last updated: recent
Eligibility: specialized_model
Identity: provider alias (0.92)

Derived from official MTEB result files under `BAAI__bge-large-en-v1.5`.

49.3 ndcg

#11 · deepseek-r1-0528

MTEB · Mar 19, 2026

verified runtime · exact direct · Background only

Raw row drilldown: source, percentile, eligibility

Source URL: https://embeddings-benchmark.github.io/mteb/
Percentile: 0%
Last updated: aging
Eligibility: benchmark_derived_model
Identity: exact (1.00)

48.4 ndcg
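The eligibility field decides which rows count toward the default ranking. As a rough illustration, the sketch below keeps only rows marked "headline eligible" and sorts them by score. The proxy rows are stated above to be excluded from default ranking; treating every other "Background only" row (`historical_model`, `specialized_model`, `benchmark_derived_model`) the same way is an assumption, and the `Row` type and `default_ranking` function are hypothetical names, not the site's implementation.

```python
from dataclasses import dataclass


@dataclass
class Row:
    model: str
    score: float       # NDCG@10 as listed on the page
    eligibility: str   # value of the "Eligibility" field, shortened for proxy rows


def default_ranking(rows):
    """Assumed default view: headline-eligible rows only, best score first."""
    eligible = [r for r in rows if r.eligibility == "headline eligible"]
    return sorted(eligible, key=lambda r: r.score, reverse=True)


ROWS = [
    Row("GPT-5", 58.8, "headline eligible"),
    Row("GPT-5.4", 58.8, "fallback benchmark identity"),
    Row("GPT-5.4 mini", 58.8, "fallback benchmark identity"),
    Row("GPT-5.4 nano", 58.8, "fallback benchmark identity"),
    Row("Gemini 2.5 Pro", 57.9, "headline eligible"),
    Row("Qwen3 235B A22B", 56.2, "headline eligible"),
    Row("Grok 4", 52.1, "headline eligible"),
    Row("Mistral Medium 3", 51.3, "historical_model"),
    Row("Llama 4 Maverick", 49.6, "headline eligible"),
    Row("BAAI bge-large-en-v1.5", 49.3, "specialized_model"),
    Row("deepseek-r1-0528", 48.4, "benchmark_derived_model"),
]

HEADLINE = [r.model for r in default_ranking(ROWS)]
```

Under these assumptions the three backfilled GPT-5.4 rows drop out of the headline view, so a score copied from GPT-5 by identity mapping is not counted as independent evidence.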
