UAB · Unbiased AI Bench · AI model rankings with source links.
Every score links back to its source.
Document Arena
Live · updated continuously
Benchmarks · /benchmarks/arena-document

Blind preference arena for document-heavy prompts, PDFs, and long-form file understanding.
Source · Arena
Version · arena snapshot 2026-05-13
Scores · 20

Passport

Visible tradeoffs
This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
source · Arena
metric · Arena rating (rating)
judge · Human
direction · higher better
group id · arena_document_2026_q2
domain · Document understanding

What it measures vs what it misses

✓ Measures

Observed user preference on document-grounded outputs. How often a model wins when users compare file-aware responses head to head.

✗ Misses

Objective document extraction accuracy. Fine-grained breakdowns across OCR, tables, charts, and legal or financial workflows.

Why this counts
It matters when the job is reading PDFs, tables, forms, or mixed-layout documents rather than plain chat.

Comparable-group rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.

What it misses
It does not stand in for end-to-end document workflow quality or search relevance.
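The comparable-group rule above can be illustrated with a minimal sketch. It assumes a percentile defined as the share of scores in the same group that are at or below a model's score; the page does not state its exact formula, so that definition, the function name `percentiles_within_group`, and the second group are all hypothetical. The four ratings in the first group are copied from this page's leaderboard.

```python
from collections import defaultdict

# (group_id, model, score) rows. The first group's ratings come from this page;
# the second group is a made-up placeholder to show that groups never mix.
ROWS = [
    ("arena_document_2026_q2", "Claude Opus 4.6", 1513),
    ("arena_document_2026_q2", "GPT-5.5", 1492),
    ("arena_document_2026_q2", "Gemini 2.5 Pro", 1427),
    ("arena_document_2026_q2", "GPT-5.1", 1407),
    ("hypothetical_other_group", "model-a", 90),
    ("hypothetical_other_group", "model-b", 70),
]


def percentiles_within_group(rows):
    """Return {(group_id, model): percentile}, computed per group only.

    Assumed definition: percentage of scores in the same group that are
    at or below the model's score (higher score is better).
    """
    by_group = defaultdict(list)
    for group_id, model, score in rows:
        by_group[group_id].append((model, score))

    result = {}
    for group_id, entries in by_group.items():
        scores = [score for _, score in entries]
        for model, score in entries:
            at_or_below = sum(1 for s in scores if s <= score)
            result[(group_id, model)] = 100.0 * at_or_below / len(scores)
    return result


pct = percentiles_within_group(ROWS)
for key, value in sorted(pct.items()):
    print(key, value)
```

Because each group is ranked separately, a rating of 1,407 and a score of 70 are never placed on the same scale, which is the point of the rule.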

Leaderboard · this benchmark version

#1 · Claude Opus 4.6 · AR · May 13, 2026 · 1,513
#2 · Claude Opus 4.7 · AR · May 13, 2026 · 1,509
#3 · Claude Sonnet 4.6 · AR · May 13, 2026 · 1,495
#4 · GPT-5.5 · AR · May 13, 2026 · 1,492
#5 · GPT-5.4 · AR · May 13, 2026 · 1,474
#6 · Claude Opus 4.5 · AR · May 13, 2026 · 1,466
#7 · kimi-k2.6 · AR · May 13, 2026 · 1,454
#8 · muse-spark · AR · May 13, 2026 · 1,452
#9 · Claude Sonnet 4.5 · AR · May 13, 2026 · 1,450
#10 · Gemini 3.1 Pro Preview · AR · May 13, 2026 · 1,443
#11 · Gemini 3 Pro Preview · AR · May 13, 2026 · 1,439
#12 · kimi-k2.5-thinking · AR · May 13, 2026 · 1,437
#13 · Gemini 2.5 Pro · AR · May 13, 2026 · 1,427
#14 · gemma-4-31b · AR · May 13, 2026 · 1,424
#15 · Claude Haiku 4.5 · AR · May 13, 2026 · 1,423
#16 · Grok 4.20 · AR · May 13, 2026 · 1,420
#17 · Gemini 3 Flash · AR · May 13, 2026 · 1,418
#18 · GPT-5.2 · AR · May 13, 2026 · 1,407
#19 · gpt-5.5-instant · AR · May 13, 2026 · 1,407
#20 · GPT-5.1 · AR · May 13, 2026 · 1,407