Source · Vals AI
Version · vals-ai snapshot 2026-06-24
Scores · 89
Test details
Visible tradeoffs
This is an objective signal, so it is mainly about measurable task performance rather than public taste.
source: Vals AI
metric: Accuracy (%)
judge: Objective
direction: higher better
group id: vals_corp_fin_v2_current
domain: Long context
What it measures vs what it misses
✓ Measures
Understanding and synthesizing long credit agreements.
✗ Misses
Adjacent skills outside the benchmark task mix, latency, and cost.
Why this counts
It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
What it misses
It does not guarantee good synthesis quality once real documents, tools, and latency constraints are involved.
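The same-test rule can be illustrated with a minimal Python sketch. It assumes a percentile defined as the share of models in the same group scoring at or below the target model; the page does not state the exact formula, so this definition is an assumption. The function name, record layout, model names, scores, and the second group id (`other_benchmark_v1`) are all hypothetical and chosen only for illustration.

```python
# Hypothetical records: model names and scores are made up for illustration.
RECORDS = [
    {"model": "model_a", "group_id": "vals_corp_fin_v2_current", "score": 50.0},
    {"model": "model_b", "group_id": "vals_corp_fin_v2_current", "score": 60.0},
    {"model": "model_c", "group_id": "vals_corp_fin_v2_current", "score": 75.0},
    {"model": "model_d", "group_id": "vals_corp_fin_v2_current", "score": 89.0},
    # A different (hypothetical) group: must not affect the result above.
    {"model": "model_c", "group_id": "other_benchmark_v1", "score": 95.0},
]


def percentile_within_group(records, model, group_id):
    """Percent of models in `group_id` scoring at or below `model`.

    Assumed definition; only records from the exact same group are compared.
    """
    # Keep only scores from the exact benchmark/version group.
    scores = {r["model"]: r["score"] for r in records if r["group_id"] == group_id}
    if model not in scores:
        raise KeyError(f"{model!r} has no score in group {group_id!r}")
    target = scores[model]
    at_or_below = sum(1 for s in scores.values() if s <= target)
    return 100.0 * at_or_below / len(scores)


if __name__ == "__main__":
    print(percentile_within_group(RECORDS, "model_c", "vals_corp_fin_v2_current"))
```

Because the filter runs before any comparison, a model's result in one group says nothing about its standing in another, which is why the percentile is not a universal score.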