TERMINAL-BENCH · benchmark platform
Terminal-Bench
Agent benchmark for hard, realistic multi-step tasks completed inside terminal environments.
Verification status: verified
Last checked: May 13, 2026
Evidence ledger
Modalities: code
Cadence: release-based
API: not public
Evaluations: 31
Verification: verified
Verified runtime: 28
Manual verified: 0
Relay / mirrored: 0
Backfilled: 3
Relay sources mirror another provider's public page; manual rows are checked against the cited page; backfilled rows are historical inserts; seeded rows are demo fixtures. Relay rows are supporting evidence, not first-party measurements.
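The ledger counts above are consistent with one another: 28 verified-runtime rows plus 0 manual, 0 relay and 3 backfilled rows account for all 31 evaluations. A minimal sketch of that tally follows. It assumes each evaluation row carries a hypothetical `provenance` label drawn from the categories described above. The label strings and the `tally_provenance` helper are illustrative only, not the platform's actual schema or API.

```python
from collections import Counter
from typing import Iterable

# Hypothetical provenance labels mirroring the row categories described above;
# the platform's real field names and values are not published.
PROVENANCE_KINDS = ("verified_runtime", "manual_verified", "relay", "backfilled", "seeded")


def tally_provenance(rows: Iterable[dict]) -> dict[str, int]:
    """Count rows per provenance label, rejecting labels outside the known set."""
    counts: Counter = Counter()
    for row in rows:
        kind = row["provenance"]
        if kind not in PROVENANCE_KINDS:
            raise ValueError(f"unknown provenance: {kind!r}")
        counts[kind] += 1
    # Report every known category, including ones with zero rows.
    return {kind: counts[kind] for kind in PROVENANCE_KINDS}


# Reproduce the ledger figures: 28 verified runtime rows plus 3 backfilled rows.
rows = [{"provenance": "verified_runtime"}] * 28 + [{"provenance": "backfilled"}] * 3
counts = tally_provenance(rows)
print(counts)
print("total evaluations:", sum(counts.values()))
```

Rejecting unknown labels, rather than silently counting them, keeps the per-category totals summing to the evaluation count.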
Operational state
Snapshot
Latest pull: JSON · May 13, 2026

Parser
Loaded 28 Terminal-Bench 2.0 benchmark records from verified rows.
Status: ok · 0.1.0

Verify
terminal-bench verification finished with status "verified".
Status: verified · May 13, 2026
Benchmarks from this source
Terminal-Bench 2.0: agentic terminal coding (metric: accuracy)
Latest change explanation
terminal-bench matched terminal-bench-20260513T010704Z; no notable causes of change were detected.