Claude Opus 4.7
Anthropic
- Professional reasoning
One shared benchmark is still too close to call, so the win stays conditional. This comparison uses all public sources, with provider-official evidence labeled separately.
Gemini 3.1 Pro has the cleanest edge here.
Claude Opus 4.7 has the cleanest edge here.
Claude Opus 4.7 has the cleanest edge here.
| Benchmark | Category | Claude Opus 4.7 | Gemini 3.1 Pro | Spread |
| --- | --- | --- | --- | --- |
| BrowseComp OFF · % | Search · Search / tool use | 79.3% (33.3%); official, manual verified | 85.9% (100%); official, manual verified | 66.7% |
| HiL-Bench SL · % | Code · Coding | 27.7% (80%); exact direct, verified runtime | 20.3% (40%); exact direct, verified runtime | 40% |
| SWE-Bench Verified OFF · % | Code · Coding | 87.6% (100%); official, manual verified | 80.6% (60%); official, manual verified | 40% |
| Terminal-Bench 2.0 OFF · % | Code · Coding | 69.4% (66.7%); official, manual verified | 68.5% (50%); official, manual verified | 16.7% |
| Humanity's Last Exam OFF · % | Text · Reasoning / math / science | 46.9% (100%); official, manual verified | 44.4% (85.7%); official, manual verified | 14.3% |
| Search Arena AR · rating | Search · Search / tool use | 1,211 (90%); exact alias, verified runtime | 1,210 (86.7%); exact alias, verified runtime | 3.3% |
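
The page does not define how the spread column is computed. One reading that fits every row is that the spread is the absolute difference between the second percentage shown in each model's cell (for BrowseComp, 33.3% and 100%). The sketch below checks that reading against the listed values; the interpretation and the `spread` helper are assumptions for illustration, not something the source confirms.

```python
# Each row: (benchmark, first model's second percentage,
#            second model's second percentage, spread as listed on the page).
# Values are copied from the table; the formula being tested is an assumption.
rows = [
    ("BrowseComp", 33.3, 100.0, 66.7),
    ("HiL-Bench", 80.0, 40.0, 40.0),
    ("SWE-Bench Verified", 100.0, 60.0, 40.0),
    ("Terminal-Bench 2.0", 66.7, 50.0, 16.7),
    ("Humanity's Last Exam", 100.0, 85.7, 14.3),
    ("Search Arena", 90.0, 86.7, 3.3),
]


def spread(a: float, b: float) -> float:
    """Hypothetical helper: absolute gap between two percentages, to one decimal."""
    return round(abs(a - b), 1)


# Collect any benchmark whose listed spread disagrees with the assumed formula.
mismatches = [name for name, a, b, listed in rows if spread(a, b) != listed]

for name, a, b, listed in rows:
    print(f"{name}: computed {spread(a, b)}, listed {listed}")
print("mismatches:", mismatches)
```

Under this reading, Search Arena has the smallest gap by a wide margin, which is consistent with the note that one shared benchmark is too close to call.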