GPT-5.5
OpenAI
- Professional reasoning
- Multilingual
OpenAI
0 shared benchmarks are still too close to call, so the win stays conditional. This comparison uses all public sources, with provider-official evidence labeled separately.
GPT-5.5 has the cleanest edge here.
| Benchmark | First model | Second model | Spread |
| --- | --- | --- | --- |
| HiL-Bench SL · % Code · Coding | 29.1% · 100% · exact alias · verified runtime | 20.3% · 40% · exact direct · verified runtime | 60% |
| MMMU-Pro OFF · % Vision · Vision understanding | 81.2% · 100% · Official · manual verified | 80.5% · 40% · Official · manual verified | 60% |
| Terminal-Bench 2.0 OFF · % Code · Coding | 82.7% · 100% · Official · manual verified | 68.5% · 50% · Official · manual verified | 50% |
| BrowseComp OFF · % Search · Search / tool use | 84.4% · 83.3% · Official · manual verified | 85.9% · 100% · Official · manual verified | 16.7% |
| Humanity's Last Exam OFF · % Text · Reasoning / math / science | 41.4% · 71.4% · Official · manual verified | 44.4% · 85.7% · Official · manual verified | 14.3% |
| Search Arena AR · rating Search · Search / tool use | 1,225 · 96.7% · exact alias · verified runtime | 1,210 · 86.7% · exact alias · verified runtime | 10% |
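
The spread figures in the rows above look consistent with the absolute difference between the two percentage figures that follow each raw score (for example, 100% and 40% giving a 60% spread). The sketch below checks that reading against every row. It is an assumption about how the page derives the number, not a documented formula; the function name `spread` and the short benchmark labels are illustrative.

```python
def spread(first_pct: float, second_pct: float) -> float:
    """Absolute gap between two percentage figures, rounded to one decimal.

    Assumption: the page's "spread" is |first - second| of the second
    percentage shown in each cell. This is inferred from the rows, not
    taken from any published definition.
    """
    return round(abs(first_pct - second_pct), 1)


# Percentage pairs copied from the rows above (first cell, second cell).
rows = {
    "HiL-Bench": (100.0, 40.0),
    "MMMU-Pro": (100.0, 40.0),
    "Terminal-Bench 2.0": (100.0, 50.0),
    "BrowseComp": (83.3, 100.0),
    "Humanity's Last Exam": (71.4, 85.7),
    "Search Arena": (96.7, 86.7),
}

for name, (first, second) in rows.items():
    print(f"{name}: {spread(first, second)}% spread")
```

Under this reading the spread is unsigned, so it reports how far apart the two entries are but not which one leads; the direction has to be read from the cells themselves.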