Model vs model

Claude Opus 4.7 vs GPT-5.5

A debate-ready pair page: current winner, strongest alternative, decisive benchmarks, and the warning that should travel with the claim.

Use case · Coding copilot
Winner · Claude Opus 4.7
Sources · All public sources

Winner

Claude Opus 4.7

Anthropic

30 benchmarks won

Professional reasoning

Versus · Coding copilot

Claude Opus 4.7 leads this comparison set for coding copilot.

Claude Opus 4.7: 30 of 51 wins

Visible tradeoffs

17 shared benchmarks are still too close to call, so the win stays conditional. This comparison uses all public sources, with provider-official evidence labeled separately.

Close calls: 17
Sources: All public


59% win share
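The 59% figure is consistent with dividing Claude Opus 4.7's 30 wins by the 51 decided benchmarks (30 + 21). The page does not state its formula, so the sketch below is an assumption about how the share is derived, and `win_share` is a hypothetical helper name.

```python
def win_share(wins: int, decided: int) -> float:
    """Return wins as a percentage of decided benchmarks.

    Assumption: "decided" means benchmarks won by either model;
    the page does not define the denominator explicitly.
    """
    if decided <= 0:
        raise ValueError("decided must be positive")
    return 100.0 * wins / decided


claude_wins = 30  # "30 benchmarks won"
gpt_wins = 21     # "21 benchmarks won"
decided = claude_wins + gpt_wins  # 51, matching "30 of 51 wins"

print(f"Claude Opus 4.7: {win_share(claude_wins, decided):.0f}% win share")
print(f"GPT-5.5: {win_share(gpt_wins, decided):.0f}% win share")
```

Under this reading the shares cover only decided benchmarks, so how the close calls are eventually resolved could move either number.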

Challenger

GPT-5.5

OpenAI

21 benchmarks won

Professional reasoning
Multilingual

The cases in full

Claude Opus 4.7 case

Professional reasoning

GPT-5.5 case

Professional reasoning
Multilingual

What changes the outcome

Claude Opus 4.7: 57 visible benchmark gaps still leave room for the result to move.
GPT-5.5: 24 visible benchmark gaps still leave room for the result to move.

Why this result is surprising

17 shared benchmarks are still close, so the winner is narrower than it looks.
AA-Omniscience non-hallucination is doing a lot of the visible work in the public narrative.

Why this is not a clean win

17 shared benchmarks are still too close to call, so the win stays conditional. This comparison uses all public sources, with provider-official evidence labeled separately.
GPT-5.5 remains the strongest alternative once you change use case, mode, or missing-evidence assumptions.


Decisive benchmarks

AA-Omniscience non-hallucination

Claude Opus 4.7 has the cleanest edge here: 48.1% vs 8.8% raw, a 65.4% spread.

BrowseComp

GPT-5.5 has the cleanest edge here: 84.4% vs 79.3% raw, a 50% spread.

Time to first answer token

Claude Opus 4.7 has the cleanest edge here: 13.81s vs 107.59s raw, a 46.7% spread.

52 of 114 benchmarks


| Benchmark | Source · unit | Category | Claude Opus 4.7 raw (percentile) | GPT-5.5 raw (percentile) | Spread |
|---|---|---|---|---|---|
| AA-Omniscience non-hallucination | AA · % | Text · Chat / text | 48.1% (87.6%) | 8.8% (22.1%) | 65.4% |
| BrowseComp | OFF · % | Search · Search / tool use | 79.3% (33.3%) | 84.4% (83.3%) | 50% |
| Time to first answer token | AA · s | Text · Chat / text | 13.81s (48.6%) | 107.59s (1.9%) | 46.7% |
| Agentic Index | AA · index | Code · Coding | 44 (95.7%) | 26 (52.2%) | 43.5% |
| GDPval-AA | AA · rating | Text · Professional reasoning | 1,507 (93.5%) | 1,123 (54.3%) | 39.1% |
| Terminal-Bench 2.0 | OFF · % | Code · Coding | 69.4% (66.7%) | 82.7% (100%) | 33.3% |
| Humanity's Last Exam | OFF · % | Text · Reasoning / math / science | 46.9% (100%) | 41.4% (71.4%) | 28.6% |
| MedScribe | VALS-AI · % | Text · Professional reasoning | 83% (70%) | 86.9% (94%) | 24% |
| Document Arena | AR · rating | Document · Document understanding | 1,497 (95.8%) | 1,477 (75%) | 20.8% |
| GPQA | AA · % | Text · Reasoning / math / science | 88.5% (96.8%) | 76.8% (76.7%) | 20.1% |
| ProgramBench | VALS-AI · % | Code · Coding | 0% (70%) | 0.5% (90%) | 20% |
| HiL-Bench | SL · % | Code · Coding | 27.7% (80%) | 29.1% (100%) | 20% |

Row details. All rows: last updated recent; eligibility headline eligible.

| Benchmark | Claude Opus 4.7 source label (match; identity) | GPT-5.5 source label (match; identity) | Verification |
|---|---|---|---|
| AA-Omniscience non-hallucination | Claude Opus 4.7 (Non-reasoning, High Effort) (exact alias; provider alias 0.94) | GPT-5.5 (Non-reasoning) (exact alias; provider alias 0.94) | verified runtime |
| BrowseComp | claude-opus-4-7 (Official; exact 1.00) | gpt-5-5 (Official; exact 1.00) | manual verified |
| Time to first answer token | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) (exact alias; provider alias 0.94) | GPT-5.5 (xhigh) (exact alias; provider alias 0.94) | verified runtime |
| Agentic Index | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) (exact alias; provider alias 0.94) | GPT-5.5 (Non-reasoning) (exact alias; provider alias 0.94) | verified runtime |
| GDPval-AA | Claude Opus 4.7 (Adaptive Reasoning, Max Effort) (exact alias; provider alias 0.94) | GPT-5.5 (Non-reasoning) (exact alias; provider alias 0.94) | verified runtime |
| Terminal-Bench 2.0 | claude-opus-4-7 (Official; exact 1.00) | gpt-5-5 (Official; exact 1.00) | manual verified |
| Humanity's Last Exam | claude-opus-4-7 (Official; exact 1.00) | gpt-5-5 (Official; exact 1.00) | manual verified |
| MedScribe | anthropic/claude-opus-4-7 (exact alias; provider alias 0.92) | openai/gpt-5.5 (exact alias; provider alias 0.92) | verified runtime |
| Document Arena | claude-opus-4-7-thinking (exact alias; provider alias 0.92) | gpt-5.5-high (exact alias; provider alias 0.92) | verified runtime |
| GPQA | Claude Opus 4.7 (Non-reasoning, High Effort) (exact alias; provider alias 0.94) | GPT-5.5 (Non-reasoning) (exact alias; provider alias 0.94) | verified runtime |
| ProgramBench | anthropic/claude-opus-4-7 (exact alias; provider alias 0.92) | openai/gpt-5.5 (exact alias; provider alias 0.92) | verified runtime |
| HiL-Bench | claude-opus-4-7 (exact direct; exact 1.00) | GPT-5.5 (exact alias; provider alias 0.92) | verified runtime |

Showing rows 1-12 of 52.
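The spread column appears to track the absolute gap between the two models' percentiles rather than their raw values: for Time to first answer token, 48.6% minus 1.9% gives the listed 46.7%. The page does not define the metric, so the sketch below is an assumption, and `Row`, `spread`, and `leader` are hypothetical names. It uses four rows copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    benchmark: str
    claude_pct: float  # Claude Opus 4.7 percentile, in percent
    gpt_pct: float     # GPT-5.5 percentile, in percent


def spread(row: Row) -> float:
    """Absolute percentile gap, rounded to one decimal (assumed definition)."""
    return round(abs(row.claude_pct - row.gpt_pct), 1)


def leader(row: Row) -> str:
    """Model with the higher percentile. Percentiles already account for
    direction, so the lower-is-better latency row needs no special case."""
    if row.claude_pct == row.gpt_pct:
        return "tie"
    return "Claude Opus 4.7" if row.claude_pct > row.gpt_pct else "GPT-5.5"


rows = [
    Row("AA-Omniscience non-hallucination", 87.6, 22.1),
    Row("BrowseComp", 33.3, 83.3),
    Row("Time to first answer token", 48.6, 1.9),
    Row("Agentic Index", 95.7, 52.2),
]

for row in rows:
    print(f"{row.benchmark}: {leader(row)} leads, spread {spread(row)}")
```

From the displayed percentiles, the AA-Omniscience gap works out to 65.5 rather than the listed 65.4%. That would be consistent with the page computing the spread from unrounded percentiles, though the source does not confirm this.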

