Model profile · Zhipu

GLM-5.2 (max)

Closed weights · mid · registry tag 2026 · benchmark-derived

Thin verified coverage

Verified coverage across the resolved source data is thin.

Visible coverage: 50.7%
Verified coverage: 50.7%
Spread: n/a
Last verified: Jun 20, 2026

text · code · document · 3 aliases · 2 official source links

Data version

Current snapshot.

Data version: Jun 20, 2026
Model list checked: 9 providers · 1,081 tracked models
Page refreshed: Jul 5, 2026

The registry snapshot and page stamp are shown so a stale deploy is visible at a glance.

Source-linked scores by benchmark

Each row lists the benchmark source, source type, raw metric, and the model's percentile within its fair comparison set.

Chat / text · 38 benchmarks · 80%

Intelligence Index

AA · Chat / text · Combined

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #3 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 51
Percentile: 99.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `intelligenceIndex`.
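
The page does not say how the percentile inside a fair comparison set is computed. A minimal sketch of one common percentile-rank definition, assuming each set is a plain list of raw values and each metric declares whether higher or lower is better, with ties earning half credit. Under this kind of definition a lower-is-better metric such as price gets a low percentile when the value is high, which fits the blended-price row further down (rank #209, 25th percentile). The function name `percentile_in_set` is hypothetical, not the registry's code.

```python
from typing import Sequence


def percentile_in_set(value: float, comparison_set: Sequence[float],
                      higher_is_better: bool = True) -> float:
    """Percent of the comparison set that `value` beats; ties count as half.

    Hypothetical definition: the registry's exact formula is not documented.
    """
    if not comparison_set:
        raise ValueError("comparison set is empty")
    beaten = 0.0
    for other in comparison_set:
        if other == value:
            beaten += 0.5  # split credit on ties (including the model itself)
        elif (other < value) == higher_is_better:
            beaten += 1.0  # `value` is strictly better than `other`
    return 100.0 * beaten / len(comparison_set)


if __name__ == "__main__":
    scores = [10.0, 20.0, 30.0, 40.0]
    print(percentile_in_set(30.0, scores))                          # 62.5
    print(percentile_in_set(30.0, scores, higher_is_better=False))  # 37.5
```

Flipping `higher_is_better` is what lets a single function serve both quality scores and cost or latency rows.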

AA-Omniscience accuracy

AA · Chat / text · Objective

Rank #58 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 25.1%
Percentile: 80.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `omniscienceAccuracy`.

AA-Omniscience non-hallucination

AA · Chat / text · Objective

Rank #9 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 71.9%
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

IFBench

AA · Chat / text · Objective

Rank #17 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 73.3%
Percentile: 94.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `ifbench`.

Blended price

AA · Chat / text · Speed / cost

Rank #209 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $2.2 /1M tokens
Percentile: 25%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

Input price

AA · Chat / text · Speed / cost

Rank #218 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $1.4 /1M input tokens
Percentile: 21.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

Output price

AA · Chat / text · Speed / cost

Rank #202 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $4.4 /1M output tokens
Percentile: 27.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.
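
The field name `price1mBlended0To3To1` suggests a 3:1 input-to-output token weighting. Assuming that reading, the blended price can be recomputed from the input and output prices above: (3 × $1.4 + 1 × $4.4) / 4 = $2.15, which rounds to the $2.2 shown. The sketch below also prices a single request; the helper names `blended_price` and `request_cost` are hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens (3:1 input:output by assumption)."""
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Dollar cost of one request at per-million-token list prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    print(round(blended_price(1.4, 4.4), 2))  # 2.15
    print(request_cost(3_000, 1_000, 1.4, 4.4))
```

At these list prices, a request with 3,000 input and 1,000 output tokens would cost about $0.0086.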

Output Speed

AA · Chat / text · Speed / cost

Rank #70 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 131.3 tokens/s
Percentile: 67.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

Time to first token

AA · Chat / text · Speed / cost

Rank #63 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 1.48s
Percentile: 70.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

Time to first answer token

AA · Chat / text · Speed / cost

Rank #120 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 16.71s
Percentile: 43.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.
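
The gap between time to first token (1.48 s) and time to first answer token (16.71 s) is consistent with the (max) setting emitting reasoning output before the answer starts. A back-of-the-envelope sketch, assuming reasoning tokens stream at the same median output speed (131.3 tokens/s) and ignoring network variance: about (16.71 - 1.48) × 131.3 ≈ 2,000 reasoning tokens, and a 500-token answer would finish roughly 16.71 + 500 / 131.3 ≈ 20.5 s after the request. These are estimates from medians, not measurements; both helper names are hypothetical.

```python
def implied_reasoning_tokens(ttft_s: float, ttfat_s: float, tokens_per_s: float) -> float:
    """Tokens that fit between the first token and the first answer token."""
    return max(0.0, ttfat_s - ttft_s) * tokens_per_s


def estimated_completion_s(ttfat_s: float, answer_tokens: int, tokens_per_s: float) -> float:
    """Time until the last answer token, assuming a constant streaming rate."""
    return ttfat_s + answer_tokens / tokens_per_s


if __name__ == "__main__":
    print(round(implied_reasoning_tokens(1.48, 16.71, 131.3)))     # 2000
    print(round(estimated_completion_s(16.71, 500, 131.3), 1))     # 20.5
```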

Openness Index

AA · Chat / text · Combined

Rank #65 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 44
Percentile: 71%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

Text Arena

AR · Chat / text · Human

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,471
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #25. Votes: 3357. Organization: zai. License: MIT.

Confidence interval: 1,461 - 1,481

Text Arena · Creative Writing

AR · Chat / text · Human

Rank #21 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,445
Percentile: 93.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: creative_writing. Source rank: #29. Votes: 623. Organization: zai. License: MIT.

Confidence interval: 1,421 - 1,469

Text Arena · English

AR · Chat / text · Human

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,481
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: english. Source rank: #17. Votes: 1602. Organization: zai. License: MIT.

Confidence interval: 1,466 - 1,496

Text Arena · Exclude Ties

AR · Chat / text · Human

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,479
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: exclude_ties. Source rank: #25. Votes: 2479. Organization: zai. License: MIT.

Confidence interval: 1,465 - 1,493

Text Arena · Hard Prompts

AR · Chat / text · Human

Rank #14 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,497
Percentile: 96%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts. Source rank: #17. Votes: 2148. Organization: zai. License: MIT.

Confidence interval: 1,484 - 1,510

Text Arena · Hard Prompts English

AR · Chat / text · Human

Rank #13 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,501
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts_english. Source rank: #15. Votes: 1074. Organization: zai. License: MIT.

Confidence interval: 1,483 - 1,519

Text Arena · Instruction Following

AR · Chat / text · Human

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,468
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: instruction_following. Source rank: #19. Votes: 1122. Organization: zai. License: MIT.

Confidence interval: 1,451 - 1,486

Text Arena · Longer Query

AR · Chat / text · Human

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,485
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: longer_query. Source rank: #19. Votes: 1468. Organization: zai. License: MIT.

Confidence interval: 1,470 - 1,501

Text Arena · Multi Turn

AR · Chat / text · Human

Rank #28 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,470
Percentile: 91.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: multi_turn. Source rank: #37. Votes: 534. Organization: zai. License: MIT.

Confidence interval: 1,444 - 1,496

Text Arena · No Style Control

AR · Chat / text · Human

Rank #13 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #16. Votes: 3357. Organization: zai. License: MIT.
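
Removing style control barely moves the overall rating: the default Text Arena row reports 1,471 (CI 1,461 - 1,481) and the No Style Control row reports 1,467 (CI 1,456 - 1,477). A quick sketch for checking whether two reported intervals overlap, taking the bounds exactly as printed. Overlap is only a conservative hint that a difference may not be meaningful, not a formal significance test; `intervals_overlap` is a hypothetical helper.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    default_row = (1461.0, 1481.0)       # Text Arena, overall
    no_style_control = (1456.0, 1477.0)  # Text Arena, No Style Control
    print(intervals_overlap(default_row, no_style_control))  # True
```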

Confidence interval: 1,456 - 1,477

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,456
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: creative_writing. Source rank: #14. Votes: 623. Organization: zai. License: MIT.

Confidence interval: 1,432 - 1,480

Text Arena · English · No Style Control

AR · Chat / text · Human

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,473
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: english. Source rank: #14. Votes: 1602. Organization: zai. License: MIT.

Confidence interval: 1,459 - 1,488

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,471
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: exclude_ties. Source rank: #15. Votes: 2479. Organization: zai. License: MIT.

Confidence interval: 1,457 - 1,485

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,480
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts. Source rank: #18. Votes: 2148. Organization: zai. License: MIT.

Confidence interval: 1,467 - 1,493

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,482
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts_english. Source rank: #11. Votes: 1074. Organization: zai. License: MIT.

Confidence interval: 1,464 - 1,500

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,460
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: instruction_following. Source rank: #20. Votes: 1122. Organization: zai. License: MIT.

Confidence interval: 1,442 - 1,478

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

Rank #16 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,475
Percentile: 95.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: longer_query. Source rank: #20. Votes: 1468. Organization: zai. License: MIT.

Confidence interval: 1,459 - 1,491

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

Rank #17 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 95%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: multi_turn. Source rank: #21. Votes: 534. Organization: zai. License: MIT.

Confidence interval: 1,441 - 1,493

Instruction following

LB · Chat / text · Objective

Rank #29 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.3%
Percentile: 74.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

Language

LB · Chat / text · Objective

Rank #43 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76.2%
Percentile: 61.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

Paraphrase

LB · Chat / text · Objective

Rank #22 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.3%
Percentile: 80.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

Simplify

LB · Chat / text · Objective

Rank #35 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 55.7%
Percentile: 68.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

Story generation

LB · Chat / text · Objective

Rank #40 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.9%
Percentile: 63.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

Summarize

LB · Chat / text · Objective

Rank #15 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 68.4%
Percentile: 87%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

Connections

LB · Chat / text · Objective

Rank #40 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 94%
Percentile: 64.8%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

Plot unscrambling

LB · Chat / text · Objective

Rank #27 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 60.7%
Percentile: 75.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

Typos

LB · Chat / text · Objective

Rank #63 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 74%
Percentile: 45.8%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.
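
Both LiveBench category rows line up with an unweighted mean of their task rows: IF = (62.3 + 55.7 + 62.9 + 68.4) / 4 ≈ 62.3%, and Language = (94 + 60.7 + 74) / 3 ≈ 76.2%, matching the "Tasks scored" counts of 4 and 3. Treat the averaging rule as an inference from these numbers, not as LiveBench's documented method; `category_score` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict

# Task scores (%) copied from the LiveBench rows on this page.
TASKS: Dict[str, Dict[str, float]] = {
    "IF": {"paraphrase": 62.3, "simplify": 55.7,
           "story_generation": 62.9, "summarize": 68.4},
    "Language": {"connections": 94.0, "plot_unscrambling": 60.7, "typos": 74.0},
}


def category_score(task_scores: Dict[str, float]) -> float:
    """Unweighted mean of task scores, rounded to one decimal (assumed rule)."""
    return round(mean(task_scores.values()), 1)


if __name__ == "__main__":
    for category, scores in TASKS.items():
        print(category, len(scores), category_score(scores))
```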

Coding · 23 benchmarks · 91.5%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluation conditions, rather than in hand-picked marketing examples.

Rank #5 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 50.8%
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

SciCode

AA · Coding · Objective

Rank #5 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 50.5%
Percentile: 98.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `scicode`.

Coding Index

AA · Coding · Combined

Rank #7 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 69
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `codingIndex`.

Agentic Index

AA · Coding · Combined

Rank #4 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 43
Percentile: 93.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `agenticIndex`.

Code Arena

AR · Coding · Human

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,593
Percentile: 98.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #2. Votes: 1994. Organization: zai. License: MIT.
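
Arena ratings are reported on an Elo-like scale. Assuming the conventional Elo logistic (base 10, 400-point scale), a rating gap maps to an expected head-to-head win rate: the Code Arena rating of 1,593 would be expected to beat a hypothetical model rated 1,493 about 64% of the time. Arena's own fitting procedure and tie handling are not reproduced here, so treat this as an approximation; `expected_win_rate` is a hypothetical helper.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Elo expectation that A beats B (ties ignored), base-10 / 400-point scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # 1,593 is the Code Arena rating above; 1,493 is a hypothetical opponent.
    print(round(expected_win_rate(1593, 1493), 3))  # 0.64
```

The function is symmetric: `expected_win_rate(a, b) + expected_win_rate(b, a)` is 1, and equal ratings give 0.5.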

Confidence interval: 1,579 - 1,608

WebDev Arena

AR · Coding · Human

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,593
Percentile: 98.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev. Source rank: #2. Votes: 1994. Organization: zai. License: MIT.

Confidence interval: 1,579 - 1,608

Code Arena · Webdev Html

AR · Coding · Human

Rank #5 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,541
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev-html. Source rank: #7. Votes: 282. Organization: zai. License: MIT.

Confidence interval: 1,504 - 1,578

Code Arena · Webdev React

AR · Coding · Human

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,597
Percentile: 98.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev-react. Source rank: #2. Votes: 1712. Organization: zai. License: MIT.

Confidence interval: 1,581 - 1,613

Code Migration

VALS-AI · Coding · Objective

Rank #6 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 37.9%
Percentile: 77.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: code-migration; provider: Zhipu AI.

Confidence interval: 29.7% - 46%

LiveCodeBench

VALS-AI · Coding · Objective

Rank #61 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 69.5%
Percentile: 33.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Zhipu AI.

Confidence interval: 67.2% - 71.8%

ProgramBench

VALS-AI · Coding · Objective

Rank #4 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 0.5%
Percentile: 90%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: programbench; provider: Zhipu AI.

Confidence interval: 0% - 1.5%

SWE-bench Verified

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 82.8%
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: swebench; provider: Zhipu AI.

Raw value CI: 79.5% - 86.1%

Terminal-Bench 2.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #7 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 67.8%
Percentile: 77.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: terminal-bench-2-1; provider: Zhipu AI.

Raw value CI: 65.8% - 69.7%

Vibe Code Bench v1.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #6 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 64%
Percentile: 89.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vibe-code; provider: Zhipu AI.

Raw value CI: 54.6% - 73.4%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,526
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: coding. Source rank: #11. Votes: 913. Organization: zai. License: MIT.

Raw value CI: 1,506 - 1,546
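The Arena rows report ratings rather than percentages, so a raw value is most informative as a gap between two models. A minimal sketch, assuming the logistic Elo convention (base 10, scale 400) that Arena-style Bradley-Terry leaderboards commonly report on; `expected_win_rate` is a hypothetical helper, not part of any Arena tooling, and the 1,476 competitor rating is made up for illustration.

```python
def expected_win_rate(rating_a: float, rating_b: float,
                      scale: float = 400.0, base: float = 10.0) -> float:
    """Probability that A is preferred over B under an assumed logistic Elo-style model."""
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))


# Style-controlled Text Arena coding rating from this page (1,526)
# against a hypothetical competitor rated 50 points lower.
p = expected_win_rate(1526, 1476)
print(f"{p:.3f}")  # prints 0.571
```

Under this convention a 50-point gap is only about a 57/43 preference split, which is why the Arena intervals on this page (roughly ±15 to ±40 points) matter when comparing nearby rows.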

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #8 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,497
Percentile: 97.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: coding. Source rank: #12. Votes: 913. Organization: zai. License: MIT.

Raw value CI: 1,478 - 1,517
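Many rows carry a confidence interval. A quick check before reading a gap between two rows as real is whether their intervals overlap: overlap does not prove the difference is noise, but non-overlap is a stronger signal. A minimal sketch with a hypothetical `intervals_overlap` helper, using the CIs of the two Text Arena coding rows above; since both rows describe the same model with and without style control, the comparison only illustrates the mechanics.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: float
    high: float


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# CIs copied from the two Text Arena coding rows on this page.
style_controlled = Interval(1506, 1546)
no_style_control = Interval(1478, 1517)
print(intervals_overlap(style_controlled, no_style_control))  # True: 1506-1517 is shared
```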

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #1 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 73.3%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #11 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 79.7%
Percentile: 91.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 90%
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

Code generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #35 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 78.9%
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

Code completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #8 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 80.4%
Percentile: 97.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

Reasoning / math / science · 16 benchmarks · 83.3%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #4 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 40.1%
Percentile: 99.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `hle`.

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #9 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 89.5%
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `gpqa`.

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #6 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 20.9%
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `critpt`.

ProofBench

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #9 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 35%
Percentile: 77.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: proof_bench; provider: Zhipu AI.

Raw value CI: 25.6% - 44.4%

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #27 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 85.6%
Percentile: 71.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: Zhipu AI.

Raw value CI: 82.1% - 89.1%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #23 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 86.7%
Percentile: 75.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Raw value CI: 86.1% - 87.4%

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #13 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 89.8%
Percentile: 88.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #30 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 78.6%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #30 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 98%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #17 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76%
Percentile: 85.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #11 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 96.1%
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #21 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 89%
Percentile: 81.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #33 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 75%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #41 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65.5%
Percentile: 62.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 94%
Percentile: 60.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #5 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 80%
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

Professional reasoning · 33 benchmarks · 83.8%

GDPval-AA

AA · Professional reasoning · Rubric

Agentic performance on economically valuable work tasks.

Rank #3 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 1,521
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.

Legal Research Bench

VALS-AI · Professional reasoning · Objective

Applied legal research tasks.

Rank #4 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 31.3%
Percentile: 75%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: Zhipu AI.

Raw value CI: 24.9% - 37.6%

SkillsBench

VALS-AI · Professional reasoning · Objective

Applied professional skills tasks.

Rank #10 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 45.1%
Percentile: 10%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: Zhipu AI.

Raw value CI: 36.5% - 53.7%
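The page does not state how the percentile inside each comparison set is computed. A common definition is the share of models in the set scoring at or below this one; under that definition, the SkillsBench row above (rank #10 at the 10th percentile) is what you would see if the set held ten models and GLM-5.2 scored lowest. A minimal sketch, assuming that definition; `percentile_at_or_below` is a hypothetical helper and the scores are made up, so this is not the registry's code.

```python
def percentile_at_or_below(score: float, comparison_set: list[float]) -> float:
    """Percent of the comparison set scoring at or below `score` (assumed definition)."""
    if not comparison_set:
        raise ValueError("comparison set is empty")
    at_or_below = sum(1 for s in comparison_set if s <= score)
    return 100.0 * at_or_below / len(comparison_set)


# Hypothetical ten-model set in which the model being profiled (45.1) scores lowest.
scores = [45.1, 47.0, 48.2, 50.3, 51.0, 52.4, 55.8, 57.1, 60.0, 62.5]
print(percentile_at_or_below(45.1, scores))  # 10.0
```

Under this definition tied scores share one percentile, which would explain how the JavaScript and TypeScript rows can sit at 100% while carrying rank #4.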

Vals Index

VALS-AI · Professional reasoning · Combined

Weighted model performance across economically relevant Vals tasks.

Rank #5 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 65
Percentile: 84.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Zhipu AI.

Raw value CI: 62 - 68

Harvey's Legal Agent Benchmark

VALS-AI · Professional reasoning · Objective

Completing legal work with documents, spreadsheets, presentations, and file-system tools.

Rank #3 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 7.1%
Percentile: 84.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: Zhipu AI.

Raw value CI: 3.2% - 11%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #23 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 84.1%
Percentile: 75.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Zhipu AI.

Raw value CI: 83.2% - 85%

Finance Agent v2

VALS-AI · Professional reasoning · Objective

Core financial analyst tasks for agentic models.

Rank #7 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 49.7%
Percentile: 76%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Zhipu AI.

Raw value CI: 48% - 51.4%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #32 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 73.3%
Percentile: 65.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Zhipu AI.

Raw value CI: 71.6% - 75%

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #26 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 40.8%
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

Raw value CI: 36.5% - 45%

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #14 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 83.5%
Percentile: 74%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Zhipu AI.

Raw value CI: 79.6% - 87.5%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #23 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,489
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #29. Votes: 340. Organization: zai. License: MIT.

Raw value CI: 1,458 - 1,520

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #20 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,461
Percentile: 94%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #26. Votes: 621. Organization: zai. License: MIT.

Raw value CI: 1,438 - 1,485

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #21 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,440
Percentile: 93.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #29. Votes: 804. Organization: zai. License: MIT.

Raw value CI: 1,419 - 1,461

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #45 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,452
Percentile: 85.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #59. Votes: 260. Organization: zai. License: MIT.

Raw value CI: 1,415 - 1,488

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #4 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,510
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #5. Votes: 555. Organization: zai. License: MIT.

Raw value CI: 1,484 - 1,536

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #17 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,482
Percentile: 94.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #20. Votes: 212. Organization: zai. License: MIT.

Raw value CI: 1,441 - 1,522

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,500
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #11. Votes: 246. Organization: zai. License: MIT.

Raw value CI: 1,460 - 1,539

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #11 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,512
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #14. Votes: 1299. Organization: zai. License: MIT.

Raw value CI: 1,495 - 1,528

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #12. Votes: 827. Organization: zai. License: MIT.

Raw value CI: 1,446 - 1,488

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #22 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,476
Percentile: 92.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #28. Votes: 340. Organization: zai. License: MIT.

Raw value CI: 1,445 - 1,507

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,448
Percentile: 94.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #24. Votes: 621. Organization: zai. License: MIT.

Raw value CI: 1,425 - 1,472

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #11 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,445
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #14. Votes: 804. Organization: zai. License: MIT.

Raw value CI: 1,424 - 1,466

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #35 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,447
Percentile: 88.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #44. Votes: 260. Organization: zai. License: MIT.

Raw value CI: 1,412 - 1,483

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #3 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,502
Percentile: 99.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #4. Votes: 555. Organization: zai. License: MIT.

CI: 1,476 - 1,528

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #18 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,474
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #21. Votes: 212. Organization: zai. License: MIT.

CI: 1,433 - 1,514

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,487
Percentile: 98%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #9. Votes: 246. Organization: zai. License: MIT.

CI: 1,448 - 1,526

Text Arena · Industry Software And It Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,491
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #10. Votes: 1299. Organization: zai. License: MIT.

CI: 1,475 - 1,508

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,468
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #9. Votes: 827. Organization: zai. License: MIT.

CI: 1,447 - 1,489

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #24 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 73.7%
Percentile: 78.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #7 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76.2%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #20 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 79.3%
Percentile: 82.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #28 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 45.9%
Percentile: 75%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #92 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 96.1%
Percentile: 30.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

Search / tool use · 1 benchmark · 100%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #2 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 99.1%
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `tau2`.

Long context · 2 benchmarks · 92.9%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #3 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 71.3%
Percentile: 99.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `lcr`.

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #13 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 66.1%
Percentile: 86.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Zhipu AI.

CI: 64.3% - 67.9%

Multilingual · 2 benchmarks · 97.4%

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,487
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: russian. Source rank: #11. Votes: 338. Organization: zai. License: MIT.

CI: 1,455 - 1,518

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #8 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,484
Percentile: 97.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: russian. Source rank: #11. Votes: 338. Organization: zai. License: MIT.

CI: 1,452 - 1,515

Source links and registry checks

Artificial Analysis · official · Jun 20, 2026

LiveBench · official · Jun 20, 2026

Chat / text · 38 benchmarks · 80%

AA-Omniscience non-hallucination

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #9 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 71.9%
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `omniscienceNonHallucination`.

IFBench

AA · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #17 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 73.3%
Percentile: 94.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `ifbench`.

Blended price

AA · Chat / text · Speed / cost

Price per 1M tokens, blended across input and output tokens (the field name suggests a 3:1 input-to-output mix).

Rank #209 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $2.2 /1M tokens
Percentile: 25%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mBlended0To3To1`.

Input price

AA · Chat / text · Speed / cost

Price per 1M input tokens.

Rank #218 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $1.4 /1M input tokens
Percentile: 21.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mInputTokens`.

Output price

AA · Chat / text · Speed / cost

Price per 1M output tokens.

Rank #202 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: $4.4 /1M output tokens
Percentile: 27.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `price1mOutputTokens`.

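The blended price can be cross-checked against the input and output prices listed on this page. The field name `price1mBlended0To3To1` suggests a 3:1 input-to-output token weighting; assuming that ratio (an inference from the field name, not a formula stated on this page), a minimal Python sketch with a hypothetical `blended_price` helper recovers the listed $2.2 from the $1.4 input and $4.4 output prices:

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: int = 3, output_weight: int = 1) -> float:
    """Weighted average price per 1M tokens (hypothetical helper).

    The default 3:1 input:output weighting is an assumption inferred
    from the field name `price1mBlended0To3To1`.
    """
    total = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total


# Prices from this profile, in USD per 1M tokens.
price = blended_price(1.4, 4.4)
print(f"${price:.2f} /1M tokens")  # prints "$2.15 /1M tokens"
```

The 3:1 mix gives $2.15, which displays as $2.2 after rounding. Passing other weights, for example `blended_price(1.4, 4.4, 1, 1)`, shows how the effective price rises for output-heavier workloads.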

Output Speed

AA · Chat / text · Speed / cost

Median output speed, in tokens per second.

Rank #70 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 131.3 tokens/s
Percentile: 67.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianOutputTokensPerSecond`.

Time to first token

AA · Chat / text · Speed / cost

Median time until the first token is returned.

Rank #63 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 1.48s
Percentile: 70.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstTokenSeconds`.

Time to first answer token

AA · Chat / text · Speed / cost

Median time until the first answer token is returned; for reasoning models this includes time spent thinking.

Rank #120 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 16.71s
Percentile: 43.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `medianTimeToFirstAnswerTokenSeconds`.

Openness Index

AA · Chat / text · Combined

Composite Artificial Analysis score for how openly the model is released.

Rank #65 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 44
Percentile: 71%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `opennessBreakdown.opennessIndex`.

Text Arena

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,471
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #25. Votes: 3357. Organization: zai. License: MIT.

CI: 1,461 - 1,481

Text Arena · Creative Writing

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #21 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,445
Percentile: 93.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: creative_writing. Source rank: #29. Votes: 623. Organization: zai. License: MIT.

CI: 1,421 - 1,469

Text Arena · English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,481
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: english. Source rank: #17. Votes: 1602. Organization: zai. License: MIT.

CI: 1,466 - 1,496

Text Arena · Exclude Ties

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,479
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: exclude_ties. Source rank: #25. Votes: 2479. Organization: zai. License: MIT.

CI: 1,465 - 1,493

Text Arena · Hard Prompts

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #14 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,497
Percentile: 96%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts. Source rank: #17. Votes: 2148. Organization: zai. License: MIT.

CI: 1,484 - 1,510

Text Arena · Hard Prompts English

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #13 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,501
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts_english. Source rank: #15. Votes: 1074. Organization: zai. License: MIT.

CI: 1,483 - 1,519

Text Arena · Instruction Following

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,468
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: instruction_following. Source rank: #19. Votes: 1122. Organization: zai. License: MIT.

CI: 1,451 - 1,486

Text Arena · Longer Query

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,485
Percentile: 95.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: longer_query. Source rank: #19. Votes: 1468. Organization: zai. License: MIT.

CI: 1,470 - 1,501

Text Arena · Multi Turn

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #28 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,470
Percentile: 91.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: multi_turn. Source rank: #37. Votes: 534. Organization: zai. License: MIT.

CI: 1,444 - 1,496

Text Arena · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #13 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #16. Votes: 3357. Organization: zai. License: MIT.

CI: 1,456 - 1,477

Text Arena · Creative Writing · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,456
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: creative_writing. Source rank: #14. Votes: 623. Organization: zai. License: MIT.

CI: 1,432 - 1,480

Text Arena · English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,473
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: english. Source rank: #14. Votes: 1602. Organization: zai. License: MIT.

CI: 1,459 - 1,488

Text Arena · Exclude Ties · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #12 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,471
Percentile: 96.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: exclude_ties. Source rank: #15. Votes: 2479. Organization: zai. License: MIT.

CI: 1,457 - 1,485

Text Arena · Hard Prompts · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,480
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts. Source rank: #18. Votes: 2148. Organization: zai. License: MIT.

CI: 1,467 - 1,493

Text Arena · Hard Prompts English · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,482
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: hard_prompts_english. Source rank: #11. Votes: 1074. Organization: zai. License: MIT.

CI: 1,464 - 1,500

Text Arena · Instruction Following · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,460
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: instruction_following. Source rank: #20. Votes: 1122. Organization: zai. License: MIT.

CI: 1,442 - 1,478

Text Arena · Longer Query · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #16 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,475
Percentile: 95.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: longer_query. Source rank: #20. Votes: 1468. Organization: zai. License: MIT.

CI: 1,459 - 1,491

Text Arena · Multi Turn · No Style Control

AR · Chat / text · Human

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #17 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 95%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: multi_turn. Source rank: #21. Votes: 534. Organization: zai. License: MIT.

CI: 1,441 - 1,493

Instruction following

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #29 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.3%
Percentile: 74.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: IF. Tasks scored: 4.

Language

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #43 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76.2%
Percentile: 61.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Language. Tasks scored: 3.

Paraphrase

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #22 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.3%
Percentile: 80.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: paraphrase. Category: IF.

Simplify

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #35 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 55.7%
Percentile: 68.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: simplify. Category: IF.

Story generation

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #40 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 62.9%
Percentile: 63.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: story_generation. Category: IF.

Summarize

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #15 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 68.4%
Percentile: 87%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: summarize. Category: IF.

Connections

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #40 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 94%
Percentile: 64.8%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: connections. Category: Language.

Plot unscrambling

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #27 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 60.7%
Percentile: 75.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: plot_unscrambling. Category: Language.

Typos

LB · Chat / text · Objective

It tests whether the model is actually useful in normal conversational turns, not just on narrow correctness tasks.

Rank #63 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 74%
Percentile: 45.8%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typos. Category: Language.

Coding · 23 benchmarks · 91.5%

Terminal-Bench Hard

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #5 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 50.8%
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `terminalbenchHard`.

SciCode

AA · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #5 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 50.5%
Percentile: 98.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `scicode`.

Coding Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #7 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 69
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `codingIndex`.

Agentic Index

AA · Coding · Combined

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 43
Percentile: 93.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `agenticIndex`.

Code Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,593
Percentile: 98.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: overall. Source rank: #2. Votes: 1994. Organization: zai. License: MIT.

Confidence interval: 1,579 - 1,608
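Arena rows report a score on an Elo-like scale together with an interval, and a score only has meaning relative to other models on the same leaderboard. The sketch below turns a rating gap into an expected head-to-head preference rate and checks whether two intervals overlap. It assumes the conventional Elo logistic (base 10, 400-point scale); Arena's exact parameterisation is not stated on this page, and the rival's rating and interval are hypothetical.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the Elo logistic.

    Assumes the conventional base-10, 400-point scale (an assumption; the
    page does not state Arena's exact parameterisation).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def intervals_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Code Arena row from this page: 1,593 with interval 1,579 - 1,608.
glm = 1593
glm_ci = (1579, 1608)

# Hypothetical rival on the same leaderboard (illustrative values only).
rival = 1550
rival_ci = (1535, 1565)

p = expected_win_rate(glm, rival)
print(f"expected preference rate vs rival: {p:.3f}")
print("intervals overlap:", intervals_overlap(glm_ci, rival_ci))
```

Under these assumptions a 43-point lead corresponds to an expected preference rate of roughly 56%. When two models' intervals overlap, a gap of that size is weak evidence on its own.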

WebDev Arena

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,593
Percentile: 98.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev. Source rank: #2. Votes: 1994. Organization: zai. License: MIT.

Confidence interval: 1,579 - 1,608

Code Arena · Webdev Html

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #5 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,541
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev-html. Source rank: #7. Votes: 282. Organization: zai. License: MIT.

Confidence interval: 1,504 - 1,578

Code Arena · Webdev React

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #2 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,597
Percentile: 98.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: webdev-react. Source rank: #2. Votes: 1712. Organization: zai. License: MIT.

Confidence interval: 1,581 - 1,613

Code Migration

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #6 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 37.9%
Percentile: 77.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: code-migration; provider: Zhipu AI.

Confidence interval: 29.7% - 46%

LiveCodeBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #61 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 69.5%
Percentile: 33.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: lcb; provider: Zhipu AI.

Confidence interval: 67.2% - 71.8%

ProgramBench

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 0.5%
Percentile: 90%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: programbench; provider: Zhipu AI.

Confidence interval: 0% - 1.5%
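The ProgramBench interval (0% - 1.5%) is not symmetric around the 0.5% point estimate. Vals AI does not state its interval method on this page; one common way this shape arises is a normal-approximation (Wald) interval for a proportion that is then clipped to [0, 1]. The sketch assumes a 95% level (z = 1.96) and a hypothetical sample of 200 scored items.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation CI for a proportion, clipped to [0, 1].

    p_hat: observed success rate (0.005 means 0.5%)
    n: number of scored items (hypothetical here; not given on the page)
    z: critical value (1.96 is roughly 95% two-sided; an assumption)
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clipping is what makes the interval lopsided near 0 or 1.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


low, high = wald_interval(0.005, n=200)
print(f"0.5% on 200 items -> [{low:.2%}, {high:.2%}]")
```

Once the lower bound is clipped at zero, the interval is no longer centred on the estimate. Wilson or Clopper-Pearson intervals avoid this clipping and are generally preferred for rates this close to 0; which method Vals AI actually uses is not stated here.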

SWE-bench Verified

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 82.8%
Percentile: 96.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: swebench; provider: Zhipu AI.

Confidence interval: 79.5% - 86.1%

Terminal-Bench 2.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #7 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 67.8%
Percentile: 77.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: terminal-bench-2-1; provider: Zhipu AI.

Confidence interval: 65.8% - 69.7%

Vibe Code Bench v1.1

VALS-AI · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #6 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 64%
Percentile: 89.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vibe-code; provider: Zhipu AI.

Confidence interval: 54.6% - 73.4%

Text Arena · Coding

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,526
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: coding. Source rank: #11. Votes: 913. Organization: zai. License: MIT.

Confidence interval: 1,506 - 1,546

Text Arena · Coding · No Style Control

AR · Coding · Human

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #8 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,497
Percentile: 97.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: coding. Source rank: #12. Votes: 913. Organization: zai. License: MIT.

Confidence interval: 1,478 - 1,517

Agentic coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #1 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 73.3%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Agentic Coding. Tasks scored: 3.

Coding

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #11 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 79.7%
Percentile: 91.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Coding. Tasks scored: 2.

JavaScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: javascript. Category: Agentic Coding.

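The JavaScript row reports Rank #4 alongside a 100% percentile, which looks contradictory. The page does not define its "fair comparison set", so what follows is only one hypothetical reading: rank is counted on the full source leaderboard, while the percentile is the share of an eligible subset scoring at or below the model. The leaderboard scores and the eligibility split below are invented for illustration.

```python
def rank_of(score: float, scores: list[float]) -> int:
    """1-based competition rank: 1 + number of strictly higher scores."""
    return 1 + sum(1 for s in scores if s > score)


def percentile_rank(score: float, scores: list[float]) -> float:
    """Share of the set scoring at or below `score`, as a percentage."""
    return 100.0 * sum(1 for s in scores if s <= score) / len(scores)


# Hypothetical full source leaderboard (task scores in %).
full_leaderboard = [80.0, 72.5, 70.0, 65.0, 60.0, 55.0, 40.0]
# Hypothetical "fair comparison set": the top three entries are excluded
# (for example, by a different eligibility rule); the rest are kept.
fair_set = [65.0, 60.0, 55.0, 40.0]

ours = 65.0
print("rank on full leaderboard:", rank_of(ours, full_leaderboard))
print("percentile in fair set:", percentile_rank(ours, fair_set))
```

The at-or-below convention is itself an assumption. It is at least consistent with the SkillsBench row further down (Rank #10, 10% percentile) if that comparison set holds exactly ten models, but the page confirms neither point.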

TypeScript

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #4 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65%
Percentile: 100%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: typescript. Category: Agentic Coding.

Python

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #3 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 90%
Percentile: 99.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: python. Category: Agentic Coding.

Code generation

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #35 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 78.9%
Percentile: 83.3%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_generation. Category: Coding.

Code completion

LB · Coding · Objective

It tells you whether the model can generate, repair, and reason over code under evaluator pressure rather than marketing examples.

Rank #8 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 80.4%
Percentile: 97.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: code_completion. Category: Coding.

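The 91.5% in the Coding section heading matches the unweighted mean of the 23 per-benchmark percentiles listed in this section. The page does not say how that figure is computed, so the sketch below only checks the arithmetic; equal weighting across rows is an assumption.

```python
from statistics import mean

# Per-benchmark percentiles for the 23 Coding rows in this section, in page order.
coding_percentiles: dict[str, float] = {
    "Terminal-Bench Hard": 98.7,
    "SciCode": 98.9,
    "Coding Index": 92.0,
    "Agentic Index": 93.5,
    "Code Arena": 98.6,
    "WebDev Arena": 98.6,
    "Code Arena · Webdev Html": 94.5,
    "Code Arena · Webdev React": 98.3,
    "Code Migration": 77.3,
    "LiveCodeBench": 33.3,
    "ProgramBench": 90.0,
    "SWE-bench Verified": 96.3,
    "Terminal-Bench 2.1": 77.8,
    "Vibe Code Bench v1.1": 89.8,
    "Text Arena · Coding": 97.5,
    "Text Arena · Coding · No Style Control": 97.8,
    "Agentic coding": 100.0,
    "Coding": 91.7,
    "JavaScript": 100.0,
    "TypeScript": 100.0,
    "Python": 99.1,
    "LiveBench code_generation": 83.3,
    "LiveBench code_completion": 97.2,
}

# Assumption: the section figure is a plain (unweighted) mean of these percentiles.
section_score = mean(coding_percentiles.values())
print(f"{len(coding_percentiles)} benchmarks, mean percentile {section_score:.1f}%")
```

Because every row carries equal weight, a single weak result (LiveCodeBench at 33.3%) moves the headline figure noticeably; a median would be less sensitive to it.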

Reasoning / math / science · 16 benchmarks · 83.3%

Humanity's Last Exam

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #4 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 40.1%
Percentile: 99.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `hle`.

GPQA

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #9 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 89.5%
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `gpqa`.

CritPt

AA · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #6 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 20.9%
Percentile: 98.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `critpt`.

ProofBench

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #9 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 35%
Percentile: 77.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: proof_bench; provider: Zhipu AI.

Confidence interval: 25.6% - 44.4%

GPQA Diamond

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #27 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 85.6%
Percentile: 71.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: gpqa; provider: Zhipu AI.

Confidence interval: 82.1% - 89.1%

MMLU Pro

VALS-AI · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #23 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 86.7%
Percentile: 75.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: mmlu_pro; provider: Zhipu AI.

Confidence interval: 86.1% - 87.4%

Mathematics

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #13 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 89.8%
Percentile: 88.9%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Mathematics. Tasks scored: 4.

Reasoning

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #30 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 78.6%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Reasoning. Tasks scored: 4.

AMPS Hard

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #30 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 98%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: AMPS_Hard. Category: Mathematics.

Integrals with game

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #17 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76%
Percentile: 85.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: integrals_with_game. Category: Mathematics.

Math competition

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #11 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 96.1%
Percentile: 95.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: math_comp. Category: Mathematics.

Olympiad

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #21 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 89%
Percentile: 81.5%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: olympiad. Category: Mathematics.

Theory of mind

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #33 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 75%
Percentile: 73.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: theory_of_mind. Category: Reasoning.

Zebra puzzle

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #41 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 65.5%
Percentile: 62.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: zebra_puzzle. Category: Reasoning.

Spatial

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #54 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 94%
Percentile: 60.2%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: spatial. Category: Reasoning.

Logic with navigation

LB · Reasoning / math / science · Objective

It is one of the cleaner reads on deliberate reasoning strength rather than style or popularity.

Rank #5 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 80%
Percentile: 98.1%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: logic_with_navigation. Category: Reasoning.

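Each LiveBench category row in this excerpt (Language, Agentic Coding, Coding, Mathematics, Reasoning) agrees, to the one decimal shown, with the unweighted mean of the task rows that report that category. The sketch below checks that arithmetic using only values from this page. It is a consistency check, not LiveBench's own scoring code, and it assumes the category scores are rounded to one decimal.

```python
from statistics import mean

# Task scores (%) from the LiveBench task rows on this page, grouped by the
# "Category" each row reports.
task_scores: dict[str, dict[str, float]] = {
    "Language": {"connections": 94.0, "plot_unscrambling": 60.7, "typos": 74.0},
    "Agentic Coding": {"javascript": 65.0, "typescript": 65.0, "python": 90.0},
    "Coding": {"code_generation": 78.9, "code_completion": 80.4},
    "Mathematics": {
        "AMPS_Hard": 98.0,
        "integrals_with_game": 76.0,
        "math_comp": 96.1,
        "olympiad": 89.0,
    },
    "Reasoning": {
        "theory_of_mind": 75.0,
        "zebra_puzzle": 65.5,
        "spatial": 94.0,
        "logic_with_navigation": 80.0,
    },
}

# Category scores as shown on the LiveBench category rows.
reported: dict[str, float] = {
    "Language": 76.2,
    "Agentic Coding": 73.3,
    "Coding": 79.7,
    "Mathematics": 89.8,
    "Reasoning": 78.6,
}


def consistent_with_rounding(tasks: dict[str, float], reported_score: float) -> bool:
    """True if the unweighted task mean could round to the reported one-decimal score."""
    # Half a display unit (0.05) plus a small tolerance for float error.
    return abs(mean(tasks.values()) - reported_score) <= 0.05 + 1e-9


for category, tasks in task_scores.items():
    avg = mean(tasks.values())
    verdict = "ok" if consistent_with_rounding(tasks, reported[category]) else "mismatch"
    print(f"{category:<15} mean {avg:6.3f}  reported {reported[category]:5.1f}  {verdict}")
```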

Professional reasoning · 33 benchmarks · 83.8%

GDPval-AA

AA · Professional reasoning · Rubric

Agentic performance on economically valuable work tasks.

Rank #3 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 1,521
Percentile: 95.7%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.

Legal Research Bench

VALS-AI · Professional reasoning · Objective

Applied legal research tasks.

Rank #4 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 31.3%
Percentile: 75%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: Zhipu AI.

Confidence interval: 24.9% - 37.6%

SkillsBench

VALS-AI · Professional reasoning · Objective

Applied professional skills tasks.

Rank #10 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 45.1%
Percentile: 10%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: Zhipu AI.

Confidence interval: 36.5% - 53.7%

Vals Index

VALS-AI · Professional reasoning · Combined

Weighted model performance across economically relevant Vals tasks.

Rank #5 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 65
Percentile: 84.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Zhipu AI.

Confidence interval: 62 - 68

Harvey's Legal Agent Benchmark

VALS-AI · Professional reasoning · Objective

Completing legal work with documents, spreadsheets, presentations, and file-system tools.

Rank #3 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 7.1%
Percentile: 84.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: Zhipu AI.

Confidence interval: 3.2% - 11%

LegalBench

VALS-AI · Professional reasoning · Objective

Academic legal reasoning tasks.

Rank #23 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 84.1%
Percentile: 75.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Zhipu AI.

Confidence interval: 83.2% - 85%

Finance Agent v2

VALS-AI · Professional reasoning · Objective

Core financial analyst tasks for agentic models.

Rank #7 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 49.7%
Percentile: 76%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Zhipu AI.

Confidence interval: 48% - 51.4%

TaxEval v2

VALS-AI · Professional reasoning · Objective

Answer quality on tax questions and responses.

Rank #32 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 73.3%
Percentile: 65.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Zhipu AI.

Confidence interval: 71.6% - 75%

MedCode

VALS-AI · Professional reasoning · Objective

Medical billing support and coding tasks.

Rank #26 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 40.8%
Percentile: 51%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.

Confidence interval: 36.5% - 45%

MedScribe

VALS-AI · Professional reasoning · Objective

Administrative documentation support for doctors.

Rank #14 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 83.5%
Percentile: 74%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Zhipu AI.

Confidence interval: 79.6% - 87.5%

Text Arena · Expert

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard.

Rank #23 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,489
Percentile: 92%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #29. Votes: 340. Organization: zai. License: MIT.

Confidence interval: 1,458 - 1,520

Text Arena · Industry Business And Management And Financial Operations

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.

Rank #20 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,461
Percentile: 94%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #26. Votes: 621. Organization: zai. License: MIT.

Confidence interval: 1,438 - 1,485

Text Arena · Industry Entertainment And Sports And Media

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.

Rank #21 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,440
Percentile: 93.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #29. Votes: 804. Organization: zai. License: MIT.

Confidence interval: 1,419 - 1,461

Text Arena · Industry Legal And Government

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.

Rank #45 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,452
Percentile: 85.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #59. Votes: 260. Organization: zai. License: MIT.

Confidence interval: 1,415 - 1,488

Text Arena · Industry Life And Physical And Social Science

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.

Rank #4 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,510
Percentile: 99.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #5. Votes: 555. Organization: zai. License: MIT.

Confidence interval: 1,484 - 1,536

Text Arena · Industry Mathematical

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard.

Rank #17 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,482
Percentile: 94.8%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #20. Votes: 212. Organization: zai. License: MIT.

Confidence interval: 1,441 - 1,522

Text Arena · Industry Medicine And Healthcare

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,500
Percentile: 97.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #11. Votes: 246. Organization: zai. License: MIT.

Confidence interval: 1,460 - 1,539

Text Arena · Industry Software And IT Services

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.

Rank #11 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,512
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #14. Votes: 1299. Organization: zai. License: MIT.

Confidence interval: 1,495 - 1,528

Text Arena · Industry Writing And Literature And Language

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,467
Percentile: 97.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #12. Votes: 827. Organization: zai. License: MIT.

Confidence interval: 1,446 - 1,488

Text Arena · Expert · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena expert leaderboard, without style control.

Rank #22 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,476
Percentile: 92.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #28. Votes: 340. Organization: zai. License: MIT.

Confidence interval: 1,445 - 1,507

Text Arena · Industry Business And Management And Financial Operations · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard, without style control.

Rank #19 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,448
Percentile: 94.3%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #24. Votes: 621. Organization: zai. License: MIT.

Confidence interval: 1,425 - 1,472

Text Arena · Industry Entertainment And Sports And Media · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard, without style control.

Rank #11 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,445
Percentile: 96.9%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #14. Votes: 804. Organization: zai. License: MIT.

Confidence interval: 1,424 - 1,466

Text Arena · Industry Legal And Government · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard, without style control.

Rank #35 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,447
Percentile: 88.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #44. Votes: 260. Organization: zai. License: MIT.

Confidence interval: 1,412 - 1,483

Text Arena · Industry Life And Physical And Social Science · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard, without style control.

Rank #3 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,502
Percentile: 99.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #4. Votes: 555. Organization: zai. License: MIT.

Confidence interval: 1,476 - 1,528

Text Arena · Industry Mathematical · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_mathematical leaderboard, without style control.

Rank #18 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,474
Percentile: 94.5%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #21. Votes: 212. Organization: zai. License: MIT.

Confidence interval: 1,433 - 1,514

Text Arena · Industry Medicine And Healthcare · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard, without style control.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,487
Percentile: 98%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #9. Votes: 246. Organization: zai. License: MIT.

Confidence interval: 1,448 - 1,526

Text Arena · Industry Software And IT Services · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard, without style control.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,491
Percentile: 98.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #10. Votes: 1299. Organization: zai. License: MIT.

Confidence interval: 1,475 - 1,508

Text Arena · Industry Writing And Literature And Language · No Style Control

AR · Professional reasoning · Human

Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard, without style control.

Rank #7 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,468
Percentile: 98.1%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #9. Votes: 827. Organization: zai. License: MIT.

Confidence interval: 1,447 - 1,489

Data analysis

LB · Professional reasoning · Objective

Structured data manipulation and table reasoning accuracy.

Rank #24 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 73.7%
Percentile: 78.7%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.

Overall

LB · Professional reasoning · Objective

Average objective performance across LiveBench's current public category mix.

Rank #7 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 76.2%
Percentile: 94.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Category averages included: 7.

Consecutive events

LB · Professional reasoning · Objective

Objective consecutive events score in LiveBench.

Rank #20 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 79.3%
Percentile: 82.4%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.

Table join

LB · Professional reasoning · Objective

Objective table join score in LiveBench.

Rank #28 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 45.9%
Percentile: 75%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.

Table reformat

LB · Professional reasoning · Objective

Objective table reformat score in LiveBench.

Rank #92 · Source label: glm-5.2

verified runtime · exact alias · Background only

Source: LiveBench
Raw value: 96.1%
Percentile: 30.6%
Last updated: archived
Eligibility: benchmark_derived_model

Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.

Search / tool use · 1 benchmark · 100%

Tau2-Bench Telecom

AA · Search / tool use · Objective

It matters when the model must browse, call tools, and recover useful answers from external systems.

Rank #2 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 99.1%
Percentile: 100%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `tau2`.

Long context · 2 benchmarks · 92.9%

Long Context Reasoning

AA · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #3 · Source label: GLM-5.2 (max)

verified runtime · exact alias · Background only

Source: Artificial Analysis
Raw value: 71.3%
Percentile: 99.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Artificial Analysis public leaderboard field `lcr`.

CorpFin v2

VALS-AI · Long context · Objective

It checks whether long-context claims survive contact with retrieval, memory, or long-document tasks.

Rank #13 · Source label: zai/glm-5.2

verified runtime · exact alias · Background only

Source: Vals AI
Raw value: 66.1%
Percentile: 86.4%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Vals AI BenchmarkView overall scores. Vals slug: corp_fin_v2; provider: Zhipu AI.

Confidence interval: 64.3% - 67.9%

Multilingual · 2 benchmarks · 97.4%

Text Arena · Russian

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard.

Rank #9 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,487
Percentile: 97.2%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: russian. Source rank: #11. Votes: 338. Organization: zai. License: MIT.

Confidence interval: 1,455 - 1,518

Text Arena · Russian · No Style Control

AR · Multilingual · Human

Observed user preference in Arena's Text Arena russian leaderboard, without style control.

Rank #8 · Source label: glm-5.2 (max)

verified runtime · exact alias · Background only

Source: Arena
Raw value: 1,484
Percentile: 97.6%
Last updated: recent
Eligibility: benchmark_derived_model

Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: russian. Source rank: #11. Votes: 338. Organization: zai. License: MIT.

Confidence interval: 1,452 - 1,515
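The category summary lines appear to be the unweighted mean of the per-benchmark percentiles in each group. Long context pairs 99.4% and 86.4% to give 92.9%, and Multilingual pairs 97.2% and 97.6% to give 97.4%. The page does not state this rule, so treat it as an inference from the numbers listed here. A minimal Python sketch of that assumed aggregation, where `category_score` is a hypothetical helper rather than part of any registry API:

```python
from statistics import mean


# Assumption: a category score is the plain (unweighted) mean of the
# per-benchmark percentiles, rounded to one decimal place. This rule is
# inferred from the figures on this page, not a documented formula.
def category_score(percentiles: list[float]) -> float:
    """Return the assumed category score for a list of percentile values."""
    if not percentiles:
        raise ValueError("a category needs at least one benchmark")
    return round(mean(percentiles), 1)


# Percentiles copied from the benchmark rows on this page.
categories = {
    "Search / tool use": [100.0],      # Tau2-Bench Telecom
    "Long context": [99.4, 86.4],      # Long Context Reasoning, CorpFin v2
    "Multilingual": [97.2, 97.6],      # Text Arena Russian, with and without style control
}

for name, pcts in categories.items():
    print(f"{name}: {category_score(pcts)}%")
```

If the registry actually weights benchmarks (for example by vote count or source), this sketch would diverge on larger groups such as Chat / text, which cannot be checked from this section alone.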

Source links and registry checks

Artificial Analysis · official · Jun 20, 2026

LiveBench · official · Jun 20, 2026
