GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #3 · Source label: GLM-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Artificial Analysis
- Raw value: 1,521
- Percentile: 95.7%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
95.7% percentile inside its fair comparison set · Raw benchmark value: 1,521
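The field name `gdpvalBreakdown.elo` suggests the raw value is an Elo-style rating. As a rough reading aid, the sketch below converts a rating gap into an expected head-to-head win rate under the textbook logistic Elo model with a 400-point scale. That scale, the function name `expected_win_rate`, and the rival rating are assumptions for illustration; the source does not say how Artificial Analysis scales its ratings.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    The 400-point scale is the textbook default, assumed here; it is not
    confirmed by the leaderboard source.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


reported_rating = 1521.0      # raw value from the card above
hypothetical_rival = 1421.0   # hypothetical rating, 100 points lower

print(f"{expected_win_rate(reported_rating, hypothetical_rival):.3f}")
```

Under these assumptions a 100-point lead corresponds to winning roughly 64% of comparisons, and equal ratings give exactly 50%.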
Legal Research Bench
VALS-AI · Professional reasoning · Objective
Applied legal research tasks.
Rank #4 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 31.3%
- Percentile: 75%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_research; provider: Zhipu AI.
75% percentile inside its fair comparison set · Raw benchmark value: 31.3% · CI: 24.9% - 37.6%
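The page does not state how a rank becomes a percentile, nor how large each fair comparison set is. One common convention, shown here only as a sketch, places rank 1 at 100% and the last rank at 0%. The function name `rank_to_percentile` and the set size of 13 are hypothetical; that size was chosen because it happens to reproduce the 75% shown for rank #4 above.

```python
def rank_to_percentile(rank: int, set_size: int) -> float:
    """Percentile of a 1-based rank, with rank 1 -> 100 and last rank -> 0.

    This convention is an assumption; the source does not document its formula.
    """
    if set_size < 2 or not 1 <= rank <= set_size:
        raise ValueError("rank must lie in 1..set_size and set_size must be >= 2")
    return 100.0 * (1.0 - (rank - 1) / (set_size - 1))


# Hypothetical comparison set of 13 models, rank #4 as on the card above.
print(rank_to_percentile(4, 13))
```

If the actual set size or formula differs, the same rank would map to a different percentile, so the figures are best compared only within a single card.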
SkillsBench
VALS-AI · Professional reasoning · Objective
Applied professional skills tasks.
Rank #10 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 45.1%
- Percentile: 10%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: skillsbench; provider: Zhipu AI.
10% percentile inside its fair comparison set · Raw benchmark value: 45.1% · CI: 36.5% - 53.7%
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #5 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 65
- Percentile: 84.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Zhipu AI.
84.6% percentile inside its fair comparison set · Raw benchmark value: 65 · CI: 62 - 68
Harvey's Legal Agent Benchmark
VALS-AI · Professional reasoning · Objective
Completing legal work with documents, spreadsheets, presentations, and file-system tools.
Rank #3 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 7.1%
- Percentile: 84.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: hlab; provider: Zhipu AI.
84.6% percentile inside its fair comparison set · Raw benchmark value: 7.1% · CI: 3.2% - 11%
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #23 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 84.1%
- Percentile: 75.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Zhipu AI.
75.6% percentile inside its fair comparison set · Raw benchmark value: 84.1% · CI: 83.2% - 85%
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #7 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 49.7%
- Percentile: 76%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Zhipu AI.
76% percentile inside its fair comparison set · Raw benchmark value: 49.7% · CI: 48% - 51.4%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #32 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 73.3%
- Percentile: 65.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Zhipu AI.
65.9% percentile inside its fair comparison set · Raw benchmark value: 73.3% · CI: 71.6% - 75%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #26 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 40.8%
- Percentile: 51%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Zhipu AI.
51% percentile inside its fair comparison set · Raw benchmark value: 40.8% · CI: 36.5% - 45%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #14 · Source label: zai/glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 83.5%
- Percentile: 74%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Zhipu AI.
74% percentile inside its fair comparison set · Raw benchmark value: 83.5% · CI: 79.6% - 87.5%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #23 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,489
- Percentile: 92%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #29. Votes: 340. Organization: zai. License: MIT.
92% percentile inside its fair comparison set · Raw benchmark value: 1,489 · CI: 1,458 - 1,520
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #20 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,461
- Percentile: 94%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #26. Votes: 621. Organization: zai. License: MIT.
94% percentile inside its fair comparison set · Raw benchmark value: 1,461 · CI: 1,438 - 1,485
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #21 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,440
- Percentile: 93.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #29. Votes: 804. Organization: zai. License: MIT.
93.8% percentile inside its fair comparison set · Raw benchmark value: 1,440 · CI: 1,419 - 1,461
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #45 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,452
- Percentile: 85.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #59. Votes: 260. Organization: zai. License: MIT.
85.2% percentile inside its fair comparison set · Raw benchmark value: 1,452 · CI: 1,415 - 1,488
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #4 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,510
- Percentile: 99.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #5. Votes: 555. Organization: zai. License: MIT.
99.1% percentile inside its fair comparison set · Raw benchmark value: 1,510 · CI: 1,484 - 1,536
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #17 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,482
- Percentile: 94.8%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #20. Votes: 212. Organization: zai. License: MIT.
94.8% percentile inside its fair comparison set · Raw benchmark value: 1,482 · CI: 1,441 - 1,522
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #9 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,500
- Percentile: 97.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #11. Votes: 246. Organization: zai. License: MIT.
97.3% percentile inside its fair comparison set · Raw benchmark value: 1,500 · CI: 1,460 - 1,539
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #11 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,512
- Percentile: 96.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #14. Votes: 1299. Organization: zai. License: MIT.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,512 · CI: 1,495 - 1,528
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #9 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,467
- Percentile: 97.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #12. Votes: 827. Organization: zai. License: MIT.
97.5% percentile inside its fair comparison set · Raw benchmark value: 1,467 · CI: 1,446 - 1,488
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #22 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,476
- Percentile: 92.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: expert. Source rank: #28. Votes: 340. Organization: zai. License: MIT.
92.4% percentile inside its fair comparison set · Raw benchmark value: 1,476 · CI: 1,445 - 1,507
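The Expert category appears twice on this page: with style control (1,489, CI 1,458 - 1,520) and without it (1,476, CI 1,445 - 1,507). A quick way to judge whether such a gap is meaningful is to check whether the two intervals overlap. The sketch below does that; the `Interval` type and helper names are illustrative, and interval overlap is only a rough heuristic, not a formal significance test.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    low: float
    high: float


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two closed intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


def half_width(iv: Interval) -> float:
    """Half the interval length, i.e. the +/- margin around its midpoint."""
    return (iv.high - iv.low) / 2


# Confidence intervals as reported on the two Expert cards.
style_controlled = Interval(1458, 1520)
no_style_control = Interval(1445, 1507)

print(overlaps(style_controlled, no_style_control))
print(half_width(style_controlled), half_width(no_style_control))
```

Both intervals span about 31 points either side of their midpoint and overlap substantially, so the 13-point difference between the two raw values sits well inside the reported uncertainty.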
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #19 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,448
- Percentile: 94.3%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_business_and_management_and_financial_operations. Source rank: #24. Votes: 621. Organization: zai. License: MIT.
94.3% percentile inside its fair comparison set · Raw benchmark value: 1,448 · CI: 1,425 - 1,472
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #11 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,445
- Percentile: 96.9%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_entertainment_and_sports_and_media. Source rank: #14. Votes: 804. Organization: zai. License: MIT.
96.9% percentile inside its fair comparison set · Raw benchmark value: 1,445 · CI: 1,424 - 1,466
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #35 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,447
- Percentile: 88.6%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_legal_and_government. Source rank: #44. Votes: 260. Organization: zai. License: MIT.
88.6% percentile inside its fair comparison set · Raw benchmark value: 1,447 · CI: 1,412 - 1,483
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #3 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,502
- Percentile: 99.4%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_life_and_physical_and_social_science. Source rank: #4. Votes: 555. Organization: zai. License: MIT.
99.4% percentile inside its fair comparison set · Raw benchmark value: 1,502 · CI: 1,476 - 1,528
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #18 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,474
- Percentile: 94.5%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_mathematical. Source rank: #21. Votes: 212. Organization: zai. License: MIT.
94.5% percentile inside its fair comparison set · Raw benchmark value: 1,474 · CI: 1,433 - 1,514
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #7 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,487
- Percentile: 98%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_medicine_and_healthcare. Source rank: #9. Votes: 246. Organization: zai. License: MIT.
98% percentile inside its fair comparison set · Raw benchmark value: 1,487 · CI: 1,448 - 1,526
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #7 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,491
- Percentile: 98.2%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_software_and_it_services. Source rank: #10. Votes: 1299. Organization: zai. License: MIT.
98.2% percentile inside its fair comparison set · Raw benchmark value: 1,491 · CI: 1,475 - 1,508
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #7 · Source label: glm-5.2 (max)
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,468
- Percentile: 98.1%
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `glm-5.2 (max)`. Category: industry_writing_and_literature_and_language. Source rank: #9. Votes: 827. Organization: zai. License: MIT.
98.1% percentile inside its fair comparison set · Raw benchmark value: 1,468 · CI: 1,447 - 1,489
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #24 · Source label: glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 73.7%
- Percentile: 78.7%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
78.7% percentile inside its fair comparison set · Raw benchmark value: 73.7%
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #7 · Source label: glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 76.2%
- Percentile: 94.4%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
94.4% percentile inside its fair comparison set · Raw benchmark value: 76.2%
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #20 · Source label: glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 79.3%
- Percentile: 82.4%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
82.4% percentile inside its fair comparison set · Raw benchmark value: 79.3%
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #28 · Source label: glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 45.9%
- Percentile: 75%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
75% percentile inside its fair comparison set · Raw benchmark value: 45.9%
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #92 · Source label: glm-5.2
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: LiveBench
- Raw value: 96.1%
- Percentile: 30.6%
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.
30.6% percentile inside its fair comparison set · Raw benchmark value: 96.1%
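The Data analysis card reports 73.7% with "Tasks scored: 3", and the three Data Analysis task cards report 79.3%, 45.9%, and 96.1%. The sketch below checks how close a plain unweighted mean of the task scores comes to the category score. Unweighted averaging is an assumption here; the page does not state how LiveBench aggregates tasks.

```python
from statistics import mean

# Task scores (percent) as shown on the three Data Analysis task cards.
task_scores = {
    "consecutive_events": 79.3,
    "tablejoin": 45.9,
    "tablereformat": 96.1,
}

reported_category = 73.7  # percent, from the Data analysis card

# Assumption: the category score is an unweighted mean of its tasks.
category_estimate = mean(list(task_scores.values()))
gap = category_estimate - reported_category

print(f"estimate={category_estimate:.2f} gap={gap:+.2f}")
```

The estimate lands within a tenth of a point of the reported value, a gap small enough to be explained by rounding of the displayed task scores, though that explanation is not confirmed by the source.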