GDPval-AA
AA · Professional reasoning · Rubric
Agentic performance on economically valuable work tasks.
Rank #20 · Source label: Qwen3.6 Plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Artificial Analysis
- Raw value: 1,156
- Percentile: 58.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Artificial Analysis public leaderboard field `gdpvalBreakdown.elo`.
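The page does not state how a percentile is derived from a rank. As a minimal sketch, assuming one common convention in which rank 1 maps to 100% and the last rank maps to 0%, the relationship could look like the following. The function name `rank_percentile` and the comparison-set size of 47 are hypothetical, chosen only for illustration; the source does not give the size of any comparison set.

```python
def rank_percentile(rank: int, set_size: int) -> float:
    """Percentage of the other entries in the comparison set that rank below this one.

    Assumed convention (not confirmed by the source):
    rank 1 -> 100.0, rank == set_size -> 0.0.
    """
    if set_size < 2:
        raise ValueError("need at least two entries to compute a percentile")
    if not 1 <= rank <= set_size:
        raise ValueError("rank must lie between 1 and set_size")
    return 100.0 * (set_size - rank) / (set_size - 1)


# Hypothetical comparison-set size of 47; the page does not state it.
print(round(rank_percentile(20, 47), 1))  # prints 58.7
```

Under that assumed size, rank #20 comes out at 58.7%, which matches the figure shown for GDPval-AA, but this agreement alone does not establish that the site uses this formula.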
Vals Index
VALS-AI · Professional reasoning · Combined
Weighted model performance across economically relevant Vals tasks.
Rank #19 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 49 (CI 46 - 52)
- Percentile: 30.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: vals_index; provider: Alibaba.
LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #20 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 84.2% (CI 83.4% - 85.1%)
- Percentile: 78.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Alibaba.
Finance Agent v2
VALS-AI · Professional reasoning · Objective
Core financial analyst tasks for agentic models.
Rank #16 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 40.8% (CI 40.6% - 41.1%)
- Percentile: 40% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: fabv2; provider: Alibaba.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #17 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 74.7% (CI 73.1% - 76.4%)
- Percentile: 82.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Alibaba.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #36 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 36.9% (CI 32.9% - 40.8%)
- Percentile: 31.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #27 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 77% (CI 73.2% - 80.7%)
- Percentile: 48% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Alibaba.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #40 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,475 (CI 1,463 - 1,487)
- Percentile: 85.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: expert. Source rank: #49. Votes: 2674. Organization: alibaba. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #36 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,450 (CI 1,441 - 1,458)
- Percentile: 89% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_business_and_management_and_financial_operations. Source rank: #47. Votes: 5967. Organization: alibaba. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #51 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,406 (CI 1,397 - 1,415)
- Percentile: 84.5% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_entertainment_and_sports_and_media. Source rank: #64. Votes: 5855. Organization: alibaba. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #48 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,450 (CI 1,436 - 1,463)
- Percentile: 84.2% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_legal_and_government. Source rank: #62. Votes: 2104. Organization: alibaba. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #60 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,449 (CI 1,440 - 1,458)
- Percentile: 81.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_life_and_physical_and_social_science. Source rank: #75. Votes: 4624. Organization: alibaba. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #38 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,455 (CI 1,440 - 1,470)
- Percentile: 88% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_mathematical. Source rank: #46. Votes: 1706. Organization: alibaba. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #50 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,461 (CI 1,446 - 1,475)
- Percentile: 83.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_medicine_and_healthcare. Source rank: #64. Votes: 2004. Organization: alibaba. License: Proprietary.
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #43 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,481 (CI 1,474 - 1,487)
- Percentile: 87.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_software_and_it_services. Source rank: #55. Votes: 12093. Organization: alibaba. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #43 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,424 (CI 1,416 - 1,432)
- Percentile: 87% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_writing_and_literature_and_language. Source rank: #56. Votes: 6570. Organization: alibaba. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #34 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,454 (CI 1,443 - 1,466)
- Percentile: 88% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: expert. Source rank: #42. Votes: 2674. Organization: alibaba. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #29 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,436 (CI 1,428 - 1,444)
- Percentile: 91.2% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_business_and_management_and_financial_operations. Source rank: #34. Votes: 5967. Organization: alibaba. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #49 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,402 (CI 1,393 - 1,410)
- Percentile: 85.1% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_entertainment_and_sports_and_media. Source rank: #60. Votes: 5855. Organization: alibaba. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #46 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,441 (CI 1,427 - 1,454)
- Percentile: 84.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_legal_and_government. Source rank: #55. Votes: 2104. Organization: alibaba. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #61 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,436 (CI 1,427 - 1,445)
- Percentile: 81.4% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_life_and_physical_and_social_science. Source rank: #74. Votes: 4624. Organization: alibaba. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #38 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,447 (CI 1,432 - 1,462)
- Percentile: 88% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_mathematical. Source rank: #45. Votes: 1706. Organization: alibaba. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #58 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,445 (CI 1,431 - 1,459)
- Percentile: 80.7% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_medicine_and_healthcare. Source rank: #66. Votes: 2004. Organization: alibaba. License: Proprietary.
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #35 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,460 (CI 1,454 - 1,467)
- Percentile: 89.5% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_software_and_it_services. Source rank: #44. Votes: 12093. Organization: alibaba. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #37 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Arena
- Raw value: 1,420 (CI 1,412 - 1,428)
- Percentile: 88.9% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Arena leaderboard dataset row `qwen3.6-plus`. Category: industry_writing_and_literature_and_language. Source rank: #47. Votes: 6570. Organization: alibaba. License: Proprietary.
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #20 · Source label: alibaba/qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: Vals AI
- Raw value: 44.9% (CI 38.1% - 51.6%)
- Percentile: 57.8% (within its fair comparison set)
- Last updated: recent
- Eligibility: benchmark_derived_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Alibaba.
Data analysis
LB · Professional reasoning · Objective
Structured data manipulation and table reasoning accuracy.
Rank #31 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: LiveBench
- Raw value: 69.9%
- Percentile: 72.2% (within its fair comparison set)
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category: Data Analysis. Tasks scored: 3.
Overall
LB · Professional reasoning · Objective
Average objective performance across LiveBench's current public category mix.
Rank #32 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: LiveBench
- Raw value: 70.9%
- Percentile: 71.3% (within its fair comparison set)
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Category averages included: 7.
Consecutive events
LB · Professional reasoning · Objective
Objective consecutive events score in LiveBench.
Rank #29 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: LiveBench
- Raw value: 67%
- Percentile: 74.1% (within its fair comparison set)
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: consecutive_events. Category: Data Analysis.
Table join
LB · Professional reasoning · Objective
Objective table join score in LiveBench.
Rank #44 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: LiveBench
- Raw value: 44.7%
- Percentile: 60.2% (within its fair comparison set)
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: tablejoin. Category: Data Analysis.
Table reformat
LB · Professional reasoning · Objective
Objective table reformat score in LiveBench.
Rank #71 · Source label: qwen3.6-plus
verified runtime · exact alias · Background only
Raw row drilldown
- Source: LiveBench
- Raw value: 98%
- Percentile: 74.1% (within its fair comparison set)
- Last updated: archived
- Eligibility: benchmark_derived_model
Derived from the official LiveBench website leaderboard table. Task: tablereformat. Category: Data Analysis.