LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #46 · Source label: alibaba/qwen3-max
verified runtime · exact alias · Background only
- Source: Vals AI
- Raw value: 81.9% (CI 81.1% - 82.6%)
- Percentile: 50% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Alibaba.
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #26 · Source label: alibaba/qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Vals AI
- Raw value: 74% (CI 72.3% - 75.6%)
- Percentile: 72.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Alibaba.
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #50 · Source label: alibaba/qwen3-max-2026-01-23
verified runtime · exact direct · Background only
- Source: Vals AI
- Raw value: 31.4% (CI 27.7% - 35.1%)
- Percentile: 3.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #38 · Source label: alibaba/qwen3-max-2026-01-23
verified runtime · exact direct · Background only
- Source: Vals AI
- Raw value: 72.7% (CI 69% - 76.4%)
- Percentile: 26% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Alibaba.
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #44 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,469 (CI 1,453 - 1,486)
- Percentile: 84.4% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: expert. Source rank: #54. Votes: 1263. Organization: alibaba. License: Proprietary.
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #40 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,446 (CI 1,437 - 1,454)
- Percentile: 87.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_business_and_management_and_financial_operations. Source rank: #52. Votes: 5147. Organization: alibaba. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #60 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,398 (CI 1,389 - 1,406)
- Percentile: 81.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_entertainment_and_sports_and_media. Source rank: #75. Votes: 4951. Organization: alibaba. License: Proprietary.
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #67 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,431 (CI 1,417 - 1,445)
- Percentile: 77.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_legal_and_government. Source rank: #86. Votes: 1738. Organization: alibaba. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #47 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,457 (CI 1,448 - 1,466)
- Percentile: 85.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_life_and_physical_and_social_science. Source rank: #59. Votes: 4207. Organization: alibaba. License: Proprietary.
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #54 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,441 (CI 1,426 - 1,457)
- Percentile: 82.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_mathematical. Source rank: #67. Votes: 1313. Organization: alibaba. License: Proprietary.
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #36 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,474 (CI 1,458 - 1,490)
- Percentile: 88.1% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_medicine_and_healthcare. Source rank: #44. Votes: 1458. Organization: alibaba. License: Proprietary.
Text Arena · Industry Software And It Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #54 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,470 (CI 1,464 - 1,477)
- Percentile: 83.7% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_software_and_it_services. Source rank: #68. Votes: 9487. Organization: alibaba. License: Proprietary.
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #61 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,406 (CI 1,398 - 1,414)
- Percentile: 81.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_writing_and_literature_and_language. Source rank: #78. Votes: 6191. Organization: alibaba. License: Proprietary.
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #29 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,461 (CI 1,444 - 1,477)
- Percentile: 89.8% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: expert. Source rank: #37. Votes: 1263. Organization: alibaba. License: Proprietary.
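The Text Arena expert score is reported twice, with style control (1,469, CI 1,453 - 1,486) and without it (1,461, CI 1,444 - 1,477). One rough way to read the pair is to check whether the two intervals overlap. The Python sketch below does that with the values from the two cards; the `Interval` helper is hypothetical, the confidence level of the CIs is not stated on the page, and interval overlap is only a heuristic, not a significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [low, high]; a hypothetical helper for this sketch."""
    low: float
    high: float

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return self.low <= other.high and other.low <= self.high


# CI bounds copied from the two Text Arena expert cards.
expert_style_control = Interval(1453, 1486)
expert_no_style_control = Interval(1444, 1477)

print(expert_style_control.overlaps(expert_no_style_control))
```

When the intervals overlap, as they do here, the two scores alone do not clearly separate the style-controlled and uncontrolled results.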
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #28 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,438 (CI 1,429 - 1,446)
- Percentile: 91.5% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_business_and_management_and_financial_operations. Source rank: #33. Votes: 5147. Organization: alibaba. License: Proprietary.
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #43 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,404 (CI 1,396 - 1,413)
- Percentile: 87% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_entertainment_and_sports_and_media. Source rank: #54. Votes: 4951. Organization: alibaba. License: Proprietary.
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #55 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,435 (CI 1,421 - 1,449)
- Percentile: 81.9% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_legal_and_government. Source rank: #67. Votes: 1738. Organization: alibaba. License: Proprietary.
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #29 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,457 (CI 1,448 - 1,467)
- Percentile: 91.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_life_and_physical_and_social_science. Source rank: #35. Votes: 4207. Organization: alibaba. License: Proprietary.
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #37 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,447 (CI 1,431 - 1,463)
- Percentile: 88.3% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_mathematical. Source rank: #44. Votes: 1313. Organization: alibaba. License: Proprietary.
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #17 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,468 (CI 1,453 - 1,484)
- Percentile: 94.6% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_medicine_and_healthcare. Source rank: #20. Votes: 1458. Organization: alibaba. License: Proprietary.
Text Arena · Industry Software And It Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #40 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,459 (CI 1,452 - 1,465)
- Percentile: 88% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_software_and_it_services. Source rank: #49. Votes: 9487. Organization: alibaba. License: Proprietary.
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #56 · Source label: qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Arena
- Raw value: 1,407 (CI 1,399 - 1,414)
- Percentile: 83% (inside its fair comparison set)
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3-max-preview`. Category: industry_writing_and_literature_and_language. Source rank: #68. Votes: 6191. Organization: alibaba. License: Proprietary.
Poker Agent
VALS-AI · Professional reasoning · Objective
Agent profit in poker-style strategic play.
Rank #16 · Source label: alibaba/qwen3-max-preview
verified runtime · exact alias · Background only
- Source: Vals AI
- Raw value: 994.5 score (CI 994.5 score - 994.5 score)
- Percentile: 6.3% (inside its fair comparison set)
- Last updated: archived
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: poker_agent; provider: unknown.