LegalBench
VALS-AI · Professional reasoning · Objective
Academic legal reasoning tasks.
Rank #19 · Source label: alibaba/qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 84.3%
- Percentile: 80%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: legal_bench; provider: Alibaba.
80% percentile inside its fair comparison set · Raw benchmark value: 84.3% · CI: 83.5% - 85.1%
TaxEval v2
VALS-AI · Professional reasoning · Objective
Answer quality on tax questions and responses.
Rank #45 · Source label: alibaba/qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 72.2%
- Percentile: 51.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: tax_eval_v2; provider: Alibaba.
51.6% percentile inside its fair comparison set · Raw benchmark value: 72.2% · CI: 70.4% - 73.9%
MedCode
VALS-AI · Professional reasoning · Objective
Medical billing support and coding tasks.
Rank #44 · Source label: alibaba/qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 33%
- Percentile: 15.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medcode; provider: Alibaba.
15.7% percentile inside its fair comparison set · Raw benchmark value: 33% · CI: 29.5% - 36.5%
MedScribe
VALS-AI · Professional reasoning · Objective
Administrative documentation support for doctors.
Rank #44 · Source label: alibaba/qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 70.6%
- Percentile: 14%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: medscribe; provider: Alibaba.
14% percentile inside its fair comparison set · Raw benchmark value: 70.6% · CI: 66.5% - 74.7%
Text Arena · Expert
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #95 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,418
- Percentile: 65.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: expert. Source rank: #116. Votes: 3487. Organization: alibaba. License: Proprietary.
65.8% percentile inside its fair comparison set · Raw benchmark value: 1,418 · CI: 1,408 - 1,429
Text Arena · Industry Business And Management And Financial Operations
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #92 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,403
- Percentile: 71.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_business_and_management_and_financial_operations. Source rank: #111. Votes: 8004. Organization: alibaba. License: Proprietary.
71.4% percentile inside its fair comparison set · Raw benchmark value: 1,403 · CI: 1,396 - 1,411
Text Arena · Industry Entertainment And Sports And Media
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #117 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,345
- Percentile: 64.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #144. Votes: 7911. Organization: alibaba. License: Proprietary.
64.1% percentile inside its fair comparison set · Raw benchmark value: 1,345 · CI: 1,337 - 1,352
Text Arena · Industry Legal And Government
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #106 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,399
- Percentile: 64.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_legal_and_government. Source rank: #129. Votes: 3024. Organization: alibaba. License: Proprietary.
64.8% percentile inside its fair comparison set · Raw benchmark value: 1,399 · CI: 1,387 - 1,410
Text Arena · Industry Life And Physical And Social Science
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #91 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,420
- Percentile: 72.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_life_and_physical_and_social_science. Source rank: #113. Votes: 6394. Organization: alibaba. License: Proprietary.
72.1% percentile inside its fair comparison set · Raw benchmark value: 1,420 · CI: 1,412 - 1,428
Text Arena · Industry Mathematical
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #97 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,409
- Percentile: 68.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_mathematical. Source rank: #118. Votes: 2142. Organization: alibaba. License: Proprietary.
68.8% percentile inside its fair comparison set · Raw benchmark value: 1,409 · CI: 1,395 - 1,422
Text Arena · Industry Medicine And Healthcare
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #108 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,410
- Percentile: 63.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_medicine_and_healthcare. Source rank: #131. Votes: 2811. Organization: alibaba. License: Proprietary.
63.7% percentile inside its fair comparison set · Raw benchmark value: 1,410 · CI: 1,399 - 1,422
Text Arena · Industry Software And IT Services
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #106 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,427
- Percentile: 67.7%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_software_and_it_services. Source rank: #126. Votes: 16469. Organization: alibaba. License: Proprietary.
67.7% percentile inside its fair comparison set · Raw benchmark value: 1,427 · CI: 1,422 - 1,433
Text Arena · Industry Writing And Literature And Language
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #106 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,369
- Percentile: 67.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #131. Votes: 9349. Organization: alibaba. License: Proprietary.
67.6% percentile inside its fair comparison set · Raw benchmark value: 1,369 · CI: 1,362 - 1,376
Text Arena · Expert · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena expert leaderboard.
Rank #86 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,407
- Percentile: 69.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: expert. Source rank: #104. Votes: 3487. Organization: alibaba. License: Proprietary.
69.1% percentile inside its fair comparison set · Raw benchmark value: 1,407 · CI: 1,396 - 1,417
Text Arena · Industry Business And Management And Financial Operations · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_business_and_management_and_financial_operations leaderboard.
Rank #85 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,396
- Percentile: 73.6%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_business_and_management_and_financial_operations. Source rank: #102. Votes: 8004. Organization: alibaba. License: Proprietary.
73.6% percentile inside its fair comparison set · Raw benchmark value: 1,396 · CI: 1,389 - 1,403
Text Arena · Industry Entertainment And Sports And Media · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_entertainment_and_sports_and_media leaderboard.
Rank #105 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,348
- Percentile: 67.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_entertainment_and_sports_and_media. Source rank: #128. Votes: 7911. Organization: alibaba. License: Proprietary.
67.8% percentile inside its fair comparison set · Raw benchmark value: 1,348 · CI: 1,340 - 1,355
Text Arena · Industry Legal And Government · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_legal_and_government leaderboard.
Rank #98 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,399
- Percentile: 67.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_legal_and_government. Source rank: #119. Votes: 3024. Organization: alibaba. License: Proprietary.
67.4% percentile inside its fair comparison set · Raw benchmark value: 1,399 · CI: 1,388 - 1,410
Text Arena · Industry Life And Physical And Social Science · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_life_and_physical_and_social_science leaderboard.
Rank #93 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,417
- Percentile: 71.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_life_and_physical_and_social_science. Source rank: #109. Votes: 6394. Organization: alibaba. License: Proprietary.
71.5% percentile inside its fair comparison set · Raw benchmark value: 1,417 · CI: 1,409 - 1,425
Text Arena · Industry Mathematical · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_mathematical leaderboard.
Rank #91 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,409
- Percentile: 70.8%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_mathematical. Source rank: #108. Votes: 2142. Organization: alibaba. License: Proprietary.
70.8% percentile inside its fair comparison set · Raw benchmark value: 1,409 · CI: 1,396 - 1,423
Text Arena · Industry Medicine And Healthcare · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_medicine_and_healthcare leaderboard.
Rank #97 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,406
- Percentile: 67.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_medicine_and_healthcare. Source rank: #118. Votes: 2811. Organization: alibaba. License: Proprietary.
67.5% percentile inside its fair comparison set · Raw benchmark value: 1,406 · CI: 1,394 - 1,418
Text Arena · Industry Software And IT Services · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_software_and_it_services leaderboard.
Rank #97 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,414
- Percentile: 70.5%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_software_and_it_services. Source rank: #115. Votes: 16469. Organization: alibaba. License: Proprietary.
70.5% percentile inside its fair comparison set · Raw benchmark value: 1,414 · CI: 1,408 - 1,419
Text Arena · Industry Writing And Literature And Language · No Style Control
AR · Professional reasoning · Human
Observed user preference in Arena's Text Arena industry_writing_and_literature_and_language leaderboard.
Rank #98 · Source label: qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Arena
- Raw value: 1,370
- Percentile: 70.1%
- Last updated: recent
- Eligibility: preview_model
Parsed from Arena leaderboard dataset row `qwen3.5-flash`. Category: industry_writing_and_literature_and_language. Source rank: #120. Votes: 9349. Organization: alibaba. License: Proprietary.
70.1% percentile inside its fair comparison set · Raw benchmark value: 1,370 · CI: 1,363 - 1,377
SAGE
VALS-AI · Professional reasoning · Objective
Student Assessment with Generative Evaluation.
Rank #26 · Source label: alibaba/qwen3.5-flash
verified runtime · exact alias · Background only
Raw row drilldown: source row, percentile, last updated, eligibility
- Source: Vals AI
- Raw value: 42.5%
- Percentile: 44.4%
- Last updated: recent
- Eligibility: preview_model
Parsed from Vals AI BenchmarkView overall scores. Vals slug: sage; provider: Alibaba.
44.4% percentile inside its fair comparison set · Raw benchmark value: 42.5% · CI: 35.7% - 49.3%
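Each card above reports a raw value together with a confidence interval (CI). As a minimal sketch of how those two fields might be compared across cards, the Python below computes the CI half-width for three of the Vals AI rows. The `Card` class and its field names are hypothetical and are not part of Vals AI or Arena; the numbers are copied from the LegalBench, MedCode, and SAGE cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Hypothetical container for one benchmark card (names are illustrative)."""
    name: str
    raw: float      # raw benchmark value, in percent
    ci_low: float   # lower CI bound, in percent
    ci_high: float  # upper CI bound, in percent

    def half_width(self) -> float:
        # Half the CI span, rounded to avoid floating-point noise.
        return round((self.ci_high - self.ci_low) / 2, 2)


# Values copied from the cards above.
cards = [
    Card("LegalBench", 84.3, 83.5, 85.1),
    Card("MedCode", 33.0, 29.5, 36.5),
    Card("SAGE", 42.5, 35.7, 49.3),
]

# The card with the widest interval is the least precisely measured of the three.
widest = max(cards, key=lambda c: c.half_width())

for c in cards:
    print(f"{c.name}: {c.raw}% +/- {c.half_width()} points")
print(f"Widest CI: {widest.name}")
```

Under these assumptions, a wider half-width only indicates a less precise estimate for that row; it says nothing about how the source computed the interval.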