Visible tradeoffs
This is a human preference signal, so it tells you what people liked side by side, not what is formally correct.
source: Artificial Analysis
metric: Arena rating (rating)
judge: Human
direction: higher is better
group id: aa_text_to_speech_current
domain: Audio dialogue / speech
What it measures vs what it misses
✓ Measures
Observed user preference over generated speech samples.
✗ Misses
Speech-to-text accuracy, dialogue behavior, and API integration ergonomics.
Same-test rule
This percentile only compares models inside the exact benchmark/version group shown here. It is not a universal score.
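The same-test rule can be illustrated with a minimal sketch. The page does not say how its percentile is calculated, so the definition used here (the share of models in the same group whose rating is at or below a given model's rating) is an assumption, and the model names, ratings, and the second group id are hypothetical values made up for illustration.

```python
from collections import defaultdict


def percentile_within_group(rows):
    """Compute a percentile for each model using only models in its own group.

    rows: iterable of (model, group_id, rating) tuples.
    Returns a dict keyed by (model, group_id).
    Assumed definition: percent of same-group models rated at or below this one.
    """
    by_group = defaultdict(list)
    for model, group_id, rating in rows:
        by_group[group_id].append((model, rating))

    result = {}
    for group_id, entries in by_group.items():
        ratings = [rating for _, rating in entries]
        for model, rating in entries:
            at_or_below = sum(1 for r in ratings if r <= rating)
            result[(model, group_id)] = 100.0 * at_or_below / len(ratings)
    return result


# Hypothetical ratings; only the first group id appears on the page.
rows = [
    ("model_a", "aa_text_to_speech_current", 1100.0),
    ("model_b", "aa_text_to_speech_current", 1050.0),
    ("model_c", "aa_text_to_speech_current", 1000.0),
    ("model_d", "other_group", 900.0),
]
percentiles = percentile_within_group(rows)
```

In this sketch model_d tops its own group despite having the lowest rating overall, which is why a percentile from one benchmark/version group should not be read against a percentile from another.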