Prolific’s AI user experience leaderboard

Our leaderboard uses rich, reliable human feedback to rank AI models based on metrics that matter to real users.

Why Prolific’s AI user experience leaderboard matters

Representative human feedback: 500 diverse participants drawn from our pool of more than 200,000 verified users.
Real-world tasks: models are tested on natural usage scenarios like email writing and trip planning, not abstract benchmarks.
Demographic insights: results are segmented by user group to reveal performance differences across populations.
THE SCIENCE BEHIND THE SCORES

Prolific’s AI user experience leaderboard combines behavioral science with established public-opinion research methods. Using stratified sampling, we recruit representative participants across key demographics such as age, gender, ethnicity, education, and geography.
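As a loose illustration of this recruitment step, here is a minimal Python sketch of proportional stratified sampling. The DataFrame `pool` and its demographic columns (`age_band`, `gender`, `region`) are hypothetical, not Prolific’s actual schema.

```python
import pandas as pd

def stratified_sample(pool: pd.DataFrame, n: int, strata: list[str],
                      seed: int = 42) -> pd.DataFrame:
    """Draw roughly n participants so each demographic cell is
    represented in proportion to its share of the full pool."""
    shares = pool.groupby(strata).size() / len(pool)
    parts = [
        # Rounding per cell means the total can differ slightly from n.
        group.sample(n=max(1, round(n * shares[key])), random_state=seed)
        for key, group in pool.groupby(strata)
    ]
    return pd.concat(parts)

# Hypothetical usage: 500 participants stratified on three demographics.
# participants = stratified_sample(pool, n=500,
#                                  strata=["age_band", "gender", "region"])
```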

In each study, participants complete a set of standardized real-world tasks, such as email drafting, meal planning, trip organization, and creative problem-solving, with AI models assigned at random and presented anonymously to avoid bias.
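A minimal sketch of how blinded random assignment might work, assuming hypothetical model IDs and task names; participants see only a neutral label, never the model’s real name.

```python
import random

MODELS = ["model_a", "model_b", "model_c"]         # hypothetical model IDs
TASKS = ["email_draft", "meal_plan", "trip_plan"]  # standardized tasks

def assign(participant_id: str, seed: int = 7) -> dict:
    # Seed per participant so assignments are reproducible and auditable.
    rng = random.Random(f"{seed}:{participant_id}")
    return {
        "participant": participant_id,
        "model": rng.choice(MODELS),      # recorded for analysis only
        "label": "Assistant",             # the only name the participant sees
        "tasks": rng.sample(TASKS, k=len(TASKS)),  # randomized task order
    }

print(assign("p-001"))
```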

Models are evaluated on seven core dimensions: helpfulness, communication clarity, adaptiveness, understanding, trustworthiness, personality, and cultural alignment. Results are weighted through multilevel regression with poststratification (MRP) to produce estimates that reflect the experiences of the broader population.
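To make the weighting step concrete, the sketch below shows only the poststratification half of MRP: per-cell score predictions (as a fitted multilevel model would produce) are reweighted by population shares. All cells, scores, and shares are invented for illustration.

```python
import pandas as pd

# Predicted mean score per demographic cell (illustrative values only).
cells = pd.DataFrame({
    "age_band":  ["18-34", "18-34", "35-54", "35-54", "55+", "55+"],
    "gender":    ["f", "m", "f", "m", "f", "m"],
    "pred":      [4.1, 3.9, 4.3, 4.0, 4.4, 4.2],        # model predictions
    "pop_share": [0.14, 0.15, 0.18, 0.18, 0.18, 0.17],  # census cell shares
})

# Population estimate: weight each cell's prediction by its census share,
# so over- or under-sampled groups no longer skew the headline score.
estimate = (cells["pred"] * cells["pop_share"]).sum() / cells["pop_share"].sum()
print(f"Poststratified population estimate: {estimate:.2f}")
```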

Read the methodology

Discover the AI user experience leaderboard

See the data, engage with the leaderboard, and learn more below.