How HUMAINE measures real-world AI

HUMAINE is a human-preference leaderboard that evaluates frontier AI models based on real-world usage. Unlike traditional benchmarks that mainly track technical performance, HUMAINE captures how diverse users actually experience AI—across everyday tasks, trust and safety, adaptability, and more.
By combining rigorous methodology with feedback from a representative pool of real people, HUMAINE offers the insights model creators and evaluators need to understand not just which model performs best, but why. Updated regularly, it provides a dynamic view of model strengths, weaknesses, and user satisfaction.
FAQs
Who is HUMAINE for?
HUMAINE is designed for AI labs, model creators, and evaluators who want to understand how their models perform in real-world contexts. It’s also useful for researchers, policymakers, and practitioners interested in human-centered AI evaluation.

How often is the leaderboard updated?
Results are refreshed as new models are added and new data is collected. This ensures the leaderboard reflects the latest performance trends across the AI ecosystem.

Where can I access the data?
You can explore the full HUMAINE leaderboard data—including demographic and task-level analysis—on our dedicated app. Datasets, our announcement blog, and a detailed paper will follow soon.

Why is Prolific involved?
Prolific brings deep expertise in human-centered research and access to a diverse, representative pool of verified participants. This ensures that evaluations are fair, reliable, and grounded in real user experience.