
How a leading AI research organization ran one of the largest AI behavioral experiments

George Denison

April 28, 2026

A leading AI safety research organization used Prolific to recruit over 76,000 participants for a landmark study into AI persuasion. With Prolific's managed services, expert study design consultation, and dynamic recruitment strategies, they ran more than 91,000 AI conversations across 19 models and 700+ topics in just nine weeks.

The study produced the most comprehensive evidence base to date on AI persuasion and its risks, and the research was subsequently published in a leading peer-reviewed journal.

The challenge: Running behavioral experiments at unprecedented scale

A prominent AI safety research organization needed to understand how politically persuasive conversational AI can be. The stakes were high, as policymakers and the public need robust, large-scale evidence to inform decisions about AI regulation and deployment.

This wasn't a typical survey or annotation task. The experiment required participants to sustain multi-turn, real-time conversations with AI systems across hundreds of topics and complex experimental conditions. Three core challenges made this especially difficult:

Scale without compromising quality

The experiment required tens of thousands of engaged, high-quality UK-based participants, not just people clicking through. Each participant had to genuinely engage with AI in live conversation, meaning low-effort or inattentive responses would compromise the entire dataset.

Data reliability across complex conditions

The experimental design involved randomization across multiple AI models and hundreds of topics. The team therefore needed reliable, high-quality data in every experimental condition, not just in aggregate.

High retention across multiple sessions

A significant re-contact element was key to the design, meaning participants needed to return for follow-up sessions. The integrity of the dataset depended on achieving high retention rates across tens of thousands of people.

The solution: Managed services, dynamic recruitment, and expert consultation

Prolific gave the research team participant access, recruitment expertise, and hands-on operational support to run one of the largest AI behavioral experiments ever conducted.

Mass-scale recruitment from a dedicated UK pool

The study required UK-based participants exclusively. Prolific's large, pre-verified UK participant pool provided the foundation, but the sheer volume of demand meant the team had to go further.

To meet that demand, the team drew on Prolific's waitlist to expand the available UK pool and ensure a steady flow of eligible participants throughout the nine-week collection period.

Expert study design consultation

Prolific's managed services team reviewed and advised on the study design to optimize the participant experience. Their input helped structure the studies to maximize uptake, minimize drop-off, and give participants a clear, manageable experience.

Dynamic recruitment management

Prolific's team actively tracked submissions against targets throughout the project and adjusted recruitment strategies in real time. This included sending targeted participant messages, advertising the study to boost visibility, and tweaking study descriptions to optimize uptake, all in response to how recruitment was tracking day by day.

Hands-on participant communication and retention

With a significant re-contact element built into the experimental design, keeping participants engaged across multiple sessions was critical. Prolific's team handled all participant communication, responding to questions and proactively sending reminders to drive high retention and re-contact rates.

This responsive, ongoing relationship management was key to ensuring the dataset remained complete and balanced across conditions.

Full operational delivery

Prolific managed platform setup, participant communication, and recruitment logistics end-to-end. This freed the research team to focus on the science rather than the operational complexity of running studies at this scale.

The results: A groundbreaking dataset on AI persuasion

In just nine weeks, the collaboration produced results at a scale rarely seen in behavioral research:

76,000+ participants recruited, enabling one of the largest AI behavioral experiments ever conducted
91,000+ AI conversations completed across experimental conditions
466,000+ fact-checked claims generated from the conversations
A groundbreaking dataset forming the most comprehensive evidence base to date on AI persuasion and its risks
Publication in a leading peer-reviewed journal

Why this matters for AI safety research

As AI systems become more capable of influencing human beliefs and behavior, rigorous large-scale evidence is essential.

This project shows that ambitious behavioral experiments on AI don't have to be compromised by recruitment bottlenecks. With the right infrastructure and dynamic hands-on support, research teams can produce policy-relevant evidence at scale and speed.

Need to run complex AI behavioral research at scale? Prolific can help you recruit engaged participants and manage sophisticated experimental designs from end to end. Find out more.

Client anonymized at their request for confidentiality reasons.
