
How Google DeepMind is advancing multi-party AI research with Deliberate Lab

Jasmehr Bhatia
February 10, 2026

Large language models aren't just individual assistants anymore. They're moving into spaces where groups deliberate, negotiate, and decide together, from workplace collaboration to public policy discussions.

This shift raises urgent questions: How do AI agents influence group consensus? Do they amplify human biases or correct them? Can they help diverse groups find common ground, or do they steamroll nuanced debate?

These aren't abstract concerns. As Crystal Qian, Senior Research Scientist at Google DeepMind's People + AI Research (PAIR) team, puts it: "Large language models are so pervasive in society now and touch so many people. We needed to study a broader cross-section of participants to understand how AI actually shapes collective outcomes."

Crystal leads research on human-AI (HAI) interaction and how LLMs can improve or distort social dynamics. Her recent work includes simulating voting patterns in group elections, evaluating how LLM assistance affects bargaining outcomes, and developing scalable methods to assess AI-mediated consensus-building.

But there's a problem: the tools to study these dynamics barely exist.

Why real-time, multi-party HAI research is so hard

Studying how humans and AI interact in groups isn't like running a survey. It requires coordinating multiple people in real time, integrating AI agents seamlessly into group tasks, and capturing nuanced behavioral data as interactions unfold.

Traditional research platforms are built for asynchronous, single-user tasks. Getting multiple people online simultaneously typically means manual email coordination, scheduled group video calls, and hoping everyone shows up. Surveys scale but lack nuance. In-person focus groups offer depth but don't scale. For studying group dynamics with AI, you need both.

Even if you solve coordination, you still need to build the platform itself: real-time chat systems, AI agent integration, structured data capture, attrition management (one dropout might kill an entire group's data), and flexible experiment design without requiring custom code for every study.

"We reached out to teams that had done canonical research in real-time group scenarios with AI," Crystal explains. "It was all bespoke custom tooling. Significant software engineering lifts. Deploying your own models, hosting your own servers, which made it hard to reproduce and hard for folks from interdisciplinary backgrounds to build these things."

Most research teams don't have months and dedicated engineers to build this infrastructure. So they simply don't run these studies.

The solution: A platform for multi-party human-AI experiments

Frustrated by these barriers, Crystal's team at Google DeepMind PAIR built Deliberate Lab: an open-source platform for real-time, multi-party experiments with humans and AI agents.

"We built this with AI-first participants and facilitators," Crystal explains. "But since we've open-sourced the platform, we've seen a lot of interesting use cases, like folks studying AI as teachers or companions, testing model checkpoints, even use cases without AI where people just like the real-time multi-party aspect."

What Deliberate Lab enables

  1. No-code experiment design: Researchers can build complex, multi-stage experiments through an intuitive interface, no programming required. Create chat stages, surveys, elections, consensus tasks, and more by dragging modular components into sequence.
  2. Video game-style lobbies for coordination: Inspired by online gaming, Deliberate Lab uses lobby systems to solve the synchronous coordination problem. Participants queue up, wait briefly for others, then transfer into groups automatically. No email scheduling. No manual coordination.
  3. AI agents as first-class participants: LLMs can join as participants (blending into groups) or mediators (facilitating discussions). Researchers control agent behavior through modular prompts, structured outputs, and response throttling to match human conversation pace.
  4. Real-time monitoring and intervention: Experimenters see live status indicators showing which participants are active, stuck, or dropping off. They can send attention checks, transfer participants between groups mid-session, or message individuals to resolve confusion.
  5. Seamless data export: All interaction data (chat transcripts, survey responses, voting patterns, and timestamps) exports as structured JSON and CSV files ready for analysis.
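To make the lobby idea from item 2 concrete, here is a minimal Python sketch of queue-based group formation. It is an illustration only, not Deliberate Lab's actual code: the `Lobby` class, its `join` method, and the group size of four are all hypothetical names and values chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Lobby:
    """Hypothetical lobby: queues arrivals and releases a group once it is full."""

    group_size: int
    waiting: Deque[str] = field(default_factory=deque)

    def join(self, participant_id: str) -> Optional[List[str]]:
        """Add a participant; return a full group if enough people are waiting."""
        self.waiting.append(participant_id)
        if len(self.waiting) >= self.group_size:
            # Take the longest-waiting participants first (first in, first out).
            return [self.waiting.popleft() for _ in range(self.group_size)]
        return None


# Usage: five arrivals with an assumed group size of four.
lobby = Lobby(group_size=4)
groups = []
for pid in ["p1", "p2", "p3", "p4", "p5"]:
    group = lobby.join(pid)
    if group is not None:
        groups.append(group)
```

In this sketch the first four arrivals form a group as soon as the fourth joins, and the fifth stays in the queue until three more people arrive. A real system would also need timeouts and dropout handling, which are left out here.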

Where Prolific fits: trust, scale, and speed

Every one of these studies required hundreds (sometimes thousands) of thoughtful, engaged participants who could handle complex, real-time tasks. Crystal's team needed a partner who could deliver both quality and scale.

Why participant quality is non-negotiable

Deliberate Lab experiments aren't simple surveys. They require:

  • Sustained attention for 20-60 minute sessions
  • Nuanced reasoning (spotting factual errors, evaluating arguments, deliberating trade-offs)
  • Real-time coordination with other participants
  • Authentic engagement

In group research, one bad participant doesn't create one bad data point. It destroys the data from all four participants in that group, and when you're running 250 simultaneous cohorts, quality issues cascade fast.
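A rough back-of-the-envelope calculation shows why this compounds. The sketch below assumes each participant is independently "bad" with the same probability, which is a simplification, and the 5% rate is a made-up illustrative figure rather than a number from Prolific or Deliberate Lab. Only the group size of four and the 250 cohorts come from the scenario above.

```python
def clean_group_probability(p_bad: float, group_size: int) -> float:
    """Probability that every member of a group is a good participant,
    assuming participants are independent (a simplifying assumption)."""
    return (1.0 - p_bad) ** group_size


def expected_lost_groups(p_bad: float, group_size: int, n_groups: int) -> float:
    """Expected number of groups containing at least one bad participant."""
    return n_groups * (1.0 - clean_group_probability(p_bad, group_size))


# Illustrative inputs: 5% is an assumed rate, not a measured one.
p_clean = clean_group_probability(p_bad=0.05, group_size=4)
lost = expected_lost_groups(p_bad=0.05, group_size=4, n_groups=250)
print(f"Clean-group probability: {p_clean:.3f}")
print(f"Expected groups lost out of 250: {lost:.1f}")
```

Under these assumptions, a 5% individual problem rate turns into roughly 19% of groups being affected, or about 46 of 250 cohorts. The individual rate is multiplied by group size, which is why screening matters more in group studies than in single-participant surveys.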

How Prolific delivers quality and speed

  • Verified, engaged participants
    Prolific's 200,000+ participant pool undergoes rigorous screening with over 50 identity and quality checks. Continuous fraud detection catches bad actors quickly.
  • Rapid recruitment without sacrificing quality
    "We've used Prolific to run studies with thousands," Crystal notes. The platform's speed enables ambitious research timelines without the months-long coordination traditional methods require.
  • Precision recruitment for complex research designs
    Research questions often require specific participant characteristics or carefully balanced group compositions. Prolific's extensive filtering capabilities and demographic data enable researchers to recruit exactly who they need, when they need them.

What this means for the future of AI research

Deliberate Lab represents a shift in how we study AI's societal impact. Instead of isolated benchmarks or simulated agents, researchers can now observe how real people and AI systems interact in complex social contexts at scale.

"We wanted to collaborate with folks in sociology, behavioral sciences," Crystal explains. "We needed an interface where we could work on these things without the technical overhead of deploying these systems."

The platform is already enabling research that would have been dismissed as too complex just two years ago: AI teachers in group learning environments, real-time expert adjudication with collaborative labeling, AI moderation of contentious debates, and remote focus groups with embedded AI facilitators.

For Prolific, partnerships like this demonstrate our commitment to advancing frontier AI research: not just providing participants, but working alongside leading researchers at institutions like Google DeepMind to deliver breakthroughs in human-AI research.

"I know lots of other teams at DeepMind are big fans of your tool," Crystal shares.

Watch the full conversation

Hear Crystal discuss the technical challenges of real-time human-AI research, the design decisions behind Deliberate Lab, and her vision for how AI can improve or undermine collective decision-making.

Partner with Prolific for frontier AI research

If you're studying human-AI interaction at scale, Prolific provides verified participants who deliver quality data for complex, synchronous experiments.
