The complete human evaluation stack, now with MCP.

Connect your agent to the full Prolific evaluation stack.
Verified participants, structured results, straight into your pipeline.
Why Prolific

Verified humans, built into your evaluation pipeline.

No API code. No context switch. Your agent calls Prolific directly: studies launch, results return, and your pipeline keeps moving.

Your agent knows who responded
Every response carries cohort provenance (demographic filters, study conditions, and a participant hash) in the data record itself. When a result comes back through MCP, you know exactly who gave you which signal.
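As a minimal sketch of what "provenance in the record" can mean for a consumer, the parser below keeps each signal and its provenance together in one typed object. The field names (`participant_hash`, `cohort_hash`, `filters`, `condition`, `signal`) are hypothetical placeholders, not Prolific's documented export schema.

```python
import json
from dataclasses import dataclass
from typing import Any


# Hypothetical record shape: these field names are illustrative
# assumptions, not Prolific's documented export schema.
@dataclass(frozen=True)
class Provenance:
    participant_hash: str
    cohort_hash: str
    filters: dict
    condition: str


@dataclass(frozen=True)
class LabeledResponse:
    signal: Any
    provenance: Provenance


def parse_record(line: str) -> LabeledResponse:
    """Parse one JSON line, keeping the signal and its provenance together."""
    raw = json.loads(line)
    prov = Provenance(
        participant_hash=raw["participant_hash"],
        cohort_hash=raw["cohort_hash"],
        filters=raw.get("filters", {}),  # treat missing filters as "none applied"
        condition=raw["condition"],
    )
    return LabeledResponse(signal=raw["signal"], provenance=prov)
```

Because provenance travels with the signal, downstream code can filter or weight responses by cohort without a separate lookup.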
Results land in the format your pipeline expects
Responses return as structured JSON and export as JSONL with stable cohort hashes. Idempotent study creation makes retries safe for automated callers. There is no transformation step between the MCP response and your training run.
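Retry safety from idempotency usually rests on one pattern: the caller generates a single idempotency key per logical request and reuses it on every retry, so a repeated call returns the original study instead of creating a duplicate. The sketch below assumes a hypothetical `send(payload, key)` transport and a hypothetical `TransientError`; how Prolific actually accepts the key (header or field name, scope, expiry) is defined in its API reference, not here.

```python
import time
import uuid
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (timeout, dropped connection)."""


def create_study_with_retry(
    send: Callable[[dict, str], dict],
    payload: dict,
    max_attempts: int = 3,
    delay_s: float = 0.5,
) -> dict:
    """Create a study, retrying transient failures under one idempotency key.

    `send(payload, key)` is a hypothetical transport; the real mechanism
    that carries the key is defined by the Prolific API docs.
    """
    # Generate the key once per logical study and reuse it on every attempt,
    # so a server that honours idempotency returns the original study
    # instead of creating a duplicate.
    key = str(uuid.uuid4())
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return send(payload, key)
        except TransientError as err:
            last_error = err
            if attempt < max_attempts - 1:
                time.sleep(delay_s * 2 ** attempt)  # exponential backoff
    raise RuntimeError(
        f"study creation failed after {max_attempts} attempts"
    ) from last_error
```

The key is what makes the retry safe: if the first request succeeded but its response was lost, the second attempt resolves to the same study. The backoff base of 0.5 s is an arbitrary choice.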
The pipeline was almost complete
The missing step was always the human one: collecting preferences, validating edge cases, and catching what benchmarks miss. Now it's in the pipeline.
What customers say

"I want to remove any barrier between my agent and the results. Prolific does that."

Emerging Products Director, Fortune 500 software company

What your agent can do with one MCP install.

Launch preference studies mid-run
Your agent calls Prolific without breaking the training loop. Pairwise preference and Likert rating tasks run while your pipeline continues. Results return as JSONL, ready for RLHF and DPO — no copy-paste, no context switch.
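To illustrate the "ready for DPO" step, here is a minimal sketch that turns pairwise-preference JSONL into the prompt/chosen/rejected rows that many DPO trainers, such as Hugging Face TRL's `DPOTrainer`, accept. The input field names (`prompt`, `response_a`, `response_b`, `preferred`) are assumptions for illustration, not Prolific's export format.

```python
import json
from typing import Iterable, Iterator


def to_dpo_pairs(lines: Iterable[str]) -> Iterator[dict]:
    """Convert pairwise-preference JSONL lines into prompt/chosen/rejected rows.

    Assumed (hypothetical) input fields: prompt, response_a, response_b,
    and preferred, which must be "a" or "b".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        rec = json.loads(line)
        a, b = rec["response_a"], rec["response_b"]
        if rec["preferred"] == "a":
            chosen, rejected = a, b
        elif rec["preferred"] == "b":
            chosen, rejected = b, a
        else:
            # Ties or malformed labels have no DPO pair; surface them loudly.
            raise ValueError(f"unexpected preference label: {rec['preferred']!r}")
        yield {"prompt": rec["prompt"], "chosen": chosen, "rejected": rejected}
```

Raising on unknown labels is a deliberate choice; a pipeline that collects ties might instead route them to a separate dataset.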
Gate releases on human evaluation scores
Your agent recruits a participant cohort, waits for scores, and passes or fails the release — all in one pipeline step. No manual study setup. No dashboard check. Structured JSON comes back directly into the workflow.
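A release gate needs a rule for turning returned scores into pass or fail. One reasonable choice, sketched here under stated assumptions, is to require a minimum number of ratings and compare a lower confidence bound on the mean score, rather than the raw mean, against a threshold, so a small or noisy cohort cannot pass by luck. The function name, threshold and defaults are illustrative and not part of Prolific's API.

```python
import math
from statistics import mean, stdev


def release_gate(
    scores: list,
    threshold: float,
    min_n: int = 30,
    z: float = 1.96,
) -> bool:
    """Pass a release only if the mean human score clears `threshold` with margin.

    Uses a normal-approximation lower confidence bound on the mean
    (z = 1.96 is roughly a 95% two-sided interval). `threshold`, `min_n`
    and `z` are hypothetical policy knobs, not Prolific settings.
    """
    # Fail closed when there are too few ratings; stdev() needs at least two.
    if len(scores) < max(min_n, 2):
        return False
    standard_error = stdev(scores) / math.sqrt(len(scores))
    lower_bound = mean(scores) - z * standard_error
    return lower_bound >= threshold
```

With 30 ratings of 4 and 30 of 5 on a 1 to 5 scale, the lower bound is about 4.37, so the gate passes at a threshold of 4.3 and fails at 4.4.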
Escalate to humans without leaving the workflow
When your agent hits ambiguity or low-confidence output, it calls Prolific directly. 68% of production agents require human intervention within 10 steps — MCP means that handoff happens inside the pipeline, not outside it. The response feeds back into the agent's context as structured JSON.
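The escalation pattern can be sketched as a confidence floor: confident outputs pass through, and anything below the floor is handed to humans, with their structured answer returned in its place. `StepResult`, `resolve_step`, `ask_humans` and the 0.7 floor are all hypothetical; `ask_humans` stands in for whatever MCP tool call your client exposes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StepResult:
    output: str
    confidence: float  # assumed: the agent's own score in [0, 1]


def resolve_step(
    step: StepResult,
    ask_humans: Callable[[str], Any],
    floor: float = 0.7,
) -> dict:
    """Keep confident outputs; route low-confidence ones to human reviewers.

    `ask_humans` is a hypothetical stand-in for the MCP tool call that
    launches a study and returns the participants' judgement.
    """
    if step.confidence >= floor:
        return {"source": "agent", "output": step.output}
    # Below the floor: hand off to humans and feed their structured answer
    # back to the agent in place of its own output.
    return {"source": "human", "output": ask_humans(step.output)}
```

Tagging each result with its `source` keeps human and model judgements distinguishable in the agent's context and in any logs.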
Run red-teaming as a pipeline step
Your agent recruits participants to probe for failures: bias, edge cases, and judgment calls that benchmarks miss. Runs are scriptable across model versions and run against the HUMAINE evaluation framework. Evaluation that used to require a manual study brief now starts with one tool call.
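Scripting red-teaming across model versions mostly means comparing the same metric version by version. A minimal sketch, assuming each finding is a record like `{"model_version": "v2", "flagged": True}` (a hypothetical shape, not Prolific's schema), computes the share of probes that participants flagged as failures per version.

```python
from collections import Counter
from typing import Iterable


def failure_rates(findings: Iterable[dict]) -> dict:
    """Share of probes flagged as failures, per model version.

    Each finding is assumed (hypothetically) to look like
    {"model_version": "v2", "flagged": True}.
    """
    probes = Counter()
    flagged = Counter()
    for f in findings:
        version = f["model_version"]
        probes[version] += 1
        if f["flagged"]:
            flagged[version] += 1
    # Every version in `probes` has at least one probe, so no division by zero.
    return {v: flagged[v] / probes[v] for v in probes}
```

Running the same probe set against each release and diffing these rates is one simple way to catch regressions between versions.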
Get started

Start where your stack already lives.

MCP server
Works with Claude, Cursor, or any MCP-compatible client. Tools are discovered automatically — no SDK, no wrapper functions. Your agent decides when to call Prolific.
Install Prolific MCP
REST API
Studies, cohorts, responses, webhooks, idempotency semantics. Full programmatic control for custom integrations and training pipelines.
docs.prolific.com/api
CLI
Launch studies, wait for completion, and export responses, all scriptable in any pipeline. Works from a shell, a CI runner, or an agent orchestrator.
docs.prolific.com/cli

Built for the teams at the frontier

The participant network behind HUMAINE, a peer-reviewed evaluation benchmark, is now callable from inside your agent: the same methodology, accessible in one tool call.

HUMAINE — Unpacking human preference for LLMs
Technical benchmarks often lack real-world relevance. HUMAINE addresses unrepresentative sampling, superficial assessment, and single-metric reductionism in LLM evaluation.
Read the paper
Ai2 reduced data collection from weeks to hours
The Allen Institute for AI used Prolific's verified human network at scale to build state-of-the-art multimodal models faster, without sacrificing quality.
Gemini 3 Pro: Frontier safety framework
The frontier safety framework report for Google’s latest model.

Your training loop. Prolific's humans.

One install. Your agent calls humans. Your pipeline keeps moving.
FAQ

Common questions about the MCP server