How to improve data quality in online studies

Dr Andrew Gordon
May 19, 2025

Online research has changed how we gather information. We can reach higher volumes of people, faster and more cost-effectively than ever before. An expansion like this, however, brings a significant challenge: how do we make sure the quality of the data collected is any good?

It's a concern for researchers. After all, what value does a study hold if its results can't be trusted? Concerns around data quality in online research aren't new, but they are evolving. Earlier studies suggested that up to 46% of opt-in panel respondents might be unreliable (Harmon Research and Grey Matter Research, 2020; Pew Research Center, 2019).

More recently, a 2024 study published in the Journal of Medical Internet Research found that 34% of participants admitted to using AI tools like ChatGPT to assist with answering open-ended survey questions. 

While many did so to express themselves more clearly, the study also highlighted the risk of homogenous or overly polished responses that could obscure genuine insight. The saying "Garbage In, Garbage Out" rings more true than ever.

With these challenges in mind, we look at the steps to enhance your data quality and instill greater confidence in your findings.

Selecting the right platform

When it comes to online research, there are many platforms available to meet researchers' needs. But they're not all created equal. Some consistently produce high-quality data, while others may leave you with more questions than answers. When evaluating the best platform for your needs, consider the following criteria:

  • Strong participant verification – How does the platform ensure participants are real people?
  • Continuous quality control – What checks are in place to monitor participant behaviour across studies?
  • A large, active participant base – Don’t just trust the marketing. Many online pools are mostly inactive, and small samples risk participant naivety.
  • Features that match your research – The platform should support the type of study you want to run.
  • Built-in AI response detection – As AI tools like ChatGPT become more common, features such as Prolific’s authenticity check can help safeguard your data integrity.

Recent comparative studies have assessed major platforms on key data quality metrics. Peer et al. (2022) found that Prolific outperformed other platforms on most measures, with MTurk lagging behind. Douglas et al. (2023), in a study published in PLOS ONE, built on this by comparing Prolific, MTurk, CloudResearch, Qualtrics, and SONA. It found that participants from Prolific and CloudResearch were more likely to pass attention checks, follow instructions, provide meaningful answers, and have unique IP and geolocation data.

Cost per quality respondent matters just as much. The same study found Prolific and CloudResearch to be the most cost-effective, at around $2.00 per high-quality respondent, compared with MTurk and Qualtrics, with Prolific delivering the lowest cost per quality respondent overall.

It's also worth noting that different software is suited to different tasks. Tools like Qualtrics are ideal for running surveys, while Gorilla and Pavlovia are better for experimental designs. Again, Prolific stands out—offering flexible integrations, diverse participants, and compatibility with hundreds of services—making it easier to run both surveys and experiments effectively.

Trust, but verify

Once you've chosen your platform, the work isn't over. Take a proactive approach and verify that participants are who they claim to be.

Incorporate checks into your study design. Re-ask screening questions within the study to confirm accuracy, and keep demographic requirements confidential in the task description. This subtle approach helps weed out those attempting to game the system.

Open-ended questions are invaluable in this process. They require participants to engage more deeply and provide insight into their thought processes. You should also be alert for inconsistencies in responses. For instance, if a participant claims to be a lifelong vegetarian but later mentions their favorite steak recipe, it may indicate unreliable data.

Consider verification as a method to uphold the integrity of your research, rather than a sign of mistrust. Implementing these checks means you're improving the quality of data while also showing respect for the participants who are genuinely committed to contributing to your study.

Designing for quality

A well-constructed study is your best defense against questionable data. Focus on getting the fundamentals right so everything else falls into place.

Include free-text responses

Always include at least one free-text question, and ideally more; these are harder for low-quality participants to bypass. Additionally, include duplicated multiple-choice questions at different points in the study. Attentive participants should provide consistent answers across them.

Free-text responses not only help validate participant engagement but also allow you to catch potential automated or AI-generated answers. They're helpful for assessing the quality and depth of the thought participants are putting into your study. 

As Veselovsky and colleagues (2023) point out, analyzing these responses can help you spot signs of large language model use, which is an emerging concern in online research. 
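
If you export your responses for analysis, a few lines of code can surface the answers most worth a manual look. The sketch below uses Python with pandas and assumes a hypothetical CSV export with an "open_response" column; the file and column names are illustrative rather than a prescribed format.

```python
import pandas as pd

# Hypothetical export: one row per participant, with a free-text column
# named "open_response". File and column names are assumptions.
df = pd.read_csv("responses.csv")

text = df["open_response"].fillna("").str.strip()

# Flag very short answers (fewer than ~5 words) as possible low effort.
df["flag_short_text"] = text.str.split().str.len() < 5

# Flag answers that are identical (after lowercasing) across participants,
# which can indicate copy-pasted or scripted responses.
normalized = text.str.lower()
df["flag_duplicate_text"] = normalized.duplicated(keep=False) & (normalized != "")

flagged = df[df["flag_short_text"] | df["flag_duplicate_text"]]
print(f"{len(flagged)} responses flagged for manual review")
```

Flags like these are prompts for human review, not automatic exclusions: a short answer can still be a genuine one.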

Prolific’s authenticity check feature was also designed to tackle the “bot” problem, automatically flagging suspected AI-generated responses with 98.7% accuracy, reducing the manual workload and improving detection rates.

Set appropriate filters

Ex-ante filters are pre-set criteria used to select participants for a study before it begins. When setting them, aim for a high approval rate—on Prolific, you can set this higher than 97% since we already remove most participants below this threshold. Look for participants with a history of completing 20 or more studies. If language fluency is necessary for your research, make sure to include this in your filtering criteria.

These filters, as highlighted by Tomczak et al. (2023), can significantly improve data quality. They help with recruiting participants who have a track record of providing reliable responses. However, be mindful that overly strict filters might limit your participant pool or introduce unintended biases. 

Strike a balance between quality and inclusivity to get the best results for your specific research needs.

Monitor response timing

Collect data on the time participants spend on each page. Doing so helps identify those who may be multitasking during the study or not giving it their full attention. Don't rely too heavily on overall completion time. Individual page timing data gives you a much more accurate picture of participant engagement. The goal is to have dedicated participants, not those dividing their attention.

As Ramsey et al. (2023) point out, page timing data is a necessary tool for assessing response quality. It can help you spot both those who rush through your study and those who take suspiciously long, possibly indicating distraction or consultation of external sources. When analyzing this data, look for patterns that align with the cognitive demands of your questions.

Remember that extremely fast or slow responses aren't always invalid, but they do warrant closer examination.
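
To make this concrete, here is a minimal sketch in Python (pandas) of how you might screen per-page timings after export. It assumes a hypothetical long-format file with participant_id, page, and seconds_on_page columns, and the one-third and five-times thresholds are arbitrary starting points to tune for your own study.

```python
import pandas as pd

# Hypothetical long-format timing export: one row per participant per page,
# with columns "participant_id", "page", and "seconds_on_page".
timings = pd.read_csv("page_timings.csv")

# Compare each participant's time on a page to that page's median,
# since different pages naturally demand different amounts of time.
medians = timings.groupby("page")["seconds_on_page"].transform("median")
timings["speed_ratio"] = timings["seconds_on_page"] / medians

# Flag pages completed in under a third of the median time (possible rushing)
# or more than five times the median (possible distraction or external lookup).
timings["flag"] = (timings["speed_ratio"] < 1 / 3) | (timings["speed_ratio"] > 5)

# Count flagged pages per participant; review anyone with several flags
# rather than excluding on a single unusual page.
per_participant = timings.groupby("participant_id")["flag"].sum()
print(per_participant.sort_values(ascending=False).head(10))
```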

The art of attention checks

Attention checks are checkpoints designed to confirm that participants are paying attention and properly engaging with the questions in your study.

Never rely on a single attention check. Even the most effective check isn't a perfect measure. Incorporate several throughout your study, but be careful not to overdo it: too many checks can irritate participants and potentially reduce the quality of their input.

Keep your checks relevant to your study content. The goal is to assess current attention, not test prior knowledge or memory. Simple instructional manipulation checks like "What color is a red apple? Please select 'green' to show you're paying attention" aren't very effective, with research showing that participants can often pass these checks even when performing poorly on other quality measures.

Instead, use more reliable attention checks like non-sensical questions or bogus items, which better measure genuine participant attention. These methods have been shown to be more reliable indicators of data quality, and they integrate naturally into your study flow.
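
At the analysis stage, you can score several checks together rather than excluding anyone on a single miss. The sketch below shows one way to do that in Python with pandas; the column names and expected answers are made up for illustration and would need to match your own survey export.

```python
import pandas as pd

# Hypothetical export with one column per attention check; the expected
# answers below are illustrative and must match your own items.
df = pd.read_csv("responses.csv")

attention_checks = {
    "ac_instructed": "Option C",       # instructed-response item ("select Option C")
    "ac_bogus": "Strongly disagree",   # bogus statement participants should reject
    "ac_nonsense": "No",               # non-sensical question with one sensible answer
}

# Count how many checks each participant failed.
df["attention_fails"] = sum(
    (df[col] != expected).astype(int) for col, expected in attention_checks.items()
)

# Failing one check warrants a closer look; failing two or more is a much
# stronger signal of inattention than any single check on its own.
df["exclude_candidate"] = df["attention_fails"] >= 2
print(df["attention_fails"].value_counts().sort_index())
```

Treating a single failed check as a review flag rather than an automatic exclusion mirrors the advice above: no one check is a perfect measure.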

Data analysis and quality control

Once your study is complete and responses are in, you’ll need to distinguish the valuable insights from the noise.

Start by examining completion times. Unusually quick completion might indicate participants aren't reading thoroughly. Don't just look at total time—analyze time spent on individual pages. Using page timing data can reveal important patterns about how participants are interacting with your study.

Again, free-text responses are invaluable for quality checking. Be wary of short, vague answers, and look out for copied text or responses that don't align with the question.

The age of AI also brings its own challenge: spotting machine-generated responses. Keep an eye out for unnaturally perfect grammar or responses that sound a bit too generic. There are tools to help with this, but they're not always accurate, so a keen eye is still your best defense. Prolific’s authenticity check provides clear visual indicators (green, red, or mixed bars) that show which free-text responses may have been AI-generated, so you can assess data integrity with confidence.

Don't overlook the basics. Check for duplicate IP addresses, look for straight-lining (where participants select the same response option repeatedly), patterns of random clicking, and signs of low-effort responding.
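
These basic checks are straightforward to automate. Here is a rough Python (pandas) sketch assuming a hypothetical export with an ip_address column and a block of Likert items named q1 to q10; adapt the names and rules to your own data.

```python
import pandas as pd

# Hypothetical export with an "ip_address" column and a block of Likert
# items named q1..q10. Column names are assumptions for illustration.
df = pd.read_csv("responses.csv")
likert_items = [f"q{i}" for i in range(1, 11)]

# Duplicate IP addresses: the same address appearing more than once can
# indicate repeat submissions (but see the caveat on VPNs below).
df["flag_duplicate_ip"] = df["ip_address"].duplicated(keep=False)

# Straight-lining: no variation at all across a block of Likert items.
df["flag_straightlining"] = df[likert_items].nunique(axis=1) == 1

print(df[["flag_duplicate_ip", "flag_straightlining"]].sum())
```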

That said, be cautious about relying on IP geolocation to make decisions about participant quality. It's not a reliable indicator of bad actors. Good participants may appear in unexpected locations if they’re using VPNs, university networks, or privacy tools like iCloud Private Relay. Some survey platforms can also misreport IP data. Rather than depending on location-based flags, lean on layered, privacy-conscious identity checks—like those Prolific has built into its platform—to more accurately verify who’s participating in your study.

While time-consuming, this investigative approach is essential for ensuring the quality of your final dataset.

With these processes, you’ll focus on maintaining the integrity of your research while valuing the time and effort of your genuine participants.

Treating participants as individuals

Your participants aren't just data points; they’re real people. Treat them well, and they'll return the favor with high-quality responses.

One way to do this is by paying a fair rate. It's not just about ethics (though that's important too). Studies (Largent et al., 2022) show that fair pay boosts recruitment and retention. You get a larger pool of people who are more likely to engage fully with your study.

Keep your study engaging by varying stimuli and question types. A monotonous series of similar questions can lead to disengagement. Variety helps maintain participant interest and encourages thoughtful responses.

Approach your research with a positive mindset. Most participants genuinely want to contribute to your research's success and aren't intentionally providing poor data. Create an environment where participants feel valued and respected, and you'll likely see this reflected in the quality of their responses.

Practical application: A case study

Consider a public health researcher investigating the impact of social media on mental health across different age groups. They plan to survey 2,000 people in the US, ranging from teenagers to older adults.

After careful platform selection based on data quality metrics, they set up pre-screening criteria for US residents aged 18 and above, stratified into age groups. These criteria are kept confidential in the study description to prevent self-selection bias.

The study design combines various question types, including multiple-choice, Likert scales, and open-ended responses about social media habits and self-reported mental health indicators. Three subtle attention checks are included in the survey, and question blocks are randomized to mitigate order effects. The researcher also implements page timing to track engagement levels throughout the study.

Following ethical payment guidelines, the researcher offers $12 per hour, guaranteeing fair compensation for participants' time and effort.

During data analysis, the researcher flags responses completed unusually quickly (more than one standard deviation below the mean completion time) for closer inspection. They use text analysis tools to identify potential AI-generated responses in the open-ended questions. Additionally, they cross-check responses to similar questions placed at different points in the survey to ensure consistency.
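
As a rough illustration of those flagging rules, the Python sketch below applies the one-standard-deviation timing cutoff and a simple consistency cross-check. The column names (duration_seconds, sm_hours_q1, sm_hours_q2) are hypothetical stand-ins for the researcher's actual variables.

```python
import pandas as pd

# Illustrative sketch of the flagging rules described above. File and
# column names are assumptions, not a real export format.
df = pd.read_csv("survey_data.csv")

# Flag completions more than one standard deviation below the mean duration.
threshold = df["duration_seconds"].mean() - df["duration_seconds"].std()
df["flag_fast"] = df["duration_seconds"] < threshold

# Cross-check two versions of the same question asked at different points;
# here, self-reported daily hours on social media shouldn't differ by more
# than an hour between the two placements.
df["flag_inconsistent"] = (df["sm_hours_q1"] - df["sm_hours_q2"]).abs() > 1

to_review = df[df["flag_fast"] | df["flag_inconsistent"]]
print(f"{len(to_review)} of {len(df)} responses flagged for closer inspection")
```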

The resulting dataset provides a comprehensive, high-quality representation of social media's impact on mental health across US age groups. The researcher can confidently submit their findings to peer-reviewed journals and present recommendations to health policymakers, knowing their data can withstand rigorous scrutiny.

Looking ahead: Future trends in online research

The world of online research is always evolving. New challenges arise, but so do new solutions. Here's what to keep an eye on:

Advancements in AI detection

As AI-generated responses become more sophisticated, researchers need to stay ahead of the curve. Tools like Prolific’s authenticity check offer a behavioural-based approach to identifying AI assistance, and are part of a growing suite of innovations designed to protect the integrity of human insight in research.

High-quality data

Obtaining high-quality data in online research shouldn't be left to chance. It requires careful planning, execution, and analysis. By following these steps, you can enhance the reliability and validity of your research:

  1. Choose your platform wisely, prioritizing quality.
  2. Verify your participants while maintaining trust.
  3. Design your study with built-in quality checks.
  4. Conduct thorough data analysis and cleaning.
  5. Treat participants fairly to encourage engagement.

Online research is powerful. With the right approach, you can harness that power to uncover insights that really make a difference. Armed with these strategies, you're well-equipped to tackle a wide range of research questions and contribute meaningfully to your field of study.

The goal isn't just to collect data, but to gather insights that can withstand scrutiny and drive informed decision-making. By prioritizing data quality at every stage of your research process, you're not simply conducting a study; you're contributing to the advancement of knowledge in a responsible and impactful way.

Enhancing your research with Prolific

Prolific is a leading online platform designed by researchers, for researchers. With over 200,000 active, vetted participants and a commitment to high-quality data, Prolific can help you elevate the quality of your online research:

  • Access a diverse, engaged participant pool verified through bank-grade ID checks.
  • Use 300+ demographic filters to find your ideal participants.
  • Benefit from continuous quality monitoring with 25 different algorithmic checks.
  • Automatically detect AI-generated responses with authenticity check—Prolific’s built-in detection feature that flags suspicious answers with 98.7% accuracy.
  • Design flexible studies using text, imagery, voice, or interaction.
  • Scale your research efficiently with API integration.

Experience the difference high-quality data can make in your research with Prolific.