The complete guide to improving data quality in online surveys

George Denison
|February 20, 2026

Whether you're conducting academic research, informing a business decision, or developing an AI model, online surveys are a valuable source of insights. They help you draw meaningful conclusions and identify when you need to rethink your approach. But how can you tell if you're getting high-quality data?

You need to be able to spot indicators of bad data quality and warning signs of poor-quality survey responses. Otherwise, you risk relying on flawed data that leads you to inaccurate conclusions, wastes resources, and potentially undermines the credibility of your work.

What is data quality?

Data quality is a measure of whether data is fit for its intended purpose. It measures whether data will yield usable, accurate insights across several factors, including completeness and validity. 

Several common characteristics can help you identify high-quality data. When you collect data via online surveys, there are additional factors specific to this data collection process that can affect your data quality. 

Learn more: The most important factors relating to data quality from online surveys, based on insights from 129 experienced research professionals

 

Comprehension

In online surveys, comprehension means whether or not respondents understand what you’re asking them. It’s an essential factor in data quality because if survey respondents don’t understand the questions, their responses won’t be accurate or honest.

Researchers often add instructions and a couple of preliminary questions at the start of the survey to check that respondents understand what's being asked. But even if respondents pass your comprehension check, you may still get low-quality responses if your survey design is difficult to understand.

Some ways your survey can affect comprehension include how you word your questions and the rating scales for responses. For example, researchers should make sure their questions are worded clearly to ensure there’s no room for misinterpretation when respondents answer them.

Attention

Attention is whether your participants are engaged with your survey and paying attention to what you’re asking, rather than multitasking while watching Netflix. Respondents’ attention levels have a huge impact on the quality of responses you get to your online surveys because if they’re not paying attention, they’re more likely to misread questions or give minimal responses.

You can add attention-check questions to your survey to assess whether participants are fully engaged. Also, if you monitor how long respondents spend on your survey and on each question, you can spot when they’re paying attention compared with when they’re skimming through each question as quickly as possible.
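
As a rough sketch of the timing check described above, the snippet below flags respondents whose total completion time falls well below the median. The function name, the example durations, and the "half the median" cutoff are all assumptions for illustration, not an established threshold, so you would want to tune the cutoff to your own survey.

```python
from statistics import median

def flag_speeders(durations_sec, fraction_of_median=0.5):
    """Return participant IDs whose completion time is below a fraction of the median.

    The 0.5 default is an illustrative assumption, not a recommended standard.
    """
    cutoff = median(durations_sec.values()) * fraction_of_median
    return sorted(pid for pid, secs in durations_sec.items() if secs < cutoff)

# Hypothetical completion times in seconds, keyed by participant ID.
durations = {"p1": 310, "p2": 295, "p3": 88, "p4": 340, "p5": 120}

print(flag_speeders(durations))  # ['p3', 'p5']
```

A flag like this is best treated as a prompt for review rather than proof of inattention, since some people simply read quickly.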

Honesty

In online surveys, honesty refers to whether participants give truthful responses. It’s an essential part of data quality because dishonest responses skew your data and can lead you to draw misleading conclusions.

Sometimes, respondents may be tempted to tell you what they think you want to hear. For example, if a survey about workplace productivity asks how many times you visit social media sites each day, they may be tempted to respond with a lower number to appear more engaged with their work. 

Alternatively, you may be asking personal questions about private aspects of the respondents’ lives. If they feel like you’re prying, or if they don’t trust you’ll handle their responses confidentially, respondents are unlikely to be honest.

A growing concern about honesty is the use of AI tools like ChatGPT to generate free-text responses. Even if a participant isn't deliberately trying to deceive you, outsourcing their answers to an AI means you're no longer collecting authentic human insights, which can undermine the purpose of primary research.

Accuracy

Accuracy is whether the data you collect matches reality. In online surveys, participant attention levels, comprehension, and honesty affect the accuracy of their responses. Respondents can accidentally enter inaccurate information if they don’t understand or are not paying attention, or deliberately provide misleading responses if they're answering surveys dishonestly.

Accuracy is essential for data quality because inaccurate data skews your analysis. For example, if enough respondents choose the wrong age range (25-30 rather than 35-40), then your analysis of responses across age ranges will be skewed by those incorrect answers.

Completeness

Completeness is whether your data is complete, with all records present and fields filled in. In online surveys, completeness is measured by the number of questions respondents answer, which can be affected by participants not paying attention or not understanding the questions.

Completeness affects your data quality because incomplete data means you don't get all the information you need from survey respondents. For example, respondents may skip questions or leave one-word answers to open-ended questions. At best, this leaves you with gaps in your understanding. At worst, it can skew your data if some of your questions go unanswered by a key subset of your target customers.
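
One way to put a number on completeness is to count answered questions per respondent and skipped answers per question. The sketch below assumes each response is a simple dictionary of question ID to answer, and that a missing key, None, or a blank string counts as unanswered. The question IDs and data are made up for illustration.

```python
def is_answered(value):
    """Treat None and blank or whitespace-only strings as unanswered."""
    return value is not None and str(value).strip() != ""

def completion_rate(response, questions):
    """Share of the listed questions this respondent answered (0.0 to 1.0)."""
    answered = sum(1 for q in questions if is_answered(response.get(q)))
    return answered / len(questions)

def missing_by_question(responses, questions):
    """Number of respondents who left each question unanswered."""
    return {
        q: sum(1 for r in responses if not is_answered(r.get(q)))
        for q in questions
    }

# Hypothetical survey with four questions and two respondents.
questions = ["q1", "q2", "q3", "q4"]
responses = [
    {"q1": "Yes", "q2": 4, "q3": "Checkout was too slow", "q4": "35-40"},
    {"q1": "No", "q2": None, "q3": "   ", "q4": "25-30"},
]

print([completion_rate(r, questions) for r in responses])  # [1.0, 0.5]
print(missing_by_question(responses, questions))
# {'q1': 0, 'q2': 1, 'q3': 1, 'q4': 0}
```

The per-question counts are useful for spotting a question that a whole subset of respondents skipped, which is the skew risk mentioned above.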

Consistency

Consistency is whether a participant answers the same question in the same way at different points in your survey. Inconsistent answers affect your data quality because you can’t tell which response is accurate.

Seemingly small inconsistencies can affect the accuracy of the responses you collect. For example, if you ask about employment status, and the respondent chooses “employed full time” and “self-employed full time” at different stages of your survey, you can’t tell which truly represents their work situation.
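
If your survey deliberately repeats a question, a simple comparison can surface conflicting answers like the employment example above. This is a minimal sketch that assumes you know which question IDs are repeats of each other. The IDs used here are hypothetical.

```python
def conflicting_answers(response, repeated_questions):
    """Return the (first, repeat) question pairs whose answers disagree."""
    conflicts = []
    for first, repeat in repeated_questions:
        a, b = response.get(first), response.get(repeat)
        # A missing answer is a completeness problem, not an inconsistency.
        if a is None or b is None:
            continue
        if a != b:
            conflicts.append((first, repeat))
    return conflicts

# Hypothetical pair: the same employment question asked early and late.
repeated_questions = [("employment_early", "employment_late")]
response = {
    "employment_early": "employed full time",
    "employment_late": "self-employed full time",
}

print(conflicting_answers(response, repeated_questions))
# [('employment_early', 'employment_late')]
```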

Reliability

Reliability is closely related to consistency, but it’s largely a psychological construct rather than a specific characteristic that your survey participants display. A reliable respondent will answer a survey the same way, no matter when they answer it or if they answer the same questions multiple times.  

It affects your data quality if you’re tracking and comparing results over time or running the same survey multiple times. Several factors can affect the reliability of your data, including changes in participants, the timing of your survey, and changes to the survey itself.

Naivety

Naive survey respondents have no experience with your survey topic, questions, or the process of completing online research. 

Participant naivety can affect your data quality on two fronts. First, if a respondent is already familiar with your research process or topic, their prior knowledge may affect how they complete the survey and introduce bias into their responses. And second, if you have a truly naive respondent, it’s hard to know how trustworthy responses are and whether they understand the survey.

Representativeness

Representativeness measures how well your data sample represents your target population. If your survey responses don’t come from a representative sample, it can affect data quality by introducing bias.

For example, if a researcher is studying attitudes towards remote working, they want responses that reflect a broad range of experiences and demographics. But if the majority of responses come from a single age group or profession, the findings won't be representative of the wider population they're trying to understand.
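
To see whether one group dominates your sample, you can compare each group's share of responses with its share of the population you're studying. The sketch below assumes you already have target shares from a trusted source. The age bands and percentages here are invented purely for illustration.

```python
from collections import Counter

def share_gaps(sample_groups, target_shares):
    """Sample share minus target share for each group.

    Positive values mean the group is over-represented in the sample.
    """
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: round(counts.get(group, 0) / total - share, 2)
        for group, share in target_shares.items()
    }

# Hypothetical sample of ten respondents and made-up population targets.
sample = ["18-34"] * 6 + ["35-54"] * 3 + ["55+"] * 1
targets = {"18-34": 0.3, "35-54": 0.4, "55+": 0.3}

print(share_gaps(sample, targets))
# {'18-34': 0.3, '35-54': -0.1, '55+': -0.2}
```

Large gaps suggest you may need to recruit more participants from the under-represented groups before drawing conclusions about the wider population.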

Thoroughness

Thoroughness is the detail and depth participants bring to their answers to open-ended questions. It affects data quality because thorough responses give you more insights and information to work with.

Some participants may provide only the bare minimum when responding to an open-ended question, with just one or two sentences. But others may provide a paragraph or two, taking the time to fully explain their opinion or provide additional details.

Timeliness

Timeliness measures whether the data you’ve collected has been analyzed and used within an appropriate timeframe. It can affect your data quality because if there’s a long gap between collecting your data and analyzing those responses, the insights you gain may be out of date. For example, if you collect data on how people interact with a particular technology but don't analyze it for six months, the landscape may have shifted significantly in the interim, making your findings less relevant or harder to contextualize.

Timeliness can be affected by your data handling and processing times, as well as the types of data you collect. Some data types are more affected by time delays than others. For example, if you ask respondents their date of birth, it won't change, but if you ask their age, it will.

Uniqueness

Uniqueness is whether there are duplications in your data. In online surveys, you should look out for the same person completing your survey multiple times, as well as duplicated responses from multiple participants. 

Duplicate survey responses can indicate fraudulent responses, such as in this example, where the researcher received the majority of their responses from bots. However, duplicate responses aren’t always of poor quality. For example, if you have many multiple-choice questions, you're likely to see more similar responses than for open-ended questions.

The uniqueness of your responses will depend on the type of data you collect and the questions you ask. For example, if you ask respondents what state or town they live in, it’s unlikely you’ll get all unique responses, but that’s not a problem.
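
The two kinds of duplication described above can be checked separately. The sketch below looks for participant IDs that appear more than once, and for open-ended answers that are identical across different participants once case and spacing are ignored. The field names and submissions are hypothetical, and exact matching is only one possible approach.

```python
from collections import defaultdict

def repeated_participants(submissions):
    """Participant IDs that appear in more than one submission."""
    seen, repeats = set(), set()
    for sub in submissions:
        pid = sub["participant_id"]
        if pid in seen:
            repeats.add(pid)
        seen.add(pid)
    return sorted(repeats)

def shared_free_text(submissions, field):
    """Free-text answers given word for word by more than one participant."""
    by_text = defaultdict(set)
    for sub in submissions:
        # Lowercase and collapse whitespace so trivial differences still match.
        text = " ".join(str(sub.get(field, "")).lower().split())
        if text:
            by_text[text].add(sub["participant_id"])
    return {text: sorted(pids) for text, pids in by_text.items() if len(pids) > 1}

# Hypothetical submissions.
submissions = [
    {"participant_id": "p1", "feedback": "The app is easy to use."},
    {"participant_id": "p2", "feedback": "the app is  easy to use. "},
    {"participant_id": "p3", "feedback": "Checkout took too long."},
    {"participant_id": "p1", "feedback": "Works fine."},
]

print(repeated_participants(submissions))  # ['p1']
print(shared_free_text(submissions, "feedback"))
# {'the app is easy to use.': ['p1', 'p2']}
```

This kind of check only makes sense for open-ended fields. Applying it to multiple-choice or location questions would flag the harmless overlap described above.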

Validity

Validity means that the questions you are asking actually measure the thing you are interested in. It’s affected by the wording of your survey questions and the format of responses.

For example, if you’re running a survey to discover the factors that influence employee engagement, then asking to what extent the respondents agree or disagree with the statement “I enjoy my work” won’t help you understand that. It tells you how respondents feel about their work, not what drives their engagement.

Invalid data negatively affects your data quality because it doesn’t relate to what you’re trying to measure or understand.
 

Why is data quality important?

If you’re spending a lot of time and money collecting survey responses, it’s important that you get high-quality responses to justify your investment. The quality of the data you get from online surveys and research directly affects the value and insights you can gain from that research.

It affects the integrity of your findings

Poor-quality data from online surveys can lead to inaccurate conclusions and misleading patterns in your results. When decisions - whether academic, commercial, or policy-related - are based on that data, they rest on a flawed foundation, despite the best intentions of the researcher.

Conversely, high-quality survey responses give you confidence in your findings. They allow you to draw conclusions that are well-supported by evidence, whether you're publishing research or making strategic decisions.

It affects the value of your investment

Poor-quality data from online surveys wastes the time and resources spent collecting it. It can also mean that valid insights get buried in noise, or that flawed findings lead to costly missteps, whether that's a study that can't be published, a product decision that misses the mark, or a dataset that introduces bias into an AI model.

High-quality data, on the other hand, is a worthwhile investment. It produces findings that are robust, reproducible, and genuinely useful.
 

How to improve data quality from online survey responses

When you’re collecting data from online surveys, you can do several things to boost the quality of data you get from your research participants. These steps will help keep participants engaged with your survey and encourage them to provide honest, accurate, and detailed responses.

Set up pre-screening requirements to recruit high-quality participants

Pre-screening requirements are a helpful tool for ensuring that the only people completing your survey fit the profile of your target audience. Many online research platforms allow you to pre-select different criteria your participants need to meet in order to qualify for your survey, such as:

  • Age
  • Gender identity
  • Languages spoken fluently
  • Employment status
  • Lifestyle and interests

Pre-screening requirements help you get your survey in front of relevant people. You can set them up for your online surveys based on many demographic criteria, so only people who meet those criteria can join your survey.
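
Research platforms normally apply pre-screening for you, but the underlying logic is a filter over participant attributes. The sketch below is not any platform's API. It is a hypothetical illustration, with made-up field names and criteria, of how a set of requirements narrows a participant pool.

```python
def meets_criteria(participant, criteria):
    """True if the participant passes every check in the criteria mapping."""
    return all(check(participant.get(field)) for field, check in criteria.items())

# Hypothetical requirements: aged 25 to 40, fluent in English, employed full time.
criteria = {
    "age": lambda age: age is not None and 25 <= age <= 40,
    "fluent_languages": lambda langs: langs is not None and "English" in langs,
    "employment_status": lambda status: status == "employed full time",
}

# Hypothetical participant pool.
pool = [
    {"id": "p1", "age": 31, "fluent_languages": ["English", "Spanish"],
     "employment_status": "employed full time"},
    {"id": "p2", "age": 52, "fluent_languages": ["English"],
     "employment_status": "employed full time"},
    {"id": "p3", "age": 28, "fluent_languages": ["German"],
     "employment_status": "employed full time"},
]

eligible = [p["id"] for p in pool if meets_criteria(p, criteria)]
print(eligible)  # ['p1']
```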

Phrase questions to keep respondents engaged

The wording of your survey questions helps keep respondents engaged - or makes them “zone out” when going through your survey. If you ask several similarly phrased or structured questions in a row, it can cause a type of bias called habituation. This happens when survey respondents get used to the kinds of survey questions you’re asking, so they skim the questions (and their responses) rather than paying close attention.

To avoid habituation and keep respondents engaged, write varied questions and use different question types to create an interesting experience for respondents completing your survey. For example, rather than having a run of questions that all use a 1-5 rating scale, you could add a multiple-choice or open-ended question to stop respondents from skimming through the questions.

Add attention-check questions (ACQs)

Attention-check questions (ACQs) help you determine whether your survey participants are really paying attention to what you're asking or just skimming. They assess respondents' engagement with your survey and improve data quality by screening out disengaged participants.

Add ACQs at one (or more) points in your surveys to help you identify disengaged participants. An ACQ instructs respondents to answer a question in a specific way to check whether they’ve paid attention to it. It shouldn’t leave room for interpretation. For example:

When asked for your favorite color, you must select green. This is an attention check.

What is your favorite color?

  • Blue
  • Red
  • Orange
  • Green
  • Yellow

Participants who fail at least two ACQs (for surveys more than five minutes long) should be disqualified from your survey to keep your quality of respondents high. However, some studies have shown that overusing ACQs can negatively affect data quality, so you shouldn’t rely on them as your sole method for boosting the quality of your survey responses.
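
The "fail at least two ACQs" rule above can be expressed as a simple count. In this sketch, the question IDs, the second attention check, and the answer labels are hypothetical. Only the color example and the two-failure threshold come from the text.

```python
def failed_acqs(response, expected_answers):
    """IDs of attention checks where the response differs from the instructed answer."""
    return [q for q, expected in expected_answers.items() if response.get(q) != expected]

def should_disqualify(response, expected_answers, min_failures=2):
    """True if the respondent failed at least `min_failures` attention checks."""
    return len(failed_acqs(response, expected_answers)) >= min_failures

# Instructed answers for two attention checks (the second is made up).
expected_answers = {"acq_color": "Green", "acq_scale": "Strongly disagree"}

attentive = {"acq_color": "Green", "acq_scale": "Strongly disagree"}
one_slip = {"acq_color": "Blue", "acq_scale": "Strongly disagree"}
inattentive = {"acq_color": "Blue", "acq_scale": "Agree"}

print(should_disqualify(attentive, expected_answers))    # False
print(should_disqualify(one_slip, expected_answers))     # False
print(should_disqualify(inattentive, expected_answers))  # True
```

Note that a skipped attention check counts as a failure here, which is a design choice you may or may not want.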

Use authenticity checks to detect AI-generated responses and bots

As AI tools like ChatGPT become more accessible, a growing threat to survey data quality is participants using AI to generate their free-text responses, alongside bots that complete surveys automatically. Both can undermine the authenticity of your data, and neither is always easy to spot by reading responses alone.

Prolific's authenticity checks use behavioral analysis rather than content analysis to detect these threats in real time, and are available at no extra cost:

LLM checks detect participants using AI to answer free-text questions, achieving 98.7% precision and a false-positive rate of just 0.6%. They analyze 15 behavioral signals, such as copy-pasting and tab-switching, to flag potentially inauthentic responses. These are available for studies using Qualtrics or Prolific's AI TaskBuilder.

Bot checks identify AI agents or fully automated participants with 100% accuracy in testing. They look for non-human behaviors across all question types, not just free-text. These are available for studies using Qualtrics.

Both types of checks surface flagged responses directly in your submissions page, so you can review and sort them without manual inspection. Responses flagged with low authenticity can be rejected, saving you significant time while protecting the integrity of your data.
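
The precision and false-positive rate quoted for LLM checks answer different questions. Assuming the standard definitions, precision is the share of flagged responses that really were AI-assisted, while the false-positive rate is the share of genuine human responses that were wrongly flagged. The counts below are invented to show the arithmetic and are not Prolific's test data.

```python
def precision(true_pos, false_pos):
    """Of all flagged responses, the share that were correctly flagged."""
    return true_pos / (true_pos + false_pos)

def false_positive_rate(false_pos, true_neg):
    """Of all genuine human responses, the share that were wrongly flagged."""
    return false_pos / (false_pos + true_neg)

# Hypothetical counts from a made-up evaluation.
flagged_ai = 90       # AI-assisted responses that were flagged (true positives)
flagged_human = 10    # human responses that were flagged (false positives)
cleared_human = 890   # human responses that were not flagged (true negatives)

print(round(precision(flagged_ai, flagged_human), 3))               # 0.9
print(round(false_positive_rate(flagged_human, cleared_human), 3))  # 0.011
```

In this made-up case, one in ten flags is wrong, yet only about 1% of honest participants are affected, which is why it helps to look at both numbers together.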

One of the simplest additional steps is to include a reminder, directly within your survey questions, asking participants not to use AI. Internal testing found that this alone reduced AI usage by 61%. Learn more about best practices around authenticity checks.

Make personal questions optional (and anonymous)

To keep your responses honest and maintain high-quality answers, make personal questions optional if they’re not integral to your research. Many research platforms (including Prolific) don’t allow you to collect personally identifiable information, which maintains the privacy and anonymity of your respondents.

Personal questions can be a sticking point for survey respondents if they don’t feel comfortable answering them. This can lead them to provide false or nonsensical responses so they can move on to the next part of your survey.

At the start of the survey, inform participants that their responses will be anonymous. Then remind them again when you get to the personal or sensitive questions. This will build trust and help them feel more comfortable sharing personal or potentially sensitive information.

Compensate respondents fairly for their time

One of the best ways to recruit high-quality survey respondents is to pay them fairly for their time. Doing so incentivizes participants to provide high-quality responses and to take their time answering your survey fully, rather than rushing to complete it as quickly as possible.

In the past, it was thought that paying survey respondents would negatively impact their responses and skew your data (as respondents would give the answers they thought you wanted). However, this is an outdated view, and almost all research participants are compensated for their time. Pre-screening respondents and adding attention checks can mitigate that risk, allowing you to recognize the valuable contribution your respondents make to your survey.

High data quality starts with high-quality research participants

The best-planned research in the world will still produce poor-quality data if all your responses come from bots or disengaged respondents. Recruiting high-quality participants and using the right tools to verify their authenticity will lay the foundation for collecting high-quality survey data. Learn more about the factors that matter most when recruiting high-quality survey participants, whether you’re using Prolific or any other online research platform.