Using AI to Drive Survey Data Quality: A U.S. Benchmarking Study 

Editor’s note: This article is an automated speech-to-text transcription, edited lightly for clarity.

During the Quirk’s Virtual Sessions – Data Quality series on November 20, 2024, Dynata shared the results of a third-party study it took part in that compared the quality of data from different vendors.

Steven Millman, global head of research and data science at Dynata, also spoke about the key factors to consider related to survey data quality and the essential role of AI in enhancing data quality.

Session transcript:

Joe Rydholm

Hi everybody and welcome to our session “Using AI to Drive Survey Data Quality: A U.S. Benchmarking Study.”

I’m Quirk’s editor, Joe Rydholm. Before we get started, let’s quickly go over the ways you can participate in today’s discussion. You can use the chat tab to interact with other attendees during the session, and you can use the Q&A tab to submit questions for the presenters. We will answer as many questions as we have time for during the Q&A portion.

Our session today is presented by Dynata. Enjoy the presentation!

Steven Millman

Hello and welcome to “Using AI to Drive Survey Data Quality: A Seven-Vendor Study” by Dynata.

My name is Steven Millman, and I'm the global head of research and data science at Dynata. I'm also a member of the board of directors of the Advertising Research Foundation, the ARF.

Today we're going to talk about the survey fraud problem, how we use artificial intelligence in support of data quality and then we'll talk about a third-party independent verification study that we did, which we're calling the ‘Seven-Vendor Quality Study.’ That study looked at things such as sample acceptance, representativeness, open-end quality and over qualification.

But let's start by talking about the survey fraud problem. We all know that bad data leads to bad decisions, which lead to costly mistakes.

As you can see, some studies have shown that bad data may cost organizations in the U.S. up to $12 million to $13 million a year in losses, and that advertisers waste up to 21% of their media budgets because of bad data.

So, data quality is critically important to driving success. Poor quality naturally impacts your timelines, your costs and ultimately your reputation.

There are two kinds of data quality that we really talk about. There is fraud and then there's disengagement. Most people just talk about fraud. 

Fraud is people taking surveys who are not who they say they are, who are motivated by money, who treat fraud as a job. It's also automation: bots and AI trying to take surveys as though they're real people. Ultimately, these are folks we don't under any circumstances want in our surveys.

Disengagement on the other hand, is a real person who might be interested in producing good survey results but is disengaged by the survey, and there's lots of reasons why someone could be disengaged. 

It could be that the survey is very long and unpleasant to take. It could be they had a really bad morning. It could be that they haven't had their coffee yet. It could be that they are confused or sick. But these are real people, they're just disengaged from the survey. You need to figure out where the line is between producing good enough quality and quality that's not good enough when we know it's a human.

This is where artificial intelligence comes in. It really is a game changer for data quality because it allows us to look at a wide variety of factors simultaneously in a way that a human really couldn't.

How do we manage this? 

Before we get into AI, let's talk about how we handle every element of data quality coming in, not just AI. 

We have multiple touchpoints with our panelists. Those include recruitment, how we sample, what surveys people take and how they take them and the conditions on redemption. We look at every step of this process to see if there's anything in there that suggests fraud or disengagement or anything else that might make these folks questionable to have in our sample. 

Machine learning leverages both customer and system feedback, both what we can see and what people tell us when they get the results. That helps us continuously improve surveys and data quality. We also have a large team of data scientists who take a holistic approach to monitoring, evaluating and taking actions to ensure the integrity of our survey results. 

But let's talk about quality score.

Quality score is Dynata’s artificial intelligence-powered in-survey quality solution. It uses machine learning to evaluate respondents for fraud and disengagement against more than 175 unique data points.

Everybody looks for certain kinds of problems in survey taking. They'll look to see whether people straight-line or take an uncharacteristically short time to answer a survey, but we go well beyond these kinds of metrics. I'll give you an example related to speeders, people who take the survey too quickly.

In most cases, sophisticated fraud today will complete the entire survey, calculate how long it should have taken a human to do it and then wait that long before submitting. We can detect that.

Some sophisticated fraud, not most, but some, has started doing the same thing on a page-by-page basis. We can detect that too.
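
As a rough illustration of that page-timing idea, and not Dynata's actual model, a per-page check might look something like the sketch below; the field names and thresholds are hypothetical.

```python
# Illustrative sketch only: a per-page timing check for "smart speeders" who pad
# their total completion time. Field names and thresholds are hypothetical.

def flag_page_timing(page_seconds, min_plausible, max_plausible):
    """Flag pages answered implausibly fast or padded with long idle waits.

    page_seconds  -- seconds spent on each survey page
    min_plausible -- minimum plausible read/answer time per page
    max_plausible -- maximum plausible time before padding is suspected
    """
    flags = []
    for spent, lo, hi in zip(page_seconds, min_plausible, max_plausible):
        if spent < lo:
            flags.append("too_fast")          # faster than a human could read
        elif spent > hi:
            flags.append("possible_padding")  # long idle wait before submitting
        else:
            flags.append("ok")
    return flags

# Total time (~187s) looks plausible, but the page-level pattern does not.
print(flag_page_timing([2, 3, 180, 2], [8, 10, 15, 8], [60, 90, 120, 60]))
# ['too_fast', 'too_fast', 'possible_padding', 'too_fast']
```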

We can also detect non-human keyboard activity and non-human mouse activity. That's especially important for things like business-to-business surveys, which are very expensive and therefore attract more fraud, and where someone might be asked a very technical question about, let's say, how they would implement a large system architecture change.

Well, in a survey farm, for example, people fraudulently taking surveys while pretending to be, let's say, IT decision makers might look that up and then cut and paste it into the open-end. We can detect cut and paste in the open-end. Quality score is fully automated, operates globally on multiple platforms and is source agnostic.
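
For a sense of how a paste check could work in principle, here is a minimal sketch, assuming hypothetical telemetry fields such as a keystroke count and a paste-event count; this is not Dynata's implementation.

```python
# Illustrative sketch only: flag open-ends that were likely pasted rather than
# typed, using hypothetical telemetry fields. Not Dynata's implementation.

def flag_pasted_open_end(answer_text, keystrokes, paste_events):
    """Return True if the open-end looks pasted rather than typed."""
    if paste_events > 0:
        return True  # a paste event was recorded on the text field
    # A long answer with far too few keystrokes to have been typed by hand.
    return len(answer_text) > 40 and keystrokes < len(answer_text) * 0.5

print(flag_pasted_open_end("We would phase the migration across regions...",
                           keystrokes=3, paste_events=0))              # True
print(flag_pasted_open_end("Not sure", keystrokes=9, paste_events=0))  # False
```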

When we are running a survey, even if we need to use samples from other folks, we can run that through quality score. And quality score ultimately gives you a set of probabilities that somebody is or is not going to be a respondent that you want to use. 

I've already talked about some of the things that are on this slide, but we look at active behaviors, we look at open-ends, we look at what's going on in the background, for example, machine latency, ping rates, geography verification and so forth.  

When we get feedback from you, we also feed that in. So, the system is continuously learning: when you tell us, ‘Here's a list of IDs, we don't think these IDs gave us good surveys, we don't want to use them,’ that gets fed back in. Not only can we use that to detect new kinds of fraud that might be entering the system, it also gives us history on the panelists, and we know that if somebody continues to show up as bad, we're going to take those folks out.
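
As a minimal sketch of that feedback loop, and not Dynata's actual system, client-reported IDs might update a panelist history and supply labels for retraining; the names and the removal threshold here are hypothetical.

```python
# Minimal sketch: client-reported bad IDs feed a panelist history and future
# training labels. Names and the removal threshold are hypothetical.

from collections import defaultdict

panelist_flags = defaultdict(int)   # panelist_id -> number of client reject reports
training_labels = []                # (panelist_id, label) pairs for retraining

def ingest_client_feedback(rejected_ids):
    for pid in rejected_ids:
        panelist_flags[pid] += 1
        training_labels.append((pid, "bad"))  # label for the next retraining cycle

def should_remove(pid, threshold=3):
    """Panelists who keep showing up in reject lists get removed from the panel."""
    return panelist_flags[pid] >= threshold

ingest_client_feedback(["p_102", "p_387"])
ingest_client_feedback(["p_102"])
ingest_client_feedback(["p_102"])
print(should_remove("p_102"))  # True
print(should_remove("p_387"))  # False
```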

Finally, somebody might be an edge case, meaning we are relatively sure they're human, but we're not sure they're sufficiently engaged. In these edge cases, we can push them through to ID verification. We don't do it for everyone because there's no reason to do it for folks we know are real and working to produce good surveys. We don't need to do it when we know it's a bot or some kind of obvious fraud. But when we're not sure, a person gets routed to a government ID check, and if they don't complete it, their survey doesn't get submitted and they don't get rewards.
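
That routing can be pictured as a simple threshold rule over the model's probabilities; the cutoffs below are hypothetical, not Dynata's actual rules.

```python
# Illustrative sketch only: route respondents based on fraud/engagement
# probabilities. Thresholds and tiers are hypothetical.

def route_respondent(p_human, p_engaged,
                     human_hi=0.95, human_lo=0.20, engaged_hi=0.80):
    if p_human < human_lo:
        return "reject"           # almost certainly a bot or obvious fraud
    if p_human >= human_hi and p_engaged >= engaged_hi:
        return "accept"           # clearly a real, engaged respondent
    return "id_verification"      # edge case: route to a government ID check

print(route_respondent(p_human=0.99, p_engaged=0.92))  # accept
print(route_respondent(p_human=0.05, p_engaged=0.50))  # reject
print(route_respondent(p_human=0.88, p_engaged=0.40))  # id_verification
```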

So this all sounds good, but the key question is whether it actually works. To find out, we hired an independent third party to evaluate a survey done by Dynata and by six other vendors.

It's important to get external validation because everybody says they have great quality. If you look at any person or any entity working in this space, everybody says they have great quality. Don't just take our word for it, and don't just take their word for it.

This year we won I-COM's Data Creativity Award for quality score. We were first place both in data quality and overall; it's kind of like being the best gun dog and best in show. We're very proud of that.

Just about a week ago, we were named first place in the most innovative field service supplier category of the GRIT Report.  

And we're the only survey provider to have ever been awarded Neutronian NQI Data Quality certification. We take this so seriously that we reached out for the certification ourselves. But let's talk about the study.

The study, importantly, was not conducted by Dynata. We did fund it.  

We found an independent third-party, Love + Soule Insights. If your names are Ed Love and Catherine (Cat) Armstrong Soule and you don't create a company called Love + Soule, I think you're doing something wrong.  

But these guys are also professors of marketing at Western Washington University. Ed is the department chair. They both have extensive history and experience working with and teaching survey design and analytics in the context of market research. They were great partners for us.  

What they did was a double-blind research study. What that means is they reached out to Dynata and six other vendors and one of them, Cat, collected the data and worked with the vendors. At no point were the vendors informed of the purpose of the study or that they were being compared or used in some kind of benchmark, kind of like a mystery shopping exercise.  

And then after Cat collected the data, she cleaned out anything that would allow her partner to know which vendor had produced those results. She just passed it on as vendor one, vendor two, vendor three and so on. Then Ed took those and evaluated them for quality, we'll talk about the quality evaluation in a second, without knowledge of who was who.

Everybody got the same thing. They were asked to run a survey in five days with 1,000 U.S. respondents aged 18 to 34. The point of that was that a general population study might be a little too simple, so we decided to make it a little narrower; I think it was about a 22% incidence rate.

There was a hard quota balancing gender and region, the four census regions. Each of the vendors needed to host and program the study. And each of the vendors was asked by Cat to use whatever normal procedures they had for evaluating quality before delivering the data.
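
To make the quota setup concrete, here is a minimal sketch, not any vendor's actual system, of a hard nested quota on gender by the four census regions; the equal cell targets are a simplification, since real targets would typically follow census proportions.

```python
# Minimal sketch: a hard nested quota on gender x the four U.S. census regions
# for a 1,000-complete study. Equal cell targets are a simplification.

from itertools import product

GENDERS = ["male", "female"]
REGIONS = ["northeast", "midwest", "south", "west"]

targets = {cell: 125 for cell in product(GENDERS, REGIONS)}  # 8 cells x 125 = 1,000
fills = {cell: 0 for cell in targets}

def accept_complete(gender, region):
    """Accept a complete only if its quota cell is still open (hard quota)."""
    cell = (gender, region)
    if fills[cell] >= targets[cell]:
        return False   # cell is full; the respondent is screened out
    fills[cell] += 1
    return True

print(accept_complete("female", "south"))  # True while the cell is still open
```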

The seven vendors were four first-party survey panels, one of which is Dynata, two sample exchanges and one sample aggregator.

I should mention before I go forward that we chose a short five-to-seven-minute study without a lot of fancy question types or anything complicated. The reason for this is that we wanted to create a survey that anybody doing a good job should be able to knock out of the park. So, this should be very easy for anyone to accomplish if they're doing things the right way.

How did we do? Well, let's first talk about the dimensions of data quality that Ed used when he was evaluating the responses from each of the vendor surveys.  

First, we looked at sample acceptance: would he have returned this respondent to the vendor and said, ‘I don't want to pay for that. That looks terrible, I don't want to use it’? There were a series of criteria for that.

Second was representativeness. Looking at both what we asked them to do and what we didn't, how representative was the sample of the population of interest?

Third, open-end quality. That's pretty straightforward. Did they give rational, intelligent responses to the questions that were asked?

Then last was over qualification. And over qualification is the professional survey taker problem. Will people lie about a characteristic, thinking that it will get them in a survey in order to get rewards? Obviously, nobody wants that in their studies. 

So, let's start with sample acceptance. How did we do according to Love + Soule Insights?

Well, Dynata had the lowest sample rejection rates across all vendors and across all categories. 

Quick note on how to read these tables. The average of other vendors is everyone except for Dynata. So, the average acceptance rate across the other six was 92.3%, which means that out of a thousand completes, the average vendor had 77 that could not be used. The first-party panels row does not include Dynata; it covers the other three. And then you have the two exchanges and the aggregator.

As you can see, we did quite well here, 99.8% acceptance rate for this short study. 
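
To make the table arithmetic concrete, here is a quick conversion from acceptance rate to unusable completes per 1,000, using the rates quoted above.

```python
# Converting an acceptance rate into unusable completes per 1,000 respondents.

n_completes = 1000
avg_other_vendors = 0.923   # average acceptance rate across the other six vendors
dynata = 0.998              # Dynata's acceptance rate on this study

print(round(n_completes * (1 - avg_other_vendors)))  # 77 unusable completes
print(round(n_completes * (1 - dynata)))             # 2 unusable completes
```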

Our regular acceptance rate is 96%; that's across all studies. It includes the long studies, and it includes the hard-to-reach studies.

When you put all that together, it's 96%. But as I mentioned, the intent here was to make this easy to do well, and Dynata and also the aggregator did quite well here.

Representativeness. So, as you remember, we only asked vendors to screen on age to keep respondents in range and then to apply a nested quota on gender and the four regions. And as you can see, we did quite well.

We didn't ask them to balance on Hispanic origin, and we didn't ask them to balance on race, employment, income or home ownership. But because of the way we collect our sample to make it representative, through the use of loyalty partners and other means, we were the most representative of all the vendors in the space.

First-party panels, generally speaking, did pretty well compared to aggregators and exchanges. The aggregators and exchanges actually did relatively poorly, which I'll show you on the next slide.

What you're looking at here is the average percent deviation from the U.S. Census. A larger number here means you were much farther away from being representative of the population, which in this case was U.S. 18 to 34.  
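 
As a sketch of the metric being described, not the study's exact calculation, the deviation can be computed as the average absolute gap, in percentage points, between the sample's demographic shares and census benchmarks; the categories and numbers below are made up for illustration.

```python
# Illustrative sketch: average absolute deviation (percentage points) from
# census benchmarks across matched demographic categories. Numbers are made up.

def avg_census_deviation(sample_pct, census_pct):
    keys = sample_pct.keys()
    return sum(abs(sample_pct[k] - census_pct[k]) for k in keys) / len(keys)

sample = {"hispanic": 24.0, "employed": 61.0, "homeowner": 30.0}
census = {"hispanic": 22.0, "employed": 65.0, "homeowner": 35.0}
print(avg_census_deviation(sample, census))  # (2 + 4 + 5) / 3 ≈ 3.67 points
```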

Dynata, by far, did the best. First-party panels did a little better, in general, than everybody else, but here's where the aggregators and the exchanges really did poorly. And that has to do with not having that first-party relationship with their panelists. First-party panels are going to know more about their respondents than anyone else.  

Open-end quality. Once again, Dynata had the highest quality in the open-end responses according to our third-party independent study. The pattern looks a lot like the sample acceptance rates, which is not surprising. And again, Dynata and the aggregator did quite well. The exchanges and the first-party panels did significantly worse.

According to Love + Soule, first-party panels, excluding Dynata, and the exchanges had a large number of bad responses, including responses in the wrong language, clear duplicates, lots of typos and very superficial responses.

And I should say that within the first-party panels themselves, there was actually a great deal of variation. One was extraordinarily poor, one was better. But none were as good as Dynata or the sample aggregator, where there were much higher quality open-end responses: more complete sentences, fewer typos and more thoughtful answers.

Then last, we're going to look at over qualification. 

In this case, we used a veiled qualifier. Just for fun, we also included an obvious screener in the study, asking people yes or no whether they had an FAA pilot's license. An obvious screener like that will attract more fraud. What we'll talk about here is the veiled screener.

The veiled screener is intended to prevent that kind of over qualification fraud. The way it works is you ask, ‘Which of the following six or seven things describe you?’ and the participants don't know which one we're looking for to get them into the survey. Of course, the options are designed so that selecting all of them would be a nonsensical combination and would get you kicked out.
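
Here is a minimal sketch of that veiled-screener logic. The qualifying item matches the one described later in the presentation, but the rest of the option list and the select-all rule are illustrative, not the study's actual instrument.

```python
# Minimal sketch of a veiled screener: only one hidden item qualifies, and
# selecting (nearly) everything is treated as implausible gaming.

OPTIONS = {
    "ran a marathon",
    "registered as a personal trainer",   # the hidden qualifying item
    "got a real estate license",
    "adopted a pet",
    "learned a new language",
    "installed solar panels",
}
TARGET = "registered as a personal trainer"

def screen(selected):
    if not set(selected) <= OPTIONS:
        return "invalid"
    if len(set(selected)) >= len(OPTIONS) - 1:
        return "disqualified"   # selecting (nearly) everything is implausible
    return "qualified" if TARGET in selected else "not_qualified"

print(screen(["adopted a pet"]))                                      # not_qualified
print(screen(["registered as a personal trainer", "adopted a pet"]))  # qualified
print(screen(list(OPTIONS)))                                          # disqualified
```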

We have some advantages here as well. Because our panel is well profiled, we can generally send the survey to the people who are likely to get through the screener. That reduces the likelihood of over qualification because we're not sending a whole bunch of people in there randomly thinking maybe they belong, maybe they don't.  

We also have machine learning in our router that helps push surveys to respondents most likely to qualify. And again, as I mentioned, we do recommend a veiled screener approach. 

So, how did we do in the study? 

Well, we yielded the lowest over qualification rates, which we're very happy about.  

The question was, ‘Which of the following have you done in the last three months?’ And the item on that list that we were looking for was having registered as a personal trainer. The actual percentage of personal trainers in this age cohort is less than half a percent.

So, as you can see, we did quite well. Aggregators did a little worse and the exchanges and first-parties did much more poorly.  

When we used the obvious screener, the results were a little worse. And hilariously, there was one vendor where something like 20% of their sample said that they were pilots. Something crazy like that. As you can see, the veiled screener does a much better job on that.

In conclusion, we recommend you do this for your own vendors. If you are looking at multiple survey providers that you're using, compare and contrast, see how we're doing, how they're doing.  

But in this independent third-party study, looking at a relatively straightforward survey, the kind that we get all the time, our third-party said that we had the best sample acceptance rates, the highest representativeness, best open-end quality, and the least over qualification. 

Thank you so much for listening to the presentation. I'm looking forward to having a conversation with you when we do the round table. Thank you.