Spotting the Invisible: Cultural Considerations in Fraud Prevention
Editor’s note: This article is an automated speech-to-text transcription, edited lightly for clarity.
During the Quirk’s Virtual Sessions – Data Quality series on November 20, 2024, Toluna and Cloud Research shared findings on catching fraudulent survey respondents.
Marie Hense, global head of quality at Toluna, and Leib Litman, chief research officer at Cloud Research, gave tips on spotting fraudulent participants across markets. The two also discussed how the Sentry tool can help catch click farms and other fraudulent respondents.
Session transcript:
Joe Rydholm
Hi everybody and welcome to our presentation “Spotting the Invisible: Cultural Considerations in Fraud Prevention.”
I’m Quirk’s Editor, Joe Rydholm – thanks for joining us today. Just a quick reminder that you can use the chat tab if you’d like to interact with other attendees during today’s discussion. And you can use the Q&A tab to submit questions to the presenters and we will get to as many as we have time for at the end.
Our session today is presented by Toluna. Marie, take it away!
Marie Hense
Brilliant. Thank you so much for the introduction and great to have so many people on this webinar today.
First of all, I wanted to introduce you really quickly to who the two of us are and why we're presenting together today. With me today, I've got Leib, who works for Cloud Research.
Cloud Research and Toluna have been working really closely on fraud detection for the last one and a half years. So, today what we would love to do is share the learnings from those 18 months with all of you, especially when it comes to cultural nuances that have to do with effective fraud detection.
We'll start off by introducing Sentry, one of Cloud Research’s key quality tools. In the second half of the webinar, we'll talk about how we applied it in our global fraud detection and what kind of learnings we had in specific markets on how to make sure that our fraud detection is as effective and efficient as possible.
Leib Litman
Okay, thank you Marie. Good morning, everybody, or good afternoon depending on where you may be. I'll start today by taking you behind the scenes of one of the biggest challenges facing online research today, which is systematic fraud.
I'll show you where fraud is coming from, and you may find some of this surprising. Actually, I'll show you some real examples of how this fraud operates and the impact of fraud on major business decisions and public policy. Then most importantly, I'll talk about how we can effectively detect and prevent online fraud. From there we can transition to a discussion of the cultural considerations in fraud prevention.
Online research has become ubiquitous in our society with approximately 3 billion surveys completed every year. These surveys inform everything from political polls and public opinion research to market research, medical studies and the social sciences.
The problem, however, is that this field faces a critical challenge, which is widespread fraud.
How widespread is the fraud? Well, there've been many studies conducted recently to measure the scale of survey fraud in online research, and they paint a pretty grim picture.
For example, a study conducted several years ago by Case showed that 30% to 40% of online survey data is fraudulent. That study was conducted across four separate online platforms.
At Cloud Research, we also regularly conduct studies to measure the prevalence of online fraud and to understand where fraud is coming from.
In our most recent study, we collected data from 2,500 people, which is what you see depicted here, across three commonly used online platforms. The rate of fraud we detected on these platforms ranged between 18% and 33%, fairly close to what Case found and pretty close to what others have found as well.
In order to effectively combat fraud in online research, we first need to understand where it's coming from. If there are 3 billion surveys being completed a year and a third of them are fraudulent, we're talking about roughly 1 billion fraudulent surveys. That's a lot.
What's the source of that? Many people immediately assume that the primary source of fraud is bots. The truth is, yes, bots are responsible for some of the fraud, but in reality, the majority of fraudulent responses actually come from real human beings. These people operate on what we call ‘click farms’ that are situated across the world, across many different countries.
One of the ways we know this is the case is that we invite fraudulent participants we flagged in our studies, including this one, to video interviews so we can observe them in action. I'll share some of these videos with you right now.
Once we flag someone as fraudulent in a survey, we send them a Zoom link and we invite them to take a survey on a Zoom video call. Here is what we typically see.
Just to give you a little more context: this participant completed a survey with us, and within the survey we saw that they were very likely fraudulent. So we got in touch with them and sent them a Zoom link. They clicked on the link, and as soon as we opened the video, this is what we saw. Then we observed them and asked them to fill out a regular market research survey, which I'll show you in a second.
We do this to learn not only about the context in which the fraud takes place, but also about the content: what patterns of fraudulent responding can we observe?
Let me play this video for you so you can see what a typical survey click farm actually looks like.
What we see is a room with dozens of computers and workers continuously completing surveys. There are dozens of computers and dozens of people; they come in and out of the room, they collaborate, they share techniques, they talk to each other, they get up, they sit down. These survey farms are almost always outside the United States, even though this particular survey was supposed to be open only to people in the United States. When we talk to them, they either don't speak English at all or speak it poorly.
They use various apps, which we can see, to access remote servers. This way they can appear to be in the United States, or in whatever country they want, and they use other apps to get around IP-blocking software and so forth.
Then these click farms also have a very strong presence on social media.
For example, you can look into this yourself on YouTube. There are whole YouTube channels devoted to this; one channel from Bangladesh has over 50,000 followers and puts out videos every single day giving people tips on how to circumvent safety checks in online panels.
For example, they show how to pose as a U.S. citizen by using rented U.S. cell phone numbers and how to use proxy servers and U.S. IP addresses. There are videos that show how to take on and create a fraudulent identity.
For example, if you want to pose as a Black woman in the United States, there is a series of fake driver's licenses that people can rent and use; you would never know that they're not who they appear to be.
Now, what makes these fraudulent respondents particularly insidious is their consistent pattern of responses, which we refer to as ‘yeah-saying’: a strong bias toward positive responses. Let me show you an example.
This is the same participant. They're filling out a market research survey with the regular questions you would see in a typical survey, but embedded within them are questions that look for fraud. One of these questions asks whether the participant recently purchased a home in McMullen, Alabama.
Now, McMullen, Alabama, has a population of 29 people. We chose that location on purpose because we know the likelihood of anybody purchasing a home in that town is virtually zero. But as you can see here, this respondent said, ‘Yes, I recently purchased a home there.’
And it's not only this respondent. When we look across our survey of 2,500 participants, 417 of them, roughly 17%, claimed to have recently purchased a home in McMullen, Alabama. If this were a real research question, the incidence would be vastly inflated and you would draw completely erroneous conclusions. Think about what would happen if this were a product nobody actually used: from these data it would appear to be used by a lot of people.
Let me show you some other examples. This is exactly the same participant, now looking at other questions.
In this question, we presented a fictitious set of products. There are four fictitious products, and we asked them, which ones do you use? They just checked off the top three of those products.
We asked them, have you, in the last three months, filed a homeowner's insurance claim due to lightning damage? And they said yes.
We asked them, did you attend a conference in Austin, Texas, in May of this year? They say yes.
Do you own a Tesla? Yes.
Have you had your house completely repainted in the last seven days? Yes.
Overall, what we see with this respondent and with all fraudulent respondents is this pattern of consistently saying ‘yes’ indiscriminately to everything.
They'll report owning products that don't exist, using fictitious services and experiencing highly improbable events, and they'll exaggerate their responses to real products in a way that doesn't match reality or the patterns of real participants.
Now, why do they do this? This pattern of yeah-saying is not accidental. Fraudulent participants have adopted a yeah-saying strategy because they've been conditioned to: surveys routinely screen respondents out when they don't qualify.
So, if a survey is about Air Jordans, or whatever product, and there's a question asking whether you use that product, saying no signals what the survey is probably about. If you say you're not using it, you're going to get routed out of that survey and you're not going to get paid.
So they're conditioned to maximize their earnings by basically saying yes to everything, which lets them qualify for as many surveys as possible.
Now, this systematic positivity bias has serious consequences, and I want to share two examples with you.
The first example was presented at Quirk’s a couple of years ago by Tia Maurer from Procter & Gamble. If you're interested in more details, you can download the full report from the Lucy website at the URL here (registration required).
A major Crest mouthwash campaign failed because online research showed that 54% of participants were aware of the product, used it and liked it, while in-person testing put that figure at only 24%.
They had spent millions of dollars on a go-to-market campaign and then found out that people hated a product they supposedly loved. Trying to figure out what was going on, they went back to the drawing board, ran the study in person and found something completely different.
All of this was caused entirely by this yeah-saying tendency.
The second study I'll share with you was conducted by our team at the height of COVID-19, when the Centers for Disease Control and Prevention ran an online study which found that 10% of Americans were drinking bleach to prevent COVID-19.
Maybe some of you came across this study; it was very widely disseminated in media all around the world. But as it turned out, once you remove the fraudulent participants, not a single case of legitimate bleach drinking was found.
All the problems in these studies, and in dozens of others I could tell you about if we had more time, are caused by the yeah-saying of fraudulent participants: people who indiscriminately say they're familiar with products, that they use a product, ‘yes, I drink bleach,’ yes to everything.
What this does is create illusory effects that exaggerate frequency claims, create illusory correlations and lead to wrong conclusions that can cost hundreds of millions of dollars.
How can this type of fraud be prevented?
We at Cloud Research developed a solution to survey fraud. It's called Sentry.
Sentry is a 30-second prescreening system that combines two key components. The first is one most people are familiar with: IP deduplication, geolocation tracking and basic digital fingerprinting. It's really aimed at devices.
We look at the participant's device and these digital fingerprints and try to flag problematic signals. The problem is that these click farms are so sophisticated that they know how to get around all of these device checks, so while you can catch some people this way, you'll miss most of them.
That's why Sentry includes a second component, which is the most important one, because it focuses specifically on the behavioral patterns we know to be characteristic of fraudulent participants.
We have libraries of tens of thousands of questions that are pulled into this 30-second system. These questions pull for honesty and attention, and they include verification of open-ended responses.
Through that, we look not just at the device but also at behavior, and we supplement that behavioral analysis with other techniques, some of which I'll share with you in a second.
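[Editor’s note: To make the two-component idea concrete, here is a minimal, hypothetical Python sketch of how device signals and behavioral ‘yeah-saying’ items could be combined into a single screening decision. The field names, thresholds and scoring are illustrative assumptions, not Cloud Research’s actual Sentry logic.]

# Illustrative sketch only: a simplified two-component screener combining
# device signals with behavioral "yeah-saying" items. Field names and
# thresholds are hypothetical, not Cloud Research's actual Sentry logic.
from dataclasses import dataclass, field

@dataclass
class Response:
    ip_address: str
    country_claimed: str
    country_from_ip: str
    # Answers (True = "yes") to low-incidence or impossible screener items,
    # e.g. "Do you live in a town of 29 people?" or "Do you use product X (fictitious)?"
    improbable_item_answers: list = field(default_factory=list)

def device_flags(resp: Response, seen_ips: set) -> list:
    """Component 1: basic device/network checks (IP dedup, geo mismatch)."""
    flags = []
    if resp.ip_address in seen_ips:
        flags.append("duplicate_ip")
    if resp.country_from_ip != resp.country_claimed:
        flags.append("geo_mismatch")
    return flags

def behavior_flags(resp: Response, max_yes: int = 1) -> list:
    """Component 2: behavioral check for indiscriminate yeah-saying."""
    yes_count = sum(resp.improbable_item_answers)
    return ["yeah_saying"] if yes_count > max_yes else []

def screen(resp: Response, seen_ips: set) -> bool:
    """Return True if the respondent should be routed out."""
    flags = device_flags(resp, seen_ips) + behavior_flags(resp)
    seen_ips.add(resp.ip_address)
    return len(flags) > 0

# Example: a respondent claiming the U.S., resolving to another country
# and saying "yes" to three improbable items would be routed out.
seen = set()
r = Response("203.0.113.7", "US", "BD", improbable_item_answers=[True, True, True])
print(screen(r, seen))  # True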
Here is what a Sentry experience looks like. I'll play that for you in a second.
This is a participant who was routed from a panel to Sentry before starting the actual survey. You'll see Sentry has these questions, some of which I shared with you a few minutes ago, that pull for acquiescence bias and yeah-saying.
For example, ‘Are you living in Roscoville, Alabama at the moment?’ Either this town doesn't exist at all or only a couple of people live there. Look at what happens: not only does this person say yes, they live there, but they also use a translation device, which is detected by our event-streamer software. That software looks not only at what people say in response to these questions, but at how they answer them, including whenever they use apps. We can detect them automatically, and these people are routed out of the survey.
Here are some other examples where you can see people who routinely use these translation devices. I'll share with you one other question that we have here.
This question asks, ‘From memory, can you think of every word in the dictionary that begins with the letter “p”?’
Anybody who's paying attention would say no to that. But as you can see, this person goes out of their way to say yes because they think they're qualifying for a study. They don't even read the question; they just look for where the ‘yes’ answer is.
Another example of what Sentry does: it looks at open-ended responses using AI.
For example, we'll ask participants to write one sentence about the last thing they remember cooking and where they cooked it. Again, people use translation software to see and understand the question, and then you can see them clicking around and going off screen.
Eventually, they click a button: they have software that uses AI to answer the question. You can tell because the text appears at a speed that's just humanly impossible, and the event streamer catches it. That's how we catch ChatGPT and other AI usage in open-ended responses.
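[Editor’s note: As an illustration of the typing-speed idea described above, here is a small, hypothetical Python sketch that flags open-ended answers appearing faster than a plausible human typing rate. The event format and the 15-characters-per-second threshold are assumptions for illustration, not the event streamer’s actual parameters.]

# Illustrative sketch only: flag open-ended answers that appear faster than
# a human could type them (e.g., pasted or generated by an AI tool).
# The 15 characters/second threshold is an assumption for illustration.
from dataclasses import dataclass

@dataclass
class KeyEvent:
    timestamp: float   # seconds since the question was shown
    text_length: int   # length of the answer-box contents at this moment

def implausible_typing(events: list[KeyEvent], max_chars_per_sec: float = 15.0) -> bool:
    """Return True if any burst of text arrived faster than the threshold."""
    for prev, curr in zip(events, events[1:]):
        added = curr.text_length - prev.text_length
        elapsed = curr.timestamp - prev.timestamp
        if added > 0 and elapsed == 0:  # an instantaneous paste
            return True
        if added > 0 and elapsed > 0 and added / elapsed > max_chars_per_sec:
            return True
    return False

# Example: 180 characters appearing within half a second looks like a paste
# or generated text, not typing, so the respondent would be flagged.
events = [KeyEvent(0.0, 0), KeyEvent(12.5, 0), KeyEvent(13.0, 180)]
print(implausible_typing(events))  # True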
The last thing I'll say, and I may come back to this a little later, is that one of the ways we know Sentry is routing out the right people, that it's really catching people who are fraudulent, is a benchmarking technique we developed. We believe it's very effective, and it works very differently from attention checks and similar approaches.
The way benchmarking works is that you take some event whose probability in the population you know. For example, Tesla ownership: we know it's around 1% in the U.S. So if people in a sample are being honest, you'll get 1%, 2%, maybe 3% or 4%. But the people Sentry routes out report Tesla ownership at 50%, and sometimes higher.
That's one of the ways that we know that it's routing out the right people and the people that it keeps are honest overall.
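[Editor’s note: To illustrate the benchmarking technique, here is a minimal Python sketch that compares a group’s reported incidence of a low-base-rate item, such as Tesla ownership at roughly 1% in the U.S. as cited in the talk, against the known benchmark. The tolerance multiplier is an illustrative assumption.]

# Illustrative sketch only: compare a group's reported incidence of a
# low-base-rate item against a known population benchmark. The tolerance
# multiplier is an assumption, not a published Cloud Research parameter.

def benchmark_check(yes_answers: list[bool], benchmark_rate: float,
                    tolerance: float = 4.0) -> dict:
    """Flag the group if reported incidence exceeds tolerance x the benchmark."""
    observed = sum(yes_answers) / len(yes_answers)
    return {
        "observed_rate": observed,
        "benchmark_rate": benchmark_rate,
        "suspicious": observed > tolerance * benchmark_rate,
    }

# Example: a screened-out group reporting 50% Tesla ownership against a
# ~1% U.S. benchmark is clearly not answering honestly.
routed_out_group = [True] * 50 + [False] * 50   # 50% report owning a Tesla
print(benchmark_check(routed_out_group, benchmark_rate=0.01))
# {'observed_rate': 0.5, 'benchmark_rate': 0.01, 'suspicious': True}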
I'll end by saying that the challenge of online research fraud is significant, and not taking it seriously carries a significant risk of drawing false conclusions that mislead business decisions. But by understanding the patterns of fraud and implementing robust screening methods, we can maintain the integrity of online research.
At the same time, we have to make sure that whatever methods we use have cultural validity. This is increasingly important as more research is conducted across culturally and linguistically diverse markets.
Now, I’ll hand it back to Marie.
Marie Hense
Brilliant, thank you so much, Leib. Let's dive into those cultural learnings that we've had from the last 18 months.
As I said already, when it comes to fraud detection and identifying suspicious respondents, there are certain patterns we can see, like overstatement and the use of auto translators.
However, when we look at certain markets around the world, we know there will be markets where, based on cultural preferences or cultural makeup, these kinds of behaviors are naturally more prevalent, even with fraud taken out of the equation.
Very interestingly, over the last 18 months of working with the Cloud Research team on implementing Sentry as one of the checks in our quality suite at Toluna, we actually saw this in our data. We were able to identify markets where certain types of behavior are more prevalent than in others, and those are exactly the learnings we'd like to share with you today.
We already talked about why it's so important to identify fraudulent respondents correctly.
Well, the first thing, of course, is that you want to remove the fraudsters. But we also want to reduce the chance of removing real respondents, because if we remove real respondents who just happen to answer in a certain way or use a certain tool, we may skew and bias our data by pre-selecting respondents.
The second point is respondent experience.
If we continuously remove real respondents who are there to genuinely participate in research based on our quality checks, they will have a bad experience and at some point they will stop coming back. That means we'll have fewer real and genuine respondents in the long run, which only widens the opportunity for fraudsters.
The last point is that the more real and genuine respondents we remove through non-optimized quality checks, the longer our fieldwork timelines will be and the higher the cost of finding our samples. So overall, being accurate in our quality checks has lots of benefits.
Let's start by looking at translator usage because this was a really interesting one.
One thing to say first: we always recommend removing respondents who use auto translators. A, because we saw a lot of fraudsters use them, and B, because auto translators, even when used genuinely, are not necessarily great at translating the nuances in questionnaires or answer options.
So, we really don't recommend having auto translators in surveys. However, we've got to make a decision when we see someone using an auto translator.
Should we just prevent them from entering the survey and participating or should we actually block them permanently right away because we think they're fraudsters? Well, let's have a look at the data that we've got.
I pulled this data specifically from the last two months because that was enough to show exactly the trends we have been seeing. In September and October of this year, we indexed all the markets where we were running the auto-translator usage check. In the middle we've got our norm, but to the right we can see a few markets that are much higher than the average market we were looking at.
When we zoom in on those markets, we see Jordan, the UAE, Bosnia and Herzegovina and Qatar. With a little desk research, we can see why those markets would have a higher rate of translator usage.
In Jordan, based on United Nations data, international migrants are estimated at about 33% of the country’s total population. In Qatar, this is even higher: about 95% of the total labor force are migrant workers. In the UAE, about 88% of the total population is made up of migrants. And in Bosnia and Herzegovina, there are three official languages, each spoken by a sizable share of the population.
What we can see immediately is that in those four countries, based on migration, culture and different mixes of languages, there is a plausible reason people use auto translators at a higher rate than in other markets.
None of this automatically means those people aren’t fraudsters; there may still be fraudsters in there. What I'm saying is that we need more data. This cannot be a single check on which we make a decision in these markets, because we can see there are genuine cultural reasons why the data may look that way.
Let's have a look at extreme bias. This is very interesting because agreement behavior is something that can be heavily influenced by culture.
We did see that extreme bias is generally a very effective way of identifying fraudulent respondents. But if we look at the right of that graph, there are some cultures where disagreeing, not knowing or admitting that you don't know something can be seen as a personal weakness or as losing face. So we thought, let's have a look at those countries on the very far right.
In there we've got Vietnam, Panama, South Africa, Indonesia, Ecuador, Honduras, the Philippines and Kenya. Interestingly, if we picture the world map, we can see that some of these markets are quite close to each other.
Vietnam, the Philippines and Indonesia, for example, are all in the same corner of the world. The same goes for Panama, Ecuador and Honduras, which are also close together.
So the hypothesis that there could be something cultural in those areas isn't too far-fetched. And when we look at cultural research, there is something behind that.
On the left here, I just took this from a book called “The Culture Map” by Erin Meyer, which I highly recommend when it comes to understanding different cultures. In that book, different cultures are separated based on eight different dimensions, including how people evaluate, how people communicate, how people agree or disagree and how confrontational they are.
What's really interesting when we look at these dimensions alongside the countries from the previous slide: Indonesia and Kenya both had a high index of extreme bias compared to other countries. Despite being geographically far apart, both are high-context communicators, meaning they don't say things directly and context is very important. Both cultures give negative feedback indirectly, and both have high confrontation avoidance.
Those factors could have an impact on how people respond to certain types of bias questions, or to a direct question like, “Have you been here? Have you done that?” Respondents could see that as quite confrontational, and saying no or disagreeing could be seen as undesirable in those cultures.
Lastly, I thought I'd bring another check to the table that I always find very interesting: IP geolocation.
IP geolocation uses databases to map IP addresses to geographic locations. These databases come from different sources, and in some countries they are better than in others, because there is a limited number of IP addresses in the world and they are allocated across countries.
If we look at the U.S., we've got a population of about 342 million and about 1.2 billion IP addresses allocated to the country. In theory, that means about 3.5 IP addresses per person. If we imagine every person has two internet-enabled devices, that could roughly work out to an individual IP address per device.
However, if we look at Germany, there is a population of 84 million and only about 135 million available IP addresses. That means there are only about 1.6 addresses per person. If every person in Germany had two internet-enabled devices, those IP addresses would have to be shared; there cannot be one IP address per device anymore.
Now look at China: a population of 1.4 billion and only 351 million IP addresses. There, IP addresses clearly need to be shared between devices because there aren't enough for every single device. What is the result of that?
In China, for example, they use dynamic IP allocation, which means a device could right now have an IP address based in Shanghai, but 30 minutes from now it could be allocated an IP address based in Beijing. That doesn't mean the person or the device actually traveled from Shanghai to Beijing; it's just that, based on availability, this device was allocated a different IP address.
What we can see immediately is that IP geolocation in China is, in fact, much less reliable than in the U.S., for example. In the U.S., IP geolocation is about 95% to 99% accurate for a user's country and 55% to 80% accurate for the user's state or region. But even in the U.S., with so many IP addresses available relative to the population, accuracy for a user's city is only about 50% to 75%.
What that means is we need to really pay attention to checks such as IP geolocation, which may work reasonably well in some markets but really do not work in others, such as China. There are limitations based on the market and the culture we are looking at.
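[Editor’s note: The arithmetic behind those figures can be checked with a few lines of Python, dividing the allocated IPv4 addresses by population for each market using the approximate numbers quoted in the talk.]

# Quick arithmetic behind the IP-geolocation point: allocated IPv4 addresses
# divided by population, using the approximate figures quoted in the talk.
markets = {
    #            population,   allocated IPv4 addresses
    "U.S.":     (342_000_000,   1_200_000_000),
    "Germany":  (84_000_000,      135_000_000),
    "China":    (1_400_000_000,   351_000_000),
}

for name, (population, ip_addresses) in markets.items():
    per_person = ip_addresses / population
    print(f"{name}: ~{per_person:.1f} IP addresses per person")

# U.S.: ~3.5 per person  (plenty, so geolocation is relatively stable)
# Germany: ~1.6          (addresses must be shared across devices)
# China: ~0.3            (heavy sharing and dynamic reallocation)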
To leave you with some top tips for cross-cultural quality screening, let's wrap up what we talked about. Leib, do you want to take it away with some general recommendations?
Leib Litman
Sure, yeah, thank you Marie.
I think that what we've seen is that it's really important to validate all cultural checks.
When we translate our questions into different languages, and we have over 50 languages available, we always validate them in the culture in question. We can't just take questions we validated on, let's say, a U.S. sample, translate them and then hope that they work in China or Russia or somewhere else. Translation alone is not enough, as Marie was saying.
The other thing I would say is benchmarks are a really good way to see whether people really are being honest or not.
Again, taking the Tesla example: whatever screening tool is being used, whether it's Sentry or something else, if the people who fail it then report in the survey itself owning a Mercedes at a 70% or 80% rate, or a Tesla at a 50% rate, there's clearly something going on with those respondents. But keep in mind that the Tesla question and the 1% benchmark hold for the U.S.
It's very important to use benchmarks that are appropriate for each specific culture. Both the stimuli used to vet participants and the benchmarks used to check whether the vetting works have to be culturally specific.
Marie Hense
Yeah, absolutely agree.
I think the other thing to add is that when you're benchmarking, don't just benchmark the U.S. against the U.K. or Germany against France. Also benchmark within a market, to see whether the removal and flag rates of your quality checks are similar across studies in that market. Otherwise one market may look much higher than another, and that doesn't automatically mean there is more fraud there; it may come down to the quality check that was applied or the way people understood it. So that's really important.
The other thing that's important, from a device-check perspective: as we saw, IP geolocation does not work equally well in every market. While it's a good check in some markets, in others it really does not help. It will create a lot of false positives, flagging a lot of respondents not because they're in any way suspicious but because of the way certain systems work in that country.
The same goes for the usage of auto translators.
Just because someone is using an auto translator does not automatically mean they're a fraudster. In some markets it just means they're an immigrant, or that the market has a lot of different languages. For representativeness, we also need to ensure those different languages are catered to.
Another thing we saw in the behavior checks, especially in cultures where saving face is important: when designing trap questions, we want to be mindful of cultural differences.
For example, if a culture is quite agreeable, or finds it difficult to admit not knowing something, it's really important to remember that those concepts exist in certain markets and to account for them.
Also, a small note on localizing checks: English in the U.S. isn't the same as English in the U.K., and so on. That can lead to misunderstandings and therefore to higher flag rates on quality checks.
The last recommendation we've got is around identity checks. This is a topic that's very important to us at Toluna at the moment. We're doing a lot of trials and checks around that.
Always remember that the sense of privacy can differ quite significantly between markets. Just because someone doesn't want to upload their ID doesn't mean they're a fraudster; it could just mean they live in a country where sharing personal information is considered very sensitive.
That's another little learning on the side that we've been having over the last few months.
With that, we will hand it over to all of you for questions.