Separating myth from reality

Editor's note: Karine Pepin is the co-founder of The Research Heads. Mary Beth Weber is the founder of CASE and a senior account executive at Lucy by Capacity. Efrain Ribeiro is an online research consultant. Tia Maurer is R&D group scientist at Procter & Gamble. Carrie Campbell is former head of data and analytics at Ketchum and National Geographic. CASE4Quality is a brand-led coalition created to ensure a quality foundation for marketing data intelligence. E-mail info@case4quality.com for more information.

The online panel landscape has undergone a seismic change over the past two decades – and not for the better. With the introduction of new sampling technologies, the underlying methodologies used to recruit participants have fundamentally changed, leading to the commoditization of sample. Once dominated by well-managed, double-opt-in panels, the sampling ecosystem has devolved into a hotbed of fraudulent activity and low-quality data, one that prioritizes volume, speed and cost over rigor. These structural shifts, largely unnoticed by client-side researchers, have had unintended consequences for research quality.

This article aims to empower client-side researchers by exposing the realities of the industry and providing them with the knowledge to demand transparency and accountability. By understanding the true state of the sampling ecosystem, researchers can advocate for higher quality standards and drive positive change.

Myth 1: Suppliers have validated and verified their members to ensure real, unique and authentic participants for my survey.

Around 2000, the first online panels adopted tried-and-true offline methods to establish a solid foundation. This involved gathering key identifying information and profiling details to verify the authenticity of recruited participants. That information was used primarily to match respondents with appropriate surveys and to ensure incentive checks were accurately delivered to their physical mailboxes. It also served as a means to validate respondents against third-party databases.

By 2010, most firms had transitioned to digital incentives and stopped collecting personally identifiable information, as it significantly reduced the number of willing participants. Today, in most cases, participants create a panel profile with just an e-mail address. Few, if any, panels require proof of identity or conduct checks to ensure authenticity. 

“Most sample providers run almost no checks on their respondents as they sign up to the panel or throughout the lifetime of the respondent on their panel. The onus is instead on you, the researcher, to make sure that you are building in sufficient checks to your study.” – Andrew Gordon, Prolific

Although methods exist to detect and prevent fraud – such as tracking payment methods or mailing incentives to physical addresses – control over these measures lies with the suppliers, not the researchers. 

To minimize the chances of participants misrepresenting themselves in your study, researchers can implement several design-related strategies. These include: crafting a robust screener questionnaire that effectively masks the research topic; using custom recruits with identity verification for B2B and hard-to-reach audiences; using panels that track digital behaviors and location data; and incorporating audio or video questions to assess participant authenticity.

Myth 2: Suppliers have effective processes in place to limit respondent participation, thus ensuring that my survey is sent to fresh participants.

In the past, some suppliers imposed limits on how many surveys a respondent could complete within a given time frame. Invitations were targeted at those who met specific screener criteria, and large CPG clients expected a minimum three-week period in which respondents would not receive surveys in the same category (e.g., beverages, haircare, cereals). This practice helped ensure the integrity of the research.

Today, those standards have largely disappeared. Most panels now rely on a self-selection model. Potential participants can browse available surveys on offer walls, reviewing the topic, time commitment and incentive before deciding to participate. There is no limit on the number of surveys a participant can complete per day. 

“How many surveys is too many? What about 21.8 survey attempts per day? This is the average number of survey attempts per survey entrant we captured across 26,000+ survey entrants on a study we ran earlier this year for research-on-research purposes.”1 – Marc Di Gaspero, Potloc

Further highlighting this issue, a CASE4Quality study found that a small subset of devices accounts for a significant portion of survey completions (3% of devices completed 19% of all surveys). Even more alarming, 40% of the devices entering over 100 surveys per day successfully passed all other quality checks.2 Research from CASE and others shows that frequent survey takers can skew results: a higher number of survey attempts is linked to lower brand awareness, higher brand ratings and higher purchase intent, demonstrating how these respondents can distort overall findings.3

Using third-party fraud detection software can help researchers mitigate this issue. While these tools primarily monitor participant behavior within their user base and may not detect all high-frequency survey takers, they offer a valuable layer of protection. 
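To make this kind of frequency analysis concrete, the short Python sketch below shows one way a researcher might flag unusually active devices in a survey-attempt log. It is a minimal illustration only: the file name, the device_id and attempt_date columns and the 100-attempts-per-day cutoff are hypothetical placeholders, not any supplier's actual schema or threshold.

# Illustrative sketch: flag high-frequency devices in a survey-attempt log.
# Assumes a hypothetical export with "device_id" and "attempt_date" columns;
# real supplier or fraud-tool exports will differ.
import pandas as pd

attempts = pd.read_csv("survey_attempts.csv")  # one row per survey attempt

# Attempts per device per day
daily = (attempts
         .groupby(["device_id", "attempt_date"])
         .size()
         .rename("attempts_per_day")
         .reset_index())

# Devices that ever exceed an illustrative threshold of attempts in a single day
THRESHOLD = 100
flagged = daily.loc[daily["attempts_per_day"] > THRESHOLD, "device_id"].unique()

# Share of all attempts coming from the most active 3% of devices
per_device = attempts.groupby("device_id").size().sort_values(ascending=False)
top_n = max(1, int(len(per_device) * 0.03))
share = per_device.head(top_n).sum() / per_device.sum()

print(f"{len(flagged)} devices exceeded {THRESHOLD} attempts in a single day")
print(f"Top 3% of devices account for {share:.0%} of all attempts")

Even a rough tabulation like this, run while a study is still in field, lets a researcher ask a supplier pointed questions about specific devices rather than about quality in the abstract.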

Myth 3: Suppliers have relationships with their members and only use their proprietary sample for my study.

In the early 2000s, sample recruitment strategies differed quite a bit from today’s highly optimized approaches. Companies like e-Rewards primarily relied on partnerships with established brands and loyalty programs to build their panels. Members were invited to join through e-mails, newsletters or co-branded partnerships with airlines, hotels and other loyalty programs. This approach led to mostly exclusive sample pools for each supplier. 

Today’s sample landscape is significantly more complex. While some traditional methods still exist, suppliers now employ a diverse range of recruitment strategies to meet the increasing demand for sample (e.g., affiliate networks, mobile/gaming platforms, online traffic sources, programmatic algorithms and so on). Each of these sources introduces potential vulnerabilities for fraud and disengaged respondents. 

“While many suppliers may promote their proprietary panels, most have transitioned into an aggregation model, sourcing from various providers to meet quotas, timelines and budget constraints. In my previous role as VP of partner network and quality, I closely monitored frequent survey takers and the overlap of respondents across panels. Over the past 4-5 years, duplication rates have significantly increased – not just because more people are joining multiple panels but due to suppliers blending sources to scale. As a result, researchers cannot confidently trust the origin of their sample without rigorous partner-vetting and building strong relationships with suppliers that demonstrate transparency and reliability.” – Mary Draper, EMI Research Solutions 

A key change is the breakdown of sample exclusivity. Most suppliers now act as aggregators, even those touting their proprietary panel, by sourcing sample from various providers.4 Without transparency and accountability, it is difficult for buyers to discern any real differences in quality between sources. 

To ensure the integrity of their data, client-side researchers must proactively take ownership of the data quality process, even if suppliers implement basic fraud prevention measures. This involves demanding transparency from suppliers regarding sample sources, using their own fraud detection systems and implementing rigorous quality control protocols before, during and after the survey. 

Myth 4: Fraud is easily identified and removed. 

While traditional bots are a known factor in data quality issues, human-assisted fraud poses an even greater challenge.5 This type of fraud ranges from large-scale operations like click farms to smaller-scale efforts by individuals with malicious intent, as evidenced by the uncovering of the Paid For Your Say group.6 Additionally, poor data quality can also stem from honest yet disengaged respondents.

“Survey fraud has evolved from simple bots used over a decade ago to more advanced methods that mix human input with browser extensions and form-fill technology. Although the goal of claiming survey incentives remains the same, the focus has shifted from just completing as many surveys as quickly as possible to ensuring that (fraudulent) survey completions are accepted.” – Rich Ratcliff, OpinionRoute

Numerous studies have shown that bad actors can easily blend into a dataset.7 They are familiar with typical quality checks and can exploit any weaknesses in the system. Moreover, advancements in AI tools have made it easier than ever for them to go undetected.

While there are external factors in the ecosystem that researchers cannot control, they do have significant influence over many aspects of the research process that can help mitigate poor data quality through careful planning. Being intentional and proactive from the outset is crucial, as it is far easier to prevent data quality issues than to rectify them after the fact.

A comprehensive approach to data quality involves multiple layers of protection, each addressing a distinct threat. Our best chance of preventing poor data quality lies in applying all these layers collectively, from design (e.g., selecting a reputable sample source, employing robust fraud detection software) through fielding (e.g., incorporating rigorous screening questions and attention checks) and analysis (e.g., reviewing the data to ensure consistency and coherence). While reviewing verbatim responses should be part of this process, they have become increasingly less reliable for assessing quality. 

“With simple prompt adjustments, AI-generated answers can be shorter, less formal and include intentional spelling mistakes, making them no longer easy to identify.” – Florian Kögl, ReDem

Researchers know their data best. While many quality control measures can (and should) be programmed directly into the survey, data cleaning cannot be fully automated. 
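As one illustration of the layers that can be scripted, the Python sketch below flags common low-quality patterns – speeders, straightliners and duplicated verbatims – so that manual review can focus on the flagged cases. The column names (duration_sec, the q10_* grid, q_open_end) and the cutoffs are hypothetical placeholders to be adapted to each study.

# Minimal sketch of scripted post-field checks that complement, not replace,
# manual data review. Column names and cutoffs are hypothetical.
import pandas as pd

df = pd.read_csv("survey_data.csv")

# 1. Speeders: completes far faster than the median interview length
median_loi = df["duration_sec"].median()
df["flag_speeder"] = df["duration_sec"] < 0.4 * median_loi

# 2. Straightliners: zero variance across a rating grid (columns q10_1 ... q10_8)
grid_cols = [c for c in df.columns if c.startswith("q10_")]
df["flag_straightliner"] = df[grid_cols].nunique(axis=1) == 1

# 3. Duplicate open-ends: identical verbatims appearing across respondents
dupes = df["q_open_end"].str.strip().str.lower().duplicated(keep=False)
df["flag_dupe_verbatim"] = dupes & df["q_open_end"].notna()

# Respondents with multiple flags go to manual review rather than automatic removal
df["flag_count"] = df[["flag_speeder", "flag_straightliner", "flag_dupe_verbatim"]].sum(axis=1)
print(df["flag_count"].value_counts().sort_index())

Flags like these are best treated as triggers for human review rather than automatic removals, consistent with the point that data cleaning cannot be fully automated.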

Myth 5: Marketing materials accurately reflect the supplier’s capabilities such as quality, panel size and profiling information.

The online sampling ecosystem is rife with marketing materials promising large, high-quality panels at low prices. In an attempt to differentiate themselves, suppliers often make exaggerated claims that can mislead buyers. 

Suppliers often overstate the size of their panels. A common misconception is that panel providers have large, readily available pools of highly engaged respondents. While some panels may boast millions of registered members, the active pool is typically a small fraction of this number, ranging from 5% to 10%. 

“The vast majority of all respondents registered on online panels are inactive (i.e., are not regularly taking part in research) and the size of the pool you can actually recruit into your study is often as much as 5-10x lower than the number the panel will advertise.” – Andrew Gordon, Prolific

Moreover, when suppliers advertise access to millions of people in their promotional materials (including responses to ESOMAR 37 questions), this figure often includes multiple sources, not just their proprietary panel. While this expands their reach, it doesn’t guarantee a reliable and consistent supply of high-quality respondents.

  • Panelist profiles may not be accurate. Another common misconception is that panel providers maintain extensive profiling data on their panelists. “Panel companies advertise hundreds or even thousands of pre-profiled data points on their panelists. However, this total includes many data points with low opt-in rates (around 1% of the panel), those that are not updated frequently enough, some that contain errors (e.g., a panelist may have mis-clicked their gender), and others based on leading questions.” – Benjamin Elliott, Sr. Research Strategist. While some providers do gather demographic and psychographic information, the quality and consistency of this data can vary significantly. Many panelists do not complete profiling surveys, turnover rates are high and panelists’ information can change over time.
  • Quality claims often lack rigor and transparency. When sample companies produce white papers comparing their panels to others, the findings always favor their own offerings, often with limited transparency around methodology. Buyers should approach such marketing materials with a healthy dose of skepticism and seek out panel-agnostic research for a more objective evaluation.
  • Quality pledges are not effective in advancing quality efforts due to the lack of enforcement. Quality pledges (i.e., formal commitments by a company or organization to uphold specific data quality standards, transparency and ethical research practices) are often seen as a positive step toward improving standards and fostering trust in the industry. However, the lack of enforcement and accountability means that many companies may sign these pledges more as a marketing tool than a true commitment to quality.

Threaten the foundation

Over the past 20 years, the landscape of online sampling has shifted drastically, with profitability often prioritized over ethical standards. Data quality issues in online sampling have grown so significant that they now threaten the very foundation of research integrity. Only through transparency can we address and fix these deep-rooted problems. 

Transparency will lead to eye-opening realizations about the extent to which demand exceeds supply. It will also highlight areas of the current sample ecosystem that need further investigation by our industry to fully understand the implications and impact (e.g., respondents attempting an average of 21+ surveys per day, panels aggregating other panels that themselves aggregate still other panels, etc.). 

The existing online systems and processes that produce ready respondents for survey participation were mostly designed by technologists and business interests seeking efficiency and rapid profitability. The rampant reuse of sample is one of the most salient “efficiencies” in place today, and researchers have yet to understand its repercussions for the accuracy of findings. This is extremely difficult to see in ad hoc studies but becomes very obvious when trying to obtain consistent, sensible results across a tracking study. In fact, many companies have given up on trackers for this reason.

Unfortunately, sample and data quality were an afterthought as the industry focused its efforts on “faster and cheaper” findings. With true transparency, the resulting revelations will push the entire supply chain to reinvent itself and transform the current business model.

Join the movement

The blinders are finally coming off for client-side researchers when it comes to the state of online research and sampling, but what can they do? Should researchers sit back and rely on the supply chain to correct the issues that suppliers created for their own benefit? The tide is turning and now is the time to come together and drive change: 

  • Unite and speak with one voice. Brands can come together as one voice, driving industry-wide and global initiatives that promote transparency and accountability. 
  • Proactively ask sample suppliers to provide quality metrics. For each study, brands can demand transparency on the source of the sample, the amount and type of targeted sample, fraud rejection rates and reasons for terminations. In addition, they should request that any fraud-system information be passed through to their dataset (including the type of device used to take the survey as well as device frequency, if available). These metrics will help researchers assess the true quality of the sample they are purchasing. 
  • Track sample supplier performance across all studies over time. Brands can monitor these metrics over time and include any in-house cleaning statistics (see the sketch following this list). This will allow researchers to evaluate each supplier systematically and determine whether quality is improving or deteriorating in their studies. 
  • Help build industry benchmarks. Brands can contribute this data to benchmarking efforts like the Global Data Quality Initiative or collaborate with other brands to publish a regular report on industry fraud levels. This collective approach will help us understand, as an industry, whether fraud is becoming more or less prevalent based on real data. 
  • Ask for evidence-based research from the industry and suppliers. Don’t accept vague promises – insist on clear, data-backed proof of sample quality and a robust, transparent methodology. 
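For the "track sample supplier performance" point above, the Python sketch below shows what a minimal version of such in-house tracking could look like, assuming a simple internal log of studies. The file name, column names and the quarterly rollup are illustrative assumptions, not a prescribed format.

# Illustrative sketch: tracking supplier quality metrics across studies over time.
# Assumes a hypothetical in-house log with these columns: study_id, field_date,
# supplier, completes, fraud_rejects, in_house_removals. Adapt to your own records.
import pandas as pd

log = pd.read_csv("study_quality_log.csv")
log["field_date"] = pd.to_datetime(log["field_date"])

# Overall removal rate per study: supplier fraud rejects plus in-house cleaning
log["total_removed"] = log["fraud_rejects"] + log["in_house_removals"]
log["removal_rate"] = log["total_removed"] / (log["completes"] + log["total_removed"])

# Quarterly removal rate by supplier, to see whether quality is improving or worsening
log["quarter"] = log["field_date"].dt.to_period("Q")
trend = log.pivot_table(index="quarter", columns="supplier",
                        values="removal_rate", aggfunc="mean")
print(trend.round(3))

Tracked consistently, even a simple table like this makes supplier-level trends visible across waves and gives brands concrete evidence to bring to supplier conversations and industry benchmarks.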

By focusing on the real issues, we can make progress. Many client-side researchers are already coming together through CASE4Quality to ensure their voices are heard. Through collaboration, we can help create a more accountable and transparent research ecosystem. Learn more at www.case4quality.com.

References
1 Di Gaspero, M. (2024, September). “How many surveys is too many?” LinkedIn post.
2 CASE4Quality. (2021). 2021 Online Sample Fraud Study.
3 EMI Research Solutions. (2024). The Sample Landscape: 2024 Edition.
4 Di Gaspero, M. (2024, October). “How often do you wonder where your sample is coming from?” LinkedIn post.
5 Kantar. (2024). “Is panel fraud the new ad fraud? The shocking issue affecting market research solved with AI.”
6 Hill, C., and Cox, D. (2024, February 14). “Unveiling the market research game-changer: The disturbing tactics of ‘Paid For Your Say.’” Greenbook.
7 Snell, S., and Krishnan, V. (2024). “Data too good to be true? Detecting and fighting modern fraud.” Greenbook Insights Tech Showcase 2024.