Editor’s note: Randy Brooks is founder of Directions Research, Cincinnati.
Last year I turned 60, celebrated my 33rd year in the marketing research profession and my 20th year as head of Directions Research. I think that entitles me to offer an opinion about the Internet data quality issues the research community has been hotly debating. My opinion will, in the tradition of our profession, be liberally supported with data, but I forewarn you that what I am about to say will run counter in many ways to the mainstream and to much of the debate we have been having for the last couple of years.
But first, a story. In 1976 Burke Marketing Research sent me to London, where I was charged with establishing a centralized telephone interviewing facility, the first of its kind in London and perhaps the first of its kind in Western Europe. I rented space in the World Trade Center that overlooked the Tower of London, hired and trained a staff and set out to persuade the British research community that this method of data collection had merit. At that time in the U.K. virtually all data was collected using door-to-door interviewing. Despite the manifold weaknesses of this method (the virtual absence of upper- or lower-income respondents; problematic coverage of rural areas; the impossibility of conducting interviews in large apartment complexes; few if any minorities in the sample; etc.), the data derived via this method were regarded as the Revealed Truth.
The apparent weaknesses of telephone interviewing were magnified and examined in great detail. It was fascinating to see objective professionals label telephone penetration rates of 75 percent as “inadequate” while ignoring the obvious bias and control issues that characterized door-to-door data collection. The reason for their actions, I believe, was simple: door-to-door interviewing had been used since the beginning of time in the U.K. and it worked!
The data were sensitive to in-market events. The data were predictive of consumer reactions to product, advertising campaign and pricing initiatives. Because the method had always “worked,” the details of its shortcomings were ignored. Telephone interviewing had yet to provably work and thus was suspect. Every apparent flaw was speculated about and used to dismiss it. Only when some brave souls tried it and found that it worked (generally better than door-to-door) did it gain acceptance.
For the last several months the industry has worried about a handful of reports of flawed Internet studies that didn’t work. We fret endlessly about respondents who are “gaming” the system to earn money. We self-flagellate about poor quality. And yet Internet data collection is now the largest single method in use in the United States. Our clients at Directions, and apparently thousands of clients across the industry, saw fit in 2006 to allocate over $1 billion of corporate funds to purchase data gathered on the Internet.
Why? Well, it must be “working” - we are collectively too smart and risk-averse to wager our careers and reputations on a method that is hopelessly flawed.
Let’s take a look at a sampling of the data we have at Directions on how online research is “working”:
- We run an Internet-based concept testing system for a very large client. Prior to standardizing with us, the client selected 70 concepts it had tested in the past using other suppliers/methods and retested those concepts in our new system. The correlation between old and new scores was better than 85 percent! Test-retest reliability is a cornerstone of confidence in any system and we witnessed it (a simple sketch of this kind of correlation check appears after this list).
- We do tracking work for many clients. When advertising spending goes up, key measures rise; turn off the advertising and measures trend down. The numbers hang together and tell logical stories. The data seem valid compared to in-market events.
- We use mixed methods for some tracking work because the Internet does not adequately cover minorities. The correlation in key measures between phone and Internet is extremely high.
- Massive segmentation studies (n = 10,000, for example) in the restaurant industry profile occasion-based segments with sensible competitive sets and believable shares, advantages and conclusions.
- Brand shares calculated from claimed usage on Internet studies invariably match up very well with syndicated data.
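For readers who want to see what such a test-retest check looks like mechanically, here is a minimal sketch. The concept scores below are invented for illustration - they are not our client’s data - and the code simply computes a Pearson correlation between old and new scores for a set of retested concepts.

```python
# Minimal sketch of a test-retest correlation check.
# The scores below are hypothetical, not actual study results.
import statistics

old_scores = [62, 45, 71, 38, 55, 49, 66, 58]  # concept scores from prior suppliers/methods
new_scores = [60, 47, 73, 40, 52, 50, 68, 61]  # same concepts retested in the new system

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(f"Test-retest correlation across concepts: r = {pearson_r(old_scores, new_scores):.2f}")
```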
Day in and day out, the data we gather on the Internet are subjected to the duck test: they quack, waddle and have webbed feet. Internet data collection is a powerful and useful tool that has enhanced the ability of marketing researchers everywhere to aid management in making decisions. My conclusions are drawn from work we do with an array of Internet data collection firms that utilize a variety of alternative sampling methods and incentive plans. They all work.
Learned and adapted
Like our colleagues and competitors, we have learned and adapted our protocols as the Internet has sped from introduction to dominance.
- We use finely tuned age and sex quotas to ensure that each sample is representative of the population to which we are required to project our results.
- We avoid the temptation to use the less costly Internet platform when the respondents being sought are elderly, minority or very-low-income consumers. We blend methods when these respondents are needed.
- We’ve learned to identify speeders and eliminate them from the final sample.
- We’ve built logic traps to clean out the cheaters (a simple sketch of both screens follows this list).
- We’ve worked with our suppliers to clean the chronic cheats from their samples.
- We watch the data we get here carefully because our very existence depends on it.
- We use Internet data collection only when it is the best solution. Phone, mall, in-store, interactive voice response, mail and other methods are still viable and in many cases provide a better solution than Internet.
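As promised above, here is a minimal sketch of those two screens - flagging speeders and trap-question failures - in code. The time cutoff, the trap item and the sample records are all assumptions made up for illustration; in practice these rules are tuned per study and per sample supplier.

```python
# Hypothetical data-quality screens: speeders and logic-trap failures.
# Field names, the cutoff and the records are illustrative assumptions only.

SPEEDER_CUTOFF_SECONDS = 300          # e.g., flag completes under five minutes
TRAP_QUESTION = "trap_select_agree"   # instructed-response ("please select 'agree'") item
TRAP_CORRECT_ANSWER = "agree"

def keep_respondent(record):
    """Return True if a completed interview passes both screens."""
    if record["duration_seconds"] < SPEEDER_CUTOFF_SECONDS:
        return False                  # speeder: finished too fast to have read the questions
    if record[TRAP_QUESTION] != TRAP_CORRECT_ANSWER:
        return False                  # failed the logic trap
    return True

completes = [
    {"id": 1, "duration_seconds": 840, "trap_select_agree": "agree"},
    {"id": 2, "duration_seconds": 150, "trap_select_agree": "agree"},     # speeder
    {"id": 3, "duration_seconds": 900, "trap_select_agree": "disagree"},  # trap failure
]

clean_sample = [r for r in completes if keep_respondent(r)]
print([r["id"] for r in clean_sample])  # only respondent 1 survives
```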
Few cheats
You want a positive ending to this story? Here it is: very few people are cheats. Just ask the IRS, which relies on mass honesty in tax reporting. The IRS examined 1,283,950 individual returns in 2006. Of those, only 3,907 led to investigations by its criminal investigation program: 1,524 for legal-source tax crimes (i.e., cheating) and 2,383 for illegal-source tax crimes.
That works out to only about 0.3 percent of examined returns drawing a criminal investigation, and only about 0.12 percent involving legal-source cheating.
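The arithmetic behind those percentages, using the figures cited above, is easy to verify:

```python
# Quick check of the percentages using the 2006 figures cited above.
examined = 1_283_950
investigations = 3_907
legal_source_cheating = 1_524

print(f"Investigations per examined return: {investigations / examined:.2%}")                # about 0.30%
print(f"Legal-source cheating per examined return: {legal_source_cheating / examined:.2%}")  # about 0.12%
```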
Very few interviews we obtain from reputable Internet data collection firms are tossed out, and it should be noted that we are never completely sure these respondents are flawed - it just seems likely that they might be.
Remember also that the interaction we have with respondents in an Internet study is a one-on-one relationship. Individuals agree to a self-completion study. If a cheater shows up, only his/her data is flawed. Phone studies and mall intercepts create opportunities for individual interviewers or firms to damage a very high percentage of the data. My most embarrassing professional experiences have been the result of inept or fraudulent work by those rare individuals or firms who cut corners to make a deadline.
Trust but verify
Ronald Reagan popularized the phrase “trust but verify,” and I think that principle applies in this debate. Trust that professional organizations have nothing to gain in the long run by offering up flawed data, but certainly verify, and adapt processes to ensure that the data are accurate and that the conclusions we are drawing can be used with confidence by decision makers.