Editor's note: Stephen J. Hellebusch is a semi-retired consultant at HellRC. He can be reached at steve@hellrc.com.

Recently, I wondered again about how many people are spoofing us when they fill out an online questionnaire. Coincidentally, an article dealing with data quality appeared in the May/June issue of Quirk’s. “A threat, not a crisis,” by Mike Booth of InspiredHealth, describes what may occur if bogus respondents slip into our research and details the four categories of quality control that can be applied: device- and browser-specific signals; survey-specific behavior; content of survey responses; and the broader survey-taking environment. Hopefully, most reputable interviewing companies employ these controls.

Among the survey-specific behaviors Booth mentions are attention traps, including setups that can lead to contradictory answers. It is that type of issue with which this short article is concerned.

Years ago, as a marketing research vendor, I had a client who was willing to indulge me in asking respondents’ exact age both at the start and at the end of a 15-20-minute survey. In those early days of internet interviewing, 6% of approximately 350 people changed their exact ages from the start to the end of the interview. That seemed very high to me, so I ran the crosstab of exact age pre (asked as the first or second question) by exact age post (asked as the very last question). Almost everyone fell close to the diagonal. 

Four percent gave ages the second time that were within a year of the age they gave the first time, but 2% changed their ages drastically, going, for example, from 35 years old to 65. It was concluded that those 2% of respondents were just “messing with” those of us conducting the interview. Given what we know now of those early days, this may have been an underestimate. But, as an aid in interpreting most survey results, it was acceptable; it just needed to be factored into the conclusions.
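As a rough sketch of that kind of consistency check, the Python code below compares each respondent’s start-of-survey and end-of-survey exact ages and separates one-year shifts from drastic changes. The column names, the one-year tolerance and the sample data are illustrative assumptions, not the original study’s.

```python
# Hypothetical sketch of a pre/post exact-age consistency check.
# Column names (age_pre, age_post), the one-year tolerance and the sample
# data are assumptions for illustration, not the original study's data.
import pandas as pd

def classify_age_consistency(df, pre_col="age_pre", post_col="age_post",
                             minor_tolerance=1):
    """Label each respondent: consistent, minor shift (e.g., a typo or
    rounding) or drastic change suggesting inattention or spoofing."""
    diff = (df[post_col] - df[pre_col]).abs()
    labels = pd.Series("consistent", index=df.index)
    labels[diff.between(1, minor_tolerance)] = "minor_shift"
    labels[diff > minor_tolerance] = "drastic_change"
    return labels

# Tiny made-up example: one small shift and one drastic change.
data = pd.DataFrame({"age_pre": [35, 42, 35, 60],
                     "age_post": [35, 43, 65, 60]})
data["flag"] = classify_age_consistency(data)
print(data["flag"].value_counts(normalize=True))  # share in each group
```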

Here we will summarize the results of a second study conducted this year to see if things had changed with online interviewing.

Objective

The objective is to see what percentage of those completing an online questionnaire will claim different ages at the start and at the end of a 12-minute self-administered online interview.

Research design

Sample

The sample was 500 people from a large national database of consumers with many controls to ensure quality. Interviewing was balanced to match the U.S. population on age and sex and took place in mid-January 2024.

Method

In the online interview, the first question asked each respondent’s age in categories so that the sample could be matched to the U.S. population.

A question asking respondents to select their age category.

The second question, placed at the very end of the interview, asked for exact age.

A question asking a survey respondent how old they were on their last birthday.
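A minimal sketch of that category-versus-exact-age check appears below, assuming illustrative category boundaries, column names and example data; the actual questionnaire’s categories are not reproduced here.

```python
# Hypothetical sketch: does the exact age given at the end of the interview
# fall inside the age category chosen at the start? Category boundaries,
# column names and the example data are assumptions for illustration.
import pandas as pd

CATEGORY_BOUNDS = {
    "18-19": (18, 19),
    "20-29": (20, 29),
    "30-39": (30, 39),
    "40-49": (40, 49),
    "50-59": (50, 59),
    "60-69": (60, 69),
    "70+":   (70, 120),
}

def outside_category(category, exact_age):
    """True when the exact age falls outside the selected category."""
    low, high = CATEGORY_BOUNDS[category]
    return not (low <= exact_age <= high)

df = pd.DataFrame({"age_category": ["20-29", "30-39", "40-49", "60-69"],
                   "exact_age":    [19, 44, 45, 65]})
df["mismatch"] = [outside_category(c, a)
                  for c, a in zip(df["age_category"], df["exact_age"])]
print(df["mismatch"].mean())  # share of respondents outside their category
```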

Results

Seven people gave exact ages that fell outside of their age categories. That amounts to 1.4% of the 500, very similar to the 2% found in the prior test. 

Two people who selected 20-29 gave an exact age of 19. Two who selected 30-39 gave exact ages in the 40-49 range, and one who selected 40-49 gave an exact age in the 50-59 range. Everyone who selected 60-69 gave an exact age in that range, and no one who selected 70+ claimed to be younger than 70.

Conclusion

The limitation in this research is that the first question used age categories rather than exact age. Given this, the 1.4% may be an overestimate of the percentage of spoofers, assuming one is willing to discount those whose second-question age is within a year of the originally claimed age. The two respondents who selected 20-29 but gave an exact age of 19, for instance, were only a year outside their chosen category.

It seems that when the key differences in survey results are greater than 8%-10% and statistically significant at the planned confidence level, this amount of “noise” in the system can be ignored. If the difference is slight but still statistically significant, one might prefer to explore further, or at least not rely heavily on the conclusions. Of course, this thinking depends a great deal on the base sizes of the subgroups being examined.
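One way to operationalize that rule of thumb is sketched below: a difference between two subgroups is treated as actionable only if it both clears a noise margin on the order of the 8%-10% guideline above and passes a pooled two-sample z-test for proportions at the 95% confidence level. The thresholds and example inputs are hypothetical, not drawn from the studies described here.

```python
# Hypothetical sketch of the interpretation rule above. The 8% noise margin
# (from the 8%-10% guideline), the 1.96 critical value (95% confidence) and
# the example inputs are illustrative assumptions.
import math

def difference_is_actionable(p1, n1, p2, n2,
                             noise_margin=0.08, z_critical=1.96):
    """Require the gap to exceed the noise margin and to be statistically
    significant under a pooled two-sample z-test for proportions."""
    diff = abs(p1 - p2)
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return diff > noise_margin and se > 0 and diff / se > z_critical

# Example: 50% vs. 40% preference in two subgroups of 250 respondents each.
print(difference_is_actionable(0.50, 250, 0.40, 250))  # True
```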