Are bad respondents always bad?
Editor's note: Nallan Suresh is senior director of analytics at San Francisco research firm MarketTools.
Data quality is the principal concern of any market researcher, and assuring the researcher that all feasible measures are being taken to protect that quality should, without a doubt, be the primary objective of the data provider.
The two main factors that can degrade data are the quality of the respondents and the quality of the surveys. While there are factions in the research community that lay the responsibility more in one direction or the other, the reality is probably a middle ground where the problem must be approached from both sides. A 2009 study by the Advertising Research Foundation concluded that “both survey length and respondent speeding are threats to data quality.”
In an earlier study, our firm, MarketTools, presented results from research that examined the impact of survey design on the quality of the data. In this article, we look at the problem from the other angle, namely the quality of the respondent. Specifically, we are looking at the engagement level of the respondent to determine whether they are “bad.”
In addition, we would like to draw a distinction at this point between occasionally unengaged respondents and chronically (i.e., habitually) unengaged ones. It is the latter that we would term bad and would want to exclude from future surveys. The former, the occasionally unengaged respondent, can be dealt with at the survey level, and we feel their behavior owes more to survey design than to their nature. (Note: Another aspect of respondent quality pertains to whether they are real or fake - those panelists were already shown to provide data of poor quality in earlier research and have been excluded from this study through our proprietary methodology.)
The questions that we looked to answer in the research study are: Are bad respondents always bad? Are there chronically unengaged respondents? Do they behave that way regardless of survey design? Is the quality of data from these respondents poor? What is their demographic breakdown? Are they younger or older? As mentioned before, the answers to these questions are of paramount importance to the quality of data obtained in market research studies.
The objective of this study was to understand the impact of unengaged respondents on the data and also to understand the nature of these respondents. We mainly wished to determine whether, when a respondent stays unengaged on a great proportion of their surveys, it is sufficient to identify and exclude them on a survey-by-survey basis to preserve data quality, or whether there is a need to exclude them from all future surveys, beyond the obvious savings from wasted incentives.
The study had two phases. In the first phase, which was conducted at a survey level, we looked at thousands of interviews to evaluate the impact on data quality from the responses provided by unengaged or bad respondents when compared to those provided by the engaged respondents. This phase was also a validation of the need to exclude the bad respondents as a way to preserve the quality of research data. In the second phase, we tracked unengaged respondents over several surveys to observe their behavior across these surveys and addressed the question on the nature of these respondents and the impact they may have on data quality.
Undesirable characteristics
A respondent who is engaged is one who thoughtfully considers all of the questions of a survey and whose responses reflect their opinions. Respondents who are not engaged tend to exhibit undesirable characteristics such as speeding through a survey, straightlining the answers in matrix questions and providing inconsistent or random responses. For the purpose of this study, an unengaged or a bad respondent is one who speeds or straightlines through a great proportion of the pages in the survey. Granted, there can always be respondents who work their way slowly through a survey and still provide inconsistent responses but that is generally expected to be an insignificant proportion of the unengaged as it is ostensibly at odds with the motivation of these respondents to earn incentives while spending as little time as possible on a survey.
We also recognize that there are other methods to capture unengaged respondents, such as the use of trap questions, but since survey designs vary significantly it is difficult to standardize such methods, which are intrinsically dependent on the wording of the questions themselves. We therefore expect the data-driven methodology used in this research, which is simple to implement and broad-based enough to be standardized easily, to capture a significant proportion of the unengaged across all types of surveys.
Speeding on a page is identified by examining the time spent by the respondent on a page and is evaluated on a page-by-page basis. Since each page in a survey can be different from others in terms of the complexity of the page and, correspondingly, the amount of time typically spent on the page, we consider each page separately. The typical time for a page is obtained through data-driven means and is computed by factoring in the times that all respondents in the survey spend on the page.
Respondents who spend statistically far less time than the norm for the page are tagged as speeders for the page. Straightlining is only defined for pages with matrix questions. A person who straightlines on a page is one who provides the same answer choice for all parts of a matrix question, while also speeding on the page, where speeding is as defined above. To be classified as unengaged for the survey, the respondent must speed, or straightline while speeding, on more than a certain percentage of the pages. The threshold percentage of pages, set for the method based on a detailed study of histograms from thousands of surveys, was 40 percent for speeding and 25 percent for straightlining.
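To make the mechanics concrete, the sketch below shows one way this page-level flagging could be implemented. It is an illustrative outline only, not MarketTools' production logic: the field names and the use of a fraction of the page's median time as the "statistically far less than the norm" test are assumptions; the 40 percent and 25 percent page thresholds are the ones cited above.

```python
# Illustrative sketch only (not MarketTools' production code). Field names and
# the median-based speeding test are assumptions; the page thresholds are the
# ones cited in the article.
from collections import defaultdict
from statistics import median

SPEED_FRACTION = 0.3                 # assumption: < 30% of the page's median time
SPEEDING_PAGE_THRESHOLD = 0.40       # speed on more than 40% of pages
STRAIGHTLINE_PAGE_THRESHOLD = 0.25   # straightline (while speeding) on more than 25% of pages

def flag_unengaged(page_records):
    """page_records: list of dicts with keys 'respondent', 'page', 'seconds',
    and, for matrix pages, 'matrix_answers' (one answer per matrix row)."""
    page_records = list(page_records)

    # Data-driven norm: typical time per page, computed over all respondents.
    times_by_page = defaultdict(list)
    for rec in page_records:
        times_by_page[rec["page"]].append(rec["seconds"])
    norm = {page: median(times) for page, times in times_by_page.items()}

    pages_seen = defaultdict(set)
    sped_pages = defaultdict(set)
    straightlined_pages = defaultdict(set)
    for rec in page_records:
        rid, page = rec["respondent"], rec["page"]
        pages_seen[rid].add(page)
        if rec["seconds"] < SPEED_FRACTION * norm[page]:
            sped_pages[rid].add(page)
            answers = rec.get("matrix_answers")
            # Straightlining: the same answer for every row of the matrix,
            # while also speeding on the page.
            if answers and len(set(answers)) == 1:
                straightlined_pages[rid].add(page)

    unengaged = set()
    for rid, pages in pages_seen.items():
        n = len(pages)
        if (len(sped_pages[rid]) / n > SPEEDING_PAGE_THRESHOLD or
                len(straightlined_pages[rid]) / n > STRAIGHTLINE_PAGE_THRESHOLD):
            unengaged.add(rid)
    return unengaged
```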
In a typical survey, anywhere from 2-5 percent of respondents are classified as unengaged. In the next section, we will discuss the impact of these respondents on data quality.
Phase I: Impact on data quality by unengaged respondents
For the purpose of this analysis, MarketTools examined the results of over 35,000 interviews across several surveys spanning many product/service categories, including tracker surveys that spanned several months and conventional stand-alone surveys. The analysis involved the comparison of the results from respondents that were set aside by MarketTools’ TrueSample system as unengaged, against those of the valid respondents, across the various surveys.
To normalize the results, we looked at question types that were common across survey categories, such as intent-to-buy questions, attitudinal questions (general or product-related) and survey-rating questions. We collectively examined answers from these groupings of similar question types across the surveys – this grouping across common questions was necessary to obtain sufficient data for statistical robustness, as each survey only provided a small number of unengaged respondents. The analysis focused on whether the unengaged respondents overwhelmingly chose any particular type of answer choice over the others, whether they were unusually positive in their responses and whether they carefully thought out their responses or were unusually non-committal.
Our findings are discussed below.
Unengaged respondents tend to be younger males. Right off the bat, when we examined the demographic balancing in the data (Figure 1), we found that the unengaged respondents tended to be more skewed toward the younger population, especially younger males. We did not notice any significant regional bias.
Therefore, for the purpose of the comparative analyses that follow, we resampled and rebalanced the engaged population to be similar in age and gender breakdown to the unengaged respondent population, as can be seen in Figure 2.
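As an illustration of what such a rebalancing step might look like, the sketch below resamples the engaged group so that its age/gender mix matches that of the unengaged group. The field names and the sampling-with-replacement approach are assumptions made for demonstration, not the firm's actual procedure.

```python
# Illustrative rebalancing sketch; field names and approach are assumptions.
import random
from collections import Counter, defaultdict

def rebalance_engaged(engaged, unengaged, sample_size=1000, seed=42):
    """engaged / unengaged: lists of respondent dicts with 'age_group' and 'gender'.
    Returns a resample of the engaged group matching the unengaged age/gender mix."""
    rng = random.Random(seed)
    key = lambda r: (r["age_group"], r["gender"])

    target_mix = Counter(key(r) for r in unengaged)   # desired cell proportions
    total = sum(target_mix.values())

    pool = defaultdict(list)                          # engaged respondents by cell
    for r in engaged:
        pool[key(r)].append(r)

    resampled = []
    for cell, count in target_mix.items():
        n = round(sample_size * count / total)        # cell's share of the target mix
        if pool[cell] and n > 0:
            # Sample with replacement in case few engaged respondents fall in the cell.
            resampled.extend(rng.choices(pool[cell], k=n))
    return resampled
```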
In terms of responses to questions, and consequently, quality of data, unengaged respondents tended to engage in both weak and strong satisficing depending on the type of questions asked.
Unengaged respondents tend to show overly positive intent to buy. Unengaged respondents were more likely to choose “definitely would buy” as an option (the highest on a five-point scale) regardless of the product or survey. This overwhelmingly positive attitude is reflected in Figure 3, where the percentage of respondents that select “5” as an option almost always lies above the 45-degree line across all the intent-to-buy questions that we looked at.
Unengaged respondents tend to rate survey quality higher. In a similar vein, unengaged respondents also tend to rate the surveys that they take significantly higher than do the engaged respondents, as seen in Figure 3. While this does not necessarily affect the quality of the data directly, it does point to the satisficing attitude of these respondents while taking the survey. As an indirect impact, such responses can also artificially raise the overall rating of the survey, which may give researchers an overly positive view of their survey design and reduce the likelihood of altering the design when it may in fact be necessary.
Unengaged respondents are more prone to attitudinal indifference. On questions where attitudes toward the product are solicited, the unengaged respondents overwhelmingly selected the middle-of-the-road option. On a question with five choices, this translated to Choice No. 3 (e.g., Neither Agree nor Disagree), while on a seven-choice question, the odds-on choice for these respondents was Choice No. 4. Figure 4 shows the percentage of unengaged respondents that chose the middle choice, compared to the percentage of engaged respondents that chose the middle choice on the same questions across all the surveys.
Unengaged respondents are sometimes split between satisficing and indifference. For some surveys, the in-survey experience may cause some of the unengaged respondents to be more positive on the attitudinal questions, while other unengaged respondents might show indifference. In such surveys, their responses, while still different from those of the engaged respondents, tend to split between the middle-of-the-road and highest ratings on the questions, forming two clusters of responses. When we look at the percentage of unengaged respondents that select either the middle choice (e.g., 3) or the most positive choice (e.g., 5), sum the two percentages and compare this number to the same for the engaged respondents, the differences again are quite significant (Figure 4). A greater proportion of the unengaged panelists tend to pick either of these choices than the engaged respondents.
This indicates that even for the vast majority of the unengaged respondents, the nature of the survey may have an impact on their response patterns. Therefore, while they will remain unengaged and display this attitude in their response, their mode of displaying this apathy can manifest itself differently depending on the survey.
Younger unengaged respondents are different even when compared to younger engaged respondents. Based on the results presented earlier, one may rightly ask whether, when flagging respondents for engagement, younger respondents should be compared against a benchmark set by their own age group or against an overall benchmark. Since there is a bias toward younger age groups in the distribution of unengaged respondents compared to engaged ones, comparing them against their own age group might be a little more forgiving toward these groups, which are already hard to reach, especially if, as a group, they complete surveys faster. However, this method should only be considered if the data from the younger respondents tagged by the current method, which is based on the speeds of all panelists, is close or similar to that of the younger, engaged respondents. This would indicate that younger respondents are overall faster or straightline more than older respondents, and the fact that the ones we mark as unengaged are similar to the engaged ones would reinforce the sentiment that we are being harsher on them.
To test this, we looked at the attitudinal data of the unengaged respondents under 35 years old and compared them to the engaged ones in the same age group. As can be seen in Figure 5, the respondents under 35 marked as unengaged by the current method are still substantially different from the engaged ones and overwhelmingly pick the middle choice. This is very similar to (and in some respects even more pronounced than) the trend shown in Figure 4 for the entire respondent set.
This validates the current methodology of flagging respondents for lack of engagement using the entire demographic as a benchmark. For starters, fragmenting the data by demographic to obtain a statistical benchmark would make the benchmark less robust due to the small volumes available for a large proportion of surveys. Secondly, younger people are probably not that much faster at reading and comprehending text than older respondents, which makes a speeder or straightliner one regardless of age.
Phase II: Impact of chronically unengaged respondents
In the first phase, we looked at the survey level to establish that unengaged respondents do provide poorer-quality data and that they should, in fact, be excluded from the results. But do these respondents speed on other surveys as well? And hence, will they compromise the data on other surveys where they are not excluded because they fly under the radar? Are they more liable to be chronically unengaged, and if so, at what point would you call a respondent chronically unengaged?
To answer these questions, we looked at historical engagement data and tracked the behavior of respondents over the course of several surveys. In the following analysis, Phase II, we evaluated over 1,600 surveys and tracked over 20,000 unengaged respondents across these surveys that spanned 10 or more service/product categories and several months. Our findings are summarized below.
Chronically unengaged respondents impact data quality negatively. Why would we want to classify a respondent as chronically unengaged if, in fact, we can identify them as unengaged in every survey and exclude their responses? There are a couple of reasons. An obvious first consideration is that of incentives. A respondent tagged as unengaged is still given the incentives for completing the survey, while their responses are not used in the research. This is a direct cost to panel companies. The second reason is more subtle and has an impact on data quality. Whether a respondent is considered to be engaged is based on a threshold: on one side of the threshold they are considered engaged and on the other, unengaged. However, if a respondent is consistently just under the threshold on several surveys, it should still raise a warning flag on their data quality.
To look at how respondents who were ever unengaged behave on surveys on which they are not caught speeding, we examined their characteristics on these surveys. In Figure 6, we plotted the percent of pages they sped on in surveys where they were not caught, against the percent of the surveys they took on which they were found to be unengaged. As can be seen in the results, respondents who sped on a greater percentage of their surveys also sped on a greater percentage of pages in the surveys where they were not identified as unengaged. This suggests that they were under the radar on those surveys and that just because they were not caught does not mean their data quality is not compromised. It argues for setting a threshold, based on either a number of surveys or a percent of surveys on which respondents are found to be unengaged, to classify a respondent as chronically unengaged.
Chronically unengaged respondents can be identified and excluded. The challenge is to identify a data-driven threshold for the classification of chronic disengagement. While we could arguably have picked a threshold at some point on the x-axis of the graph above, we decided to look at some additional behavioral data to see if we could identify a natural threshold. Figure 7 shows the dropout rate of the ever-unengaged respondents and the number of surveys taken, plotted against the percent of surveys on which respondents were unengaged. As can be seen, the dropout rate increases with the percent of surveys unengaged up to about 20 percent or so, after which it starts flattening out and even drops a little. The increasing trend of dropout rate with percent of surveys unengaged is consistent with earlier research and points to some level of survey-driven disengagement. The decreasing survey count also points to a more casual survey taker who might be driven to the behavior by the design of the survey.
However, the trend after about the 20 percent mark is interesting. The number of surveys taken starts to rise a little and the dropout rate tends to flatten out and drop a little. This indicates that respondents who are more unengaged actually take more surveys, speed on more of them and complete more of them. This not only implies more completed surveys with poorer data quality but also points to a class of respondent that might be more driven by incentives while providing bad data. Based on this, it appears that a threshold of 22 percent or so may be used to differentiate between occasional and chronic disengagement. In the current data set, this resulted in 20 percent of the ever-unengaged respondents being classified as chronically unengaged.
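Applied to per-respondent history, that cutoff translates into a simple classification rule along the lines of the sketch below. The data layout and function name are assumptions for illustration; the 22 percent figure is the threshold identified above.

```python
# Illustrative classification sketch; the history layout is an assumption,
# the 0.22 cutoff is the threshold discussed above.
CHRONIC_THRESHOLD = 0.22   # share of a respondent's surveys on which they were unengaged

def classify_respondents(history):
    """history: dict mapping respondent id -> (surveys_taken, surveys_unengaged)."""
    labels = {}
    for rid, (taken, unengaged) in history.items():
        if taken == 0:
            continue                       # no completed surveys to judge from
        rate = unengaged / taken
        if rate > CHRONIC_THRESHOLD:
            labels[rid] = "chronically unengaged"
        elif rate > 0:
            labels[rid] = "occasionally unengaged"
        else:
            labels[rid] = "engaged"
    return labels
```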
Chronically unengaged respondents tend to be younger. Looking at the age breakdown of the chronically unengaged respondents compared to the occasionally unengaged ones (i.e., those on the right of the threshold in Figure 7 against those on the left), we can see in Figure 8 that the chronically unengaged respondents are more likely to be younger.
Overall, roughly 2 percent of survey takers in this study were considered to be chronically unengaged; 7 percent were occasionally unengaged and 91 percent were always engaged.
A different response characteristic
We have described a methodology to identify unengaged respondents and presented reasons as to why they should be excluded from survey results. The first phase of the analyses presented here indicates quite clearly that unengaged respondents do indeed have very different response characteristics from those of engaged ones. The responses provided by the unengaged respondents identified by the engagement methodology tend to show greater indifference on the thought-provoking questions, coupled with a greater-than-normal positive intention to purchase. All of these qualities have a negative impact on the quality of the data and therefore on decision-making by the end user.
The second phase of the analyses showed how unengaged respondents can be classified into occasionally unengaged and chronically unengaged. It is important to identify chronically unengaged respondents and exclude them, both from an overall incentive-cost perspective as well as from a data quality perspective, because chronically unengaged respondents can give poor data even on surveys where they are under the radar. The engagement criteria and methodology described in the study can help flag and set aside these respondents and can play a critical part in a multipronged approach to ensuring the quality of market research data.