Editor's note: Jerry Thomas is president and chief executive of Arlington, Texas-based research firm Decision Analyst Inc. 

The world is awash in data from surveys of all types. The rise of low-cost, do-it-yourself survey tools has added to the flood of survey data. We can scarcely buy a toothpick without a follow-on survey to measure how happy we are with it. 

All of these surveys and the data they generate – often based on relatively large samples – tend to create a false sense of accuracy, based on the calculated standard error.

The standard error is a widely accepted measure of sampling error and it is typically the basis for the “accuracy of this survey is 5 percentage points, plus or minus, at a 95 percent level of confidence” footnote in research reports or survey results in newspapers, magazines or Web sites. The standard error is the basis for significance testing. The standard error assumes that:

  • A sample is chosen by purely random methods from among all members of the target universe.
  • All potential respondents do, in fact, respond to and participate in the survey (i.e., no non-response bias).
  • The results of many identical surveys of the target universe are normally distributed – i.e., the famous bell curve.

If these basic assumptions are met, and they rarely are, the standard error gives us a reasonably accurate measure of the sampling error in our survey data. However, the sampling error is only the tip of the iceberg. Many other types of survey error lie in wait for the innocent and the inexperienced.
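
To see what that footnote actually measures, here is a small Python sketch of the calculation behind it, assuming a simple random sample and an observed proportion (the sample size shown is purely illustrative):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error for a proportion from a simple random sample.

    p: observed proportion (e.g., 0.50)
    n: sample size
    z: z-score for the confidence level (1.96 for 95 percent)
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    return z * standard_error

# A 50 percent result from a sample of 384 yields roughly
# plus-or-minus 5 percentage points at 95 percent confidence.
print(f"{margin_of_error(0.50, 384):.3f}")  # ~0.050
```

Note what the calculation does not contain: nothing about who was asked, how the questions were worded or who declined to answer. Every error type discussed below lives outside this formula.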

Universe definition error

If a mistake is made in defining the universe for a survey, the results can be very inaccurate. For example, if the universe for a liquor survey is defined as males aged 21 to 29, based on a belief that young adult males account for the bulk of liquor sales, the survey results could be completely misleading. The truth is that people aged 50-plus account for a large share of liquor sales. This is a good example of a potential universe definition error.

Another example of a universe definition error is what happens in a typical customer satisfaction tracking survey. People who are unhappy with a company’s services or products stop buying them, so they are no longer customers. The company’s executives are happy, because satisfaction survey results are gradually getting better month by month, as unhappy customers drop out of the sample by becoming non-customers.

It is not uncommon for a company with declining customer counts to see its customer satisfaction ratings going up. The universe is changing over time, as unhappy customers leave. Universe definition errors, in most cases, will change survey results by significant amounts, usually much larger than the sampling error.
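
A rough simulation makes the mechanism visible. All of the numbers here – the starting satisfaction scores and the monthly defection rate – are invented purely for illustration:

```python
import statistics

# Hypothetical customer base: satisfaction on a 1-5 scale.
customers = [1, 2, 3, 4, 5] * 200  # 1,000 customers, mean 3.0

for month in range(1, 7):
    # Assume 30 percent of the least-satisfied customers (scores 1-2)
    # defect each month and thus vanish from the survey universe.
    stayers = [s for s in customers if s >= 3]
    leavers = [s for s in customers if s < 3]
    customers = stayers + leavers[int(len(leavers) * 0.30):]
    print(f"Month {month}: n={len(customers)}, "
          f"mean satisfaction={statistics.mean(customers):.2f}")

# Mean satisfaction climbs month after month even though no
# individual customer became happier -- the universe changed.
```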

Sample screening error

Sample screening errors are similar to universe definition errors in that we end up with a group that’s not representative of the target universe. Most surveys consist of two parts: a screener to filter out consumers who don’t qualify for the survey and a questionnaire for those who do qualify. Sample screening errors are common and they can introduce large errors into survey results.

For example, let’s suppose a company wanted to survey people likely to buy a new refrigerator in the next three years. Such a survey might screen out consumers who purchased a new refrigerator in the past three years on the assumption that these individuals were out of the market. Nothing could be further from the truth. That past-three-year refrigerator buyer might decide to buy a second refrigerator for the home, or they might buy a second home that needs a new refrigerator, or they might buy a refrigerator for their adult son or daughter.

That buyer might also have a growing family that demands the purchase of a new, larger refrigerator or one family member might just get tired of the old refrigerator and want a new one. Survey results based on a final sample that excludes these past buyers of refrigerators would produce inaccurate results.

Non-response error

It’s always possible that the people who do not respond to a survey are somehow different from those who decide to answer the survey. The U.S. government is especially concerned about non-response error and often requires its research agencies to make a large number of attempts to complete a survey before giving up on a potential respondent.

With the move to online surveys over the past 20 years and the increasing demands for instant survey results, the risks of non-response error are greater now than in the past. Ideally, the invitations to an online survey would be spread out over a seven-day period (to include a weekend) and would be e-mailed at a slow pace, with reminders on the third day to all those who had not responded. This process would tend to minimize non-response error but it does not eliminate it completely.
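
As a minimal sketch of that pacing, assuming a simple e-mail list (the seven-day window and third-day reminder follow the recommendation above; everything else, including the addresses, is hypothetical):

```python
from datetime import date, timedelta

def schedule_invitations(emails: list[str], start: date, days: int = 7):
    """Spread survey invitations evenly across a seven-day window
    (including a weekend), queuing a reminder three days after each
    batch for those who have not yet responded."""
    batch_size = len(emails) // days  # assumes the list divides evenly
    waves = []
    for day in range(days):
        batch = emails[day * batch_size:(day + 1) * batch_size]
        send_on = start + timedelta(days=day)
        waves.append({"send_on": send_on, "invitees": batch,
                      "remind_on": send_on + timedelta(days=3)})
    return waves

# 70 hypothetical panelists, starting on a Monday so the window
# includes the following weekend.
plan = schedule_invitations([f"r{i}@example.com" for i in range(70)],
                            start=date(2024, 6, 3))
for wave in plan[:2]:
    print(wave["send_on"], len(wave["invitees"]), "invitations,",
          "reminder on", wave["remind_on"])
```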

Agenda error

A questionnaire is a set of questions. These questions create an agenda for the survey and create an agenda in the survey-taker’s mind. The subjects or topics of the questions can influence the results of the survey. If the first question asks about the dangers of knives in the kitchen, then a danger agenda is created in the survey-taker’s mind. Perhaps that individual never thinks much about the dangers of knives but we’ve now planted an expanded awareness of knife-related dangers and this might color the results of all the following survey questions.

Question-wording error

There are many types of question-wording errors and these can be large sources of error in final survey results. As an example, suppose we asked the following question:

“Do you agree with leading scientists, college professors and medical doctors that marijuana should be legalized in the U.S.?” (choose one answer):

-- Yes, I agree marijuana should be legalized.

-- No, I do not agree that marijuana should be legalized.

The above question is obviously leading and biased. A more neutral wording of the question might be:

“Do you think marijuana should be legalized for adults in the U.S., or do you think marijuana should not be legalized for adults in the U.S.?” (choose one answer):

-- Marijuana should be legalized for adults in the U.S.

-- Marijuana should not be legalized for adults in the U.S.

As you might guess, the second question will give us a much more accurate measure of public attitudes toward the legalization of marijuana. Poor composition of questions – that is, wording error – is one of the greatest sources of survey error.

Answer choice error

The vast majority of questions are closed-end; that is, respondents are presented with a set of predetermined answer choices. If you have not conducted some really good qualitative research before designing the questionnaire, you are at risk of coming up with answer choices that might not capture all of the answer possibilities.

Do you include a “don’t know” answer option? This can dramatically change the results of almost any question. Sometimes, knowing that 45 percent of your customers don’t know the price they paid for your product is important information. When crafting survey answers, it’s easy to leave out answer possibilities. You get results and they look reasonable – but if you failed to include the most important answer choice, all of your results are meaningless.

Pre-testing or pilot-testing every new questionnaire is absolutely essential to prevent answer choice errors. Incomplete answer choices are a major source of survey error.

Transition error

If you are composing a questionnaire on the subject of peanut butter and you suddenly switch to the topic of potato chips, respondents may overlook the change. It’s not that respondents are not paying attention; they are thinking about peanut butter and concentrating on peanut butter, so everything starts looking like peanut butter and they completely miss the change in topic to potato chips.

Transition error can also creep into rating scales. If a rating scale suddenly changes during a survey, the respondent might not recognize the change. For example, if a rating scale goes from “excellent, good, fair, poor” to “poor, fair, good, excellent” in the middle of a survey, even the most alert participant might not notice the change. Using many different types of answer scales within a questionnaire can lead to the same type of error. The respondent accidentally misreads the answer choices as the questionnaire bounces around from one scale to another.

These sequencing and transition errors can be a significant source of survey error, especially in omnibus surveys that contain many blocks of questions on different topics.

Order error

If answer choices are not rotated or randomized, survey error is the result. For example, in a paired-comparison product test, the product tasted first tends to be preferred by 55 percent of respondents (give or take), even if the two products are identical.

If two new product concepts are compared, the one shown first (all things being equal) will be preferred over the one shown second. If a question is followed by a long list of brands or a long list of possible answers, the brands toward the top of the list will be chosen more often, all other factors being equal. That means it’s important to rotate or randomize the answer choices to prevent order error.

Of course, not all answer choices are rotated or randomized, because some questions don’t work if the answer choices are out of order (e.g., a purchase intent scale or a rating scale). Order error is typically not a huge source of error in most surveys but in some instances it can be of major significance.
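
For illustration, here is one way per-respondent randomization might be sketched in Python – real survey platforms have their own built-in settings for this, and the function and flag names here are invented:

```python
import random

def present_choices(choices: list[str], respondent_id: int,
                    ordinal: bool = False) -> list[str]:
    """Return answer choices in the order shown to one respondent.

    Nominal lists (brands, features) are shuffled per respondent to
    wash out order effects; ordinal scales (purchase intent, ratings)
    keep their natural order, since shuffling them breaks the question.
    """
    if ordinal:
        return choices
    rng = random.Random(respondent_id)  # reproducible per respondent
    shuffled = choices[:]
    rng.shuffle(shuffled)
    return shuffled

brands = ["Brand A", "Brand B", "Brand C", "Brand D"]
print(present_choices(brands, respondent_id=101))
print(present_choices(["Definitely buy", "Probably buy",
                       "Might or might not buy", "Probably not buy",
                       "Definitely not buy"],
                      respondent_id=101, ordinal=True))
```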

Assumed knowledge error

Survey creators often assume that participants know more about a topic than they actually do, and that respondents are familiar with the language and terms used in the survey – when often they aren’t. Respondents frequently don’t understand the words in the questions or the answer choices, yet they will almost always give an answer, even if they have to guess. Assumed knowledge error is a potential problem in both consumer and business-to-business surveys.

Tabulation error

Every company that tabulates survey answers makes assumptions and sets operating procedures that affect the reported results from a survey. For example, in calculating an average for grouped data, there is latitude for differences or error. In calculating average household income, as an example, how does one count the “under $25,000” income group? Do you count an “under $25,000” answer as $25,000? Or do you count it as $12,500? Or as $18,750? What about household income greater than $250,000? Do you count that household’s income in computing an average as $250,000 or $300,000 or what? If a question asks how many times people have gone swimming in the past year, do you calculate an average number of times or do you look only at the median? The base chosen to calculate a given percentage or average can change the results dramatically. These tabulation decisions can be a source of major differences or error. 
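
A small worked example shows how much these choices can move the answer. The bracket counts below are hypothetical:

```python
# Hypothetical income distribution from a survey (bracket: respondents).
brackets = {"under $25,000": 200, "$25,000-$49,999": 300,
            "$50,000-$99,999": 300, "$100,000-$249,999": 150,
            "$250,000 or more": 50}

def average_income(bottom_value: float, top_value: float) -> float:
    """Average income under one set of open-ended bracket assumptions."""
    midpoints = {"under $25,000": bottom_value,
                 "$25,000-$49,999": 37_500,
                 "$50,000-$99,999": 75_000,
                 "$100,000-$249,999": 175_000,
                 "$250,000 or more": top_value}
    total = sum(midpoints[b] * n for b, n in brackets.items())
    return total / sum(brackets.values())

# Same raw data, different tabulation rules, different "averages":
print(f"${average_income(12_500, 250_000):,.0f}")  # $75,000
print(f"${average_income(18_750, 300_000):,.0f}")  # $78,750
```

The same raw answers yield averages that differ by roughly 5 percent, purely because of the assumptions made for the open-ended brackets.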

Social desirability or social pressure error

We humans are highly emotional, social creatures. We want others to like and admire us. This leads to something called social desirability bias or social pressure error, particularly in surveys conducted by a human interviewer (face to face and/or by telephone). The respondent gives answers intended to make the interviewer think better of the respondent.

For example, the respondent might say he has a master’s degree when he only has a bachelor’s degree; or the respondent might say that he goes to church every Sunday, when in truth he only goes once every six months. These social desirability biases are muted in online surveys or mail surveys but they never disappear completely.

Translation error

If the same survey is conducted in the U.S., France, Germany and China, the translation of English into the other languages introduces differences (error) into survey results. That is, the results in English will be different from the results in French, German and Chinese, purely because of language differences. Even if you are working with a highly skilled translator with marketing research experience, the questionnaires across different languages will never be the same.

Cultural error

On top of the differences in language from one country to the next, cultures tend to be different. Some cultures are lively and festive, others drab and dull. Some cultures are happy and positive, while others are serious and dour. Some cultures like to give positive, glowing answers, while other cultures tend toward a negative worldview. These cultural differences lead to differences in survey results. We can think of culture as another source of bias or error.

Overstatement error

If you ask consumers how many cans of pinto beans they purchased in the past year or month, they will overstate the actual number of cans purchased by a factor of two or three to one. For high-priced products, the overstatement factor might be four or five to one or even higher. If the researcher accepts these reported purchase numbers at face value, the survey results will overstate the true numbers by huge margins. Likewise, if a consumer is asked how likely she is to buy a new peanut butter, she will overstate her likelihood to purchase by a factor of at least two or three to one. If the manufacturer bases sales projections on these inflated purchase intentions, far too many jars of peanut butter would be produced and shipped. Overstatement error is huge for certain types of questions and is another major source of error in survey results. 

Interpretation error

Overstatement error can lead to interpretation error. Let’s suppose you have completed that new product concept survey and are ready to write the report – and impress your boss. Thirty-two percent of respondents said they would “definitely buy” the new product and 23 percent said they would “probably buy.” Let’s see, there are about 120 million households in the U.S., so 55 percent (32 percent plus 23 percent) means that 66 million households might buy this new product. And the respondents said they would buy the new product six times a year and its price is $9.95. So 66 million times $9.95 times six equals market potential of just under $4 billion. Wow! Your boss is going to be so happy when she hears the results. Your ascension to the corporate throne is only a matter of time.

This tongue-in-cheek example illustrates the kinds of interpretation errors that human judgment can introduce into survey results. Yes, you used the results from the survey exactly as printed out in the crosstabs. The numbers are correct. But your $4 billion market potential might only be $200 million once an experienced researcher discounts the survey results and adjusts for planned advertising spending, awareness, distribution and competitive response. The interpretation of survey results is often a major source of error.
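
Here is the same arithmetic in Python, first at face value and then discounted. The discount weights, awareness and distribution figures are illustrative placeholders only – in practice an experienced researcher calibrates them from past launches and the marketing plan:

```python
HOUSEHOLDS = 120_000_000
PRICE = 9.95
STATED_FREQUENCY = 6  # stated purchases per year

definitely_buy = 0.32
probably_buy = 0.23

# Naive reading: take stated intent and frequency at face value.
naive_buyers = HOUSEHOLDS * (definitely_buy + probably_buy)
naive_potential = naive_buyers * PRICE * STATED_FREQUENCY
print(f"Naive market potential: ${naive_potential / 1e9:.2f} billion")

# Discounted reading -- every weight below is a hypothetical placeholder:
# count only a fraction of each intent group, halve the stated frequency
# and allow for awareness and distribution well below 100 percent.
adjusted_buyers = HOUSEHOLDS * (definitely_buy * 0.40 + probably_buy * 0.20)
awareness, distribution = 0.50, 0.65
adjusted_potential = (adjusted_buyers * PRICE * (STATED_FREQUENCY / 2)
                      * awareness * distribution)
print(f"Discounted estimate: ${adjusted_potential / 1e6:.0f} million")
```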

Unconscious error

If you are emotionally involved in a corporate project, say, the development of an exciting new product, you might just fall in love with the innovation that you are bringing to the world. When the higher-ups want some evidence to support your unbridled enthusiasm, you design a survey to provide the answers. Your enthusiasm, your emotional involvement and your love of the new product cause you, without conscious awareness, to slant the definition of the sample and the wording of questions to provide the affirmation you so badly want. This is a major source of survey error, especially when corporate surveys are conducted directly by corporate employees.

Crosstabulations and significance testing

Many researchers demand that crosstabulation tables be cluttered with significance tests based on the standard error (i.e., sampling error) or they spend considerable time running tests to determine if survey results are statistically significant, again based on the standard error. In the grand scheme of marketing research, sampling error is typically a minor source of survey error, compared to all of the other sources of error, yet it consumes 100 percent of the typical researcher’s attention and time. It might be wise to skip the significance testing in crosstabs and charts and instead focus attention on the other sources of potential error – where risks and degree of error are much greater than sampling error.

Minimize non-sampling errors

Significance testing based on the standard error is vastly overrated in importance and it might actually cause us to overlook what’s really important in research design, questionnaire design and survey interpretation. Instead, marketing directors should focus on minimizing non-sampling errors to dramatically improve the validity, reliability and accuracy of their survey research.

In addition, researchers should always pre-test or pilot-test a totally new questionnaire design. They should recuse themselves from a survey if they are emotionally involved in the subject of the survey or best friends with the brand manager.

Researchers should also check their survey results against previous surveys and against secondary data to make sure the survey results are reasonable and within acceptable ranges. Those involved in research should always be skeptical. If something seems amiss or out of kilter, they should keep searching for the explanation or the source of the error. Ultimate truth is elusive and shy and it’s our job to coax it out of the shadows.