40 years, 40 lessons learned
Editor's note: Doug Berdie is president of Consumer Review Systems, a Minneapolis research firm.
Having conducted marketing research for 40+ years, I have seen many “truths” become apparent. These lessons have wide applicability, having been learned across many different industries, many countries and cultures, and in both business-to-business and business-to-consumer settings. I’d like to share these insights in the hope they will save less-experienced practitioners from having to learn them the hard way.
General marketing research themes
1. Validity is the Achilles’ heel of marketing research. Predictive validity is especially weak – mostly because companies change their strategies and tactics so often it’s not possible to see whether early measurements actually predict later outcomes. My April 2016 Quirk’s article (“A better use of your time”) shows, for example, that customer satisfaction scores are not a valid way to predict financial outcomes. In addition, psychological studies of “influence” show that people often simply attribute their decisions to post facto rationales rather than to what really influenced their decision (often, because they do not even know what that was). We’ve also seen that people’s estimates of such things as how long they were on hold before reaching a call-center service representative or how long they were in a store before a salesperson greeted them have little relation to actual measures obtained from phone logs and videos. Are answers to these types of questions estimates of actual time or indications of the degree of frustration experienced? It’s hard to say. Marketing researchers must never blindly assume that what they are actually measuring is what they intend to measure.
2. Reliability can be maximized in marketing research. Using time-tested research methods does yield consistency in marketing research measures and that is one of the hallmarks of reliable data. Well-documented, repeated sampling designs; high response rates to surveys; consistent field observation procedures; survey questions and formats that measure the same thing (across people and across time); and consistency in how data are analyzed and reported all are attainable objectives and heighten the credibility and usefulness of marketing research data. Although research objectives change, retaining some core measures across time allows for longitudinal comparisons that generate more confident conclusions. Even with sound research designs, however, reliability will never be perfect.
3. Marketing research fads: here today, gone tomorrow. Management consultants make money by devising new or repackaged research approaches. Hence, customer value analysis was popular for several years, yet is rarely seen these days. “SERVQUAL” was used extensively years ago for customer satisfaction research yet has largely disappeared. The most recent hot fad is relying strongly (or, in some cases, exclusively) on “recommend” questions in customer experience surveying (which has morphed from customer satisfaction surveying and customer loyalty surveying). As all these fads come and go (usually because they cannot demonstrate they validly predict anything of value), it’s wise to remember the marketing research industry has been built on the sound fundamentals of scientific research (representative samples, well-articulated questioning, rigorous and appropriate analysis and clear and insightful presentations and recommendations). Sticking to sound principles yields the best results.
4. The whole is greater than the sum of its parts. Some marketing researchers are skilled at operations, others in research design and analysis, others in sales and others in management. Places exist for all these skill sets. It is relatively rare to find people who are strong in a variety of these skills. Teams containing all these skill sets produce the best research.
Clarity regarding research objectives
5. Getting the best data is not always the primary objective. For example, companies with many retail outlets may be striving to change their customer service culture by instituting store-specific satisfaction surveys so each store can gauge how well it is doing and take remedial actions where needed. In these cases, maximizing store participation is the overall objective and to do that, alternative versions of a survey may be required. Although using different versions of a survey (e.g., some with five rating scale points and some with three scale points; some in English and some in Spanish, etc.) may be suboptimal from an analytical standpoint, ways exist to maximize the quality of even these data – while facilitating the overall objective of the research.
Overall research design
6. The customer is almost always right. Marketing research professionals are often upset when clients reject the proposed research method. However, when my clients have understood my recommendation and the rationale behind it and, nonetheless, rejected it, their rejection has usually turned out to be well-founded. Although my research acumen may exceed theirs, their knowledge of what will be well-received in their organization exceeds mine. “Good research” well-received, with the results used to make decisions, trumps “outstanding research” that sits on a shelf.
7. What works in Sweden may not work in Uganda. How people in different countries use ratings scales, view their jobs, etc., varies considerably. For example, people in Nordic countries are harder graders than those in Mediterranean countries. People in Asian countries are less likely to use the full set of options in a rating scale. A higher percentage of Europeans than Americans view a life-work balance as being important. So, if the same surveys are used across all these cultures and the analyses do not take account of these cultural differences, the results can be very misleading. Hence, when business decisions (and compensation issues) are tied to these data, extreme care must be taken.
Defining samples
8. The quality of client-provided sample lists always disappoints. It is often so poor as to offer little comfort regarding its representativeness. Purchased lists are not much better. It is, therefore, usually necessary to use great creativity to find ways to get representative samples.
9. Seek and ye shall find. It is not true that the people highest up in an organization always have the information you are seeking. For example, the belief that surveying plant managers will yield the best data about day-to-day manufacturing operations is misguided. Careful pre-survey work (e.g., a few phone interviews) will reveal who really has the information. Those should be the people surveyed.
Determining sample sizes and data precision
10. Budget, research objectives and mathematics determine the correct sample sizes. There is no single formula that is correct for all situations. So, do not merely use a “sample size” table in reference books that, in reality, only applies to one situation and is misleading for others. The correct sample size to select is a combination of how much data precision is required and available budget.
11. The type of question dictates the correct formula for obtaining precision estimates. A dichotomous question requires a different formula than a five-point rating scale, and a question asking for a numeric value (e.g., one’s age) as a response requires yet another formula. So, a survey (which will usually contain many types of questions) will require many formulas, each of which will yield its own precision estimate. To present accurate sampling precision estimates, one needs to present an estimate for each question.
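As a rough illustration of how the formula changes with question type, here is a minimal sketch (the sample size and values are hypothetical, and a simple random sample is assumed) computing a 95 percent margin of error for a yes/no question versus a numeric question:

```python
import math

def moe_proportion(p, n, z=1.96):
    """Margin of error (in proportion units) for a dichotomous
    (yes/no) question, assuming a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def moe_mean(sample_std, n, z=1.96):
    """Margin of error for a numeric question (e.g., age),
    based on the sample standard deviation."""
    return z * sample_std / math.sqrt(n)

n = 400  # hypothetical number of completed surveys
print(f"Yes/no question, 50% saying yes: +/- {moe_proportion(0.50, n):.3f}")
print(f"Numeric question, std dev of 12 years: +/- {moe_mean(12, n):.1f} years")
```

The two formulas are not interchangeable, which is why a single blanket precision figure for an entire survey is misleading.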
12. Sampling precision numbers are, at best, very rough estimates. When survey results tout accuracy “within plus/minus 3.7 percent,” chuckle and take that with a huge grain of salt. The formulas that generate those numbers assume the sample data have come from truly random samples where 100-percent response rates were achieved – two criteria marketing researchers never come close to achieving. So, the farther away from a true random sample, and the lower the response rate, the sillier it is to present sampling precision estimates at all, especially with numbers including decimal points! Furthermore, let’s also be clear that the precision estimates from, say, a yes-or-no question are not “plus/minus X percent” but rather are “plus/minus X percentage points.” If, for example, 50 percent of a sample say “yes,” incorrectly stating “plus/minus 5 percent” (.50 x .05 = .025) results in a conclusion of 47.5 percent to 52.5 percent, while correctly stating “plus/minus 5 percentage points” results in 45 percent to 55 percent. It does make a difference.
13. Response variance greatly affects precision. Questions eliciting homogeneous responses provide greater precision (with the same number of respondents) than do questions eliciting responses with great variance. The bottom line here is there are many more variables than just “the number of completed surveys” that dictate precision estimates, so without presenting an estimate for each question, misleading interpretations result.
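To make this concrete, here is a minimal sketch (with made-up rating data) showing how response variance alone changes the precision estimate even when the number of respondents is identical:

```python
import math
import statistics

def margin_of_error(responses, z=1.96):
    """95 percent margin of error for the mean of a rating question,
    assuming a simple random sample."""
    return z * statistics.stdev(responses) / math.sqrt(len(responses))

# Two hypothetical questions, 200 respondents each, same 1-5 scale
clustered = [4] * 100 + [5] * 100   # homogeneous answers
polarized = [1] * 100 + [5] * 100   # highly varied answers

print(f"Homogeneous responses: +/- {margin_of_error(clustered):.2f} scale points")
print(f"Polarized responses:   +/- {margin_of_error(polarized):.2f} scale points")
```

Same number of completed surveys, yet roughly four times the margin of error for the polarized question.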
Nonresponse bias and response representativeness
14. The potential for nonresponse bias exists in all modes of surveying. This means online, phone, in-person and mail all need to address this potential problem. Using “replacement” samples does not solve the problem; it often only masks it – making it even more pernicious. The only way to reduce nonresponse bias to a “comfortable” degree is to get high response rates (of at least 50 to 60 percent).
15. Demographic representativeness does not guarantee representativeness of response to survey questions. Lack of interest, or negative (or positive) feelings, may be what is driving the nonresponse.
16. Proper use of follow-up procedures can almost always stimulate acceptable response rates. Hence, leave plenty of time in the project schedule to implement them. For phone surveys, callbacks will be needed. Phone calls are also an effective way to follow up with mail and online survey nonrespondents – in addition to further mail and online contacts.
17. It is still possible to obtain exceptionally high response rates. In 2015, I obtained a mail survey response rate of 84 percent by using proven tactics of: a short questionnaire; clear, reasonable questions; and follow-ups. Don’t believe the naysayers who claim that high response rates can no longer be obtained.
18. It’s silly to expect high response rates when the topic being queried is of little interest to people. So, asking people to stop at a kiosk, for example, just as they clear customs in Vietnam (as I was asked to do) is unlikely to get a reasonable response rate.
Types of data collection
19. Online surveys share many characteristics with mail surveys. Both are self-complete surveys. Hence, there is much to learn from the history of mail surveys regarding question wording, etc., that should be examined and applied to online surveys. At present, many practitioners of online surveys are not aware of this wealth of empirical knowledge. They should be.
20. Panels will always be panels – and subject to their limitations. The lack of representativeness (participation in many cases being driven by incentives) and lack of attentiveness in responding are always concerns that need to be considered when relying on panels. Panels are not a substitute for true random samples.
21. Beware of big data. The underlying assumption of statistical analysis is that you need to state in advance what you are looking for and define acceptable levels of statistical error. Throwing all data into an analytical potpourri, spinning the wheel to see what comes out and accepting that as insight violates this key assumption and leads to lots of Type I statistical errors, merging of varying types of data, combining data of varying quality and confusing correlation with causation. It also leads to the erroneous application of group data to individuals. Big data may be useful for generating hypotheses and ideas for further analysis (that then lead to properly conducted analysis) but it’s not all it’s made out to be in terms of generating conclusive insights. Big data has its place but beware of claims that overstep what that place is.
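To illustrate the Type I error problem that comes from unconstrained fishing, here is a small simulation (pure noise, with hypothetical respondent and variable counts) showing that testing enough unrelated variables at the 95 percent confidence level will, by chance alone, make roughly 5 percent of them look “significant”:

```python
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

random.seed(1)
n_respondents, n_variables = 200, 100
cutoff = 1.96 / (n_respondents - 2) ** 0.5   # approx. critical |r| at the 95% level

outcome = [random.gauss(0, 1) for _ in range(n_respondents)]
false_hits = 0
for _ in range(n_variables):
    noise = [random.gauss(0, 1) for _ in range(n_respondents)]
    if abs(pearson(outcome, noise)) > cutoff:
        false_hits += 1

print(f"{false_hits} of {n_variables} unrelated variables look 'significant'")
```

With nothing but random numbers, a handful of “relationships” still emerge, which is exactly what an undisciplined trawl through big data will surface as insight.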
Survey question-wording
22. Question stems should present all response options. Failure to do so biases responses. For example, the question stem, “Do you favor more, the same amount or less funding for road maintenance?” can lead to results that vary by as much as 28 percentage points from when only one option is in the stem: “Do you favor more funding for road maintenance?”
23. Even subtle changes in question wording can have large effects. Careful pretesting is essential to see and deal with these effects before they occur. For example, 57 percent gave a favorable rating to “Hillary Clinton,” whereas only 49 percent did for “Hillary Rodham Clinton.” Similarly, in a survey related to freedom of the press, only 27 percent approve of “censorship,” whereas 66 percent approve of “more restrictions.”
24. Don’t let statisticians overly influence how questions are worded. If you ask questions “to suit the analysis” but that people cannot reliably answer, you have gained nothing. Most people cannot provide accurate answers to interval-level questions – even though analyzing them is easy. They can, for the most part, answer rank-order questions and are very good at answering categorical questions. A stated opinion is not necessarily a thoughtful/accurate one. So, do pretesting to find out what people actually can answer. Statisticians are paid to figure out creatively how to analyze data that may not be in the form they most desire.
Survey rating scales
25. Assign verbal anchors for each scale point. Don’t just label the endpoints. If you can’t come up with verbal anchors that make sense for each scale point, it’s a good indication you have too many scale points and that respondents won’t be able to see the differences clearly either.
26. Don’t feel limited to one- or two-word scale-point anchors. For the “would you recommend” question, options such as “Would recommend even if I’m not asked,” “Would recommend but only if I’m asked,” “Would recommend against but only if I’m asked” and “Would go out of my way to recommend against” obtain much more insightful answers than the standard one-word anchors.
27. Rating scale labels greatly affect responses. The more extreme the endpoint labels, the fewer people will select them. So, labeling the endpoints of a satisfaction scale with Extremely Satisfied and Extremely Dissatisfied results in fewer people selecting those options than if labeled Very Satisfied and Very Dissatisfied. Similarly, an importance scale with the endpoints labeled The Most Important and Not at All Important will generate fewer of those responses than one labeled Very Important and Not Very Important. Using strongly worded endpoints generates the most insightful data.
28. Ranking questions are less useful than rating questions. They do not allow respondents to apply the same rank to more than one option – even though they may be of equal importance and, hence, yield data that do not reflect the person’s real view. Also, people violate instructions and only rank some of the items and, in some cases, actually indicate the same rank for multiple items. There is no reasonable way to deal with these anomalies when analyzing the results. If you treat such violations as missing data, it’s tragic because you throw out responses when you know exactly how the person feels. Use rating scales instead. Adding strong endpoint labels will accomplish what ranking questions attempt to do.
29. Consider trade-off designs instead of simple rating scales. Conjoint, discrete choice and max-diff question formats have the advantage that respondents are forced to trade off the various attributes of interest and, by doing so, actual ratio-level data can be obtained showing, for example, that Attribute A appears to be twice as influential in a decision as is Attribute B. Another advantage of these designs is they force the ultimate clients to consider during the design phase the variations in detail for the questions, which helps ensure the resulting data are useful.
Data analysis
30. More data do not equal more insight. The real insight from research data comes from boiling them down. So, beware when piles of banner tables are presented instead of short, summary sentences that state what the research results mean.
31. Be careful of “that damn denominator.” The old adage, “One can prove anything with statistics,” is largely a result of fiddling around with various denominators until finding one that yields the desired result. Even when analyses are not problematic, choice of the wrong denominator can yield misleading results. Some canned software packages present both relative frequencies and adjusted frequencies – with the former using as denominator all people who were surveyed and the latter using as denominator only those people who responded to that particular question. Sometimes one of those is most useful and other times that same denominator may be totally irrelevant and misleading.
32. Carefully examine the variance in data. If 200 people are surveyed with a five-point scale and 100 give a response of 1 and the other 100 give a response of 5, the implication is entirely different than if all 200 people had given a 3 – even though the average response in both cases is 3.
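A quick sketch of that example (the figures come straight from the paragraph above) shows why the average alone hides the difference:

```python
import statistics

polarized = [1] * 100 + [5] * 100   # 100 respondents answer 1, 100 answer 5
uniform = [3] * 200                 # all 200 respondents answer 3

for label, data in [("Polarized", polarized), ("Uniform", uniform)]:
    print(f"{label}: mean = {statistics.mean(data):.1f}, "
          f"std dev = {statistics.pstdev(data):.1f}")
# Both means are 3.0; the standard deviations (2.0 vs. 0.0) tell
# entirely different stories about the underlying opinions.
```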
33. Verbatims are where the action is. Properly worded and sparingly used verbatims 1) describe the effects people feel from actions being taken, 2) detail the exact problems and benefits of a situation so they are clearly understood, 3) describe solutions to problems and 4) convey the degree of emotion around certain situations. Also, verbatim responses add spice and clarity to research reports and presentations.
Statistical significance tests and estimation
34. Clients want to know the size of differences between/among groups. Just knowing there is a difference is not helpful. The common practice of merely conducting statistical significance tests is, therefore, misguided. After all, with large enough sample sizes even a difference of one to two percentage points will be statistically significant. But marketers and decision makers need to know how large the difference is when deciding whether to spend money in certain ways. To provide this more insightful guidance, analyses should be conducted that provide confidence intervals that estimate how precise the collected data are. Reporting the results as ranges most facilitates decision-making.
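As a minimal sketch of reporting a difference as a range rather than as a bare pass/fail significance test (all figures here are hypothetical), a 95 percent confidence interval for the difference between two groups’ proportions might be computed like this:

```python
import math

def diff_ci(p1, n1, p2, n2, z=1.96):
    """Confidence interval, in percentage points, for the difference
    between two independent proportions."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (diff - z * se) * 100, (diff + z * se) * 100

# Hypothetical: 62% of 300 buyers vs. 55% of 300 non-buyers rate service "excellent"
low, high = diff_ci(0.62, 300, 0.55, 300)
print(f"Estimated difference: 7 points (95% CI roughly {low:.0f} to {high:.0f} points)")
```

Telling a decision maker the gap is about 7 points, and plausibly anywhere from about -1 to 15 points, is far more useful than reporting only that the difference is (or is not) statistically significant.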
35. Statistical significance tests and confidence intervals can be jiggered to produce any desired effect. For example, a significance test showing a significant difference using an 80 percent confidence level may not show a significant difference using a 95 percent confidence level – with the exact same data. And, a confidence interval could be +/- 3 percentage points at an 85 percent confidence level yet +/- 10 percentage points at a 95 percent confidence level. Because of this, 1) it’s critical to state in advance of any test or precision estimate which confidence level will be used as a basis for decision making and 2) one must always report the confidence level used when reporting the results of significance tests or confidence intervals.
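A small sketch (again with hypothetical figures) of how the very same data produce different interval widths at different confidence levels:

```python
import math

def half_width(p, n, z):
    """Half-width of a confidence interval for a proportion, in percentage points."""
    return z * math.sqrt(p * (1 - p) / n) * 100

p, n = 0.50, 150   # hypothetical: 50% "yes" from 150 respondents
for level, z in [("80%", 1.282), ("90%", 1.645), ("95%", 1.960)]:
    print(f"{level} confidence level: +/- {half_width(p, n, z):.1f} percentage points")
# The data never change; only the confidence level chosen for reporting does.
```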
36. Statistical significance is not the same as practical significance. A very small difference between/among groups may have no real impact on decisions – even if the difference is statistically significant. Conversely, a larger difference (that is not statistically significant because of a small sample size) may provide valuable direction for future business activities. One should not merely scan down banner tables and circle those that show statistical significance while ignoring the others. Instead, one should look at the size of the differences and, only then, be concerned with whether they are statistically significant. If they are not, one may still want to subject those variables to further research.
Making research results useful/actionable
37. Presenting preliminary data is essential. Early, preliminary data are almost always better than more complete data delivered late. Early peeks at the data allow decision makers to start their decision-making process. They can amend it, if needed, as more complete data arrive. And, experience has shown that the final data almost always closely mirror the preliminary data – minimizing any decision changes that may need to be made.
38. Balance academic integrity with actionable advice. Academics often waffle with statements such as, “On the one hand ...” whereas business decision makers want concrete answers to questions like, “What do you think I should do?” Researchers owe it to their clients to offer their thoughts. Having lots of data to support those thoughts is wise but presenting too much all at once can confuse the issue unnecessarily.
39. Business reasons may preclude all data seeing the light of day. Even when you have found serendipitous “good news,” the client contact may fear that sharing it with upper management could have negative effects – for a variety of reasons. Do not be surprised if this happens. It’s usually best to ask clients to sign off on any special “value added” analyses you may be considering conducting.
40. Don’t use customer satisfaction scores to compensate people. This applies to both employees and channel partners. Rather, compensate them for what actually improves customer service and satisfaction, which is the design and implementation of approved quality-improvement initiatives.