Editor’s note: Kevin Gray is president of Cannon Gray, a marketing science and analytics consultancy. 

Statistics is a huge field and many disciplines such as biology, economics and psychology have made significant contributions to it. A glance at the journals published by the American Statistical Association, or at the ecosystem that has grown up around the popular statistical software R, shows just how big a field it is. Statistics is not just point-and-click, and its growing complexity makes simplifying it more difficult, not easier.

Moreover, in his popular textbook Statistical Rethinking, Richard McElreath of the Max Planck Institute makes a very important observation: "...statisticians do not in general exactly agree on how to analyze anything but the simplest of problems. The fact that statistical inference uses mathematics does not imply that there is only one reasonable or useful way to conduct an analysis. Engineering uses math as well but there are many ways to build a bridge."

Automated modeling is risky beyond a very basic level. One reason besides the intricacy of statistics is that criteria for evaluating models frequently do not agree on which model is "best." More fundamentally, any statistical model provides a simplified representation of the process or processes that gave rise to the data – "Essentially, all models are wrong but some are useful" in the immortal words of George Box. Several models may fit the data equally well but suggest dissimilar courses of action to decision makers.
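To make that concrete, here is a minimal sketch in Python (simulated data, illustrative variable names, statsmodels assumed) showing how two standard selection criteria, AIC and BIC, can rank the same pair of models differently:

```python
# A "small" one-predictor model versus a kitchen-sink "large" model.
# All data are simulated; statsmodels is assumed to be available.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.5 * x1 + 0.1 * x2 + rng.normal(size=n)

fit_small = sm.OLS(y, sm.add_constant(x1)).fit()
fit_large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()

# AIC penalizes extra parameters more lightly than BIC, so the two
# criteria can easily rank these models differently.
print(f"small: AIC={fit_small.aic:.1f}  BIC={fit_small.bic:.1f}")
print(f"large: AIC={fit_large.aic:.1f}  BIC={fit_large.bic:.1f}")
```

Even when the criteria do agree, neither settles the question of which model is most useful to a decision maker.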

Data preparation is also hard to automate. It is an essential early step in the modeling process for several reasons, one being that data delivered as "clean" are seldom truly clean. There are missing data, errors and, in the case of surveys, straightlining, wherein respondents give the same answer to a related set of questions, such as attitudes toward shopping. Data often must be transformed in various ways (e.g., logarithmically), collapsed when there are too many categories or combined into new variables. There are many judgment calls. Data preparation is often combined with exploratory data analysis, which statisticians use to "get to know" the data.
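As a small illustration, not a recipe, here is a Python sketch of two routine preparation steps: flagging straightliners in a rating battery and log-transforming a skewed variable. The column names and the one-rule straightlining check are my own simplifying assumptions:

```python
# Two routine data-preparation steps on a toy survey data set.
# Column names and the straightlining rule are illustrative assumptions.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "shop_att1": [5, 3, 4, 4],
    "shop_att2": [5, 4, 2, 4],
    "shop_att3": [5, 2, 5, 4],
    "income":    [40000, 85000, 0, 52000],
})

# Flag straightliners: identical answers across a related battery.
battery = ["shop_att1", "shop_att2", "shop_att3"]
df["straightliner"] = df[battery].nunique(axis=1) == 1

# Treat an impossible zero as missing, then log-transform the skewed variable.
df["income"] = df["income"].replace(0, np.nan)
df["log_income"] = np.log(df["income"])
print(df)
```

In practice each of these steps is a judgment call – whether a straightliner is a careless respondent or a genuinely indifferent one, for example, cannot be decided by code alone.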

Subject matter expertise matters! Many statisticians and data scientists have never worked with marketing research data, which may seem "soft" compared to what they are accustomed to. Moreover, they may not understand marketing very well and, in a sense, are operating in a vacuum. A big part of learning the ropes of statistics is understanding which technical details matter and which do not, and when. Again, industry know-how is important, and in marketing research, knowledge of marketing is crucial. Ask your statistician how what may strike you as technical minutiae might affect your decision. Be patient – explaining stats in everyday language can be very difficult even for experienced statisticians.

Statistics is as much a way of thinking as a set of tools. Humans are inclined to think categorically, for example, "Is it a go or a no-go?" rather than probabilistically, for instance, "If we do A, B and C, will our chances be better than if we don’t?" Decision makers able to think systemically and in terms of conditional probabilities will usually be more effective than those who see the world in black and white, even if they never use these terms.
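As a toy illustration of the second mindset, with counts invented purely for the example:

```python
# A go/no-go question reframed as two conditional probabilities.
# The counts below are invented purely for illustration.
trials_abc, successes_abc = 200, 68        # did A, B and C
trials_base, successes_base = 200, 41      # did none of them

p_abc = successes_abc / trials_abc         # P(success | A, B and C)
p_base = successes_base / trials_base      # P(success | baseline)
print(f"P(success | A,B,C) = {p_abc:.2f}")
print(f"P(success | baseline) = {p_base:.2f}")
print(f"Estimated lift: {p_abc - p_base:+.2f}")
```

Neither probability is a "go" or a "no-go" on its own; the comparison between them is what informs the decision.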

Also, cognitive biases such as confirmation bias are part of human nature and not news to statisticians. They battle flaws in their own thinking, as well as in clients' thinking, every day. Be on the lookout for these foibles. At a minimum, ask yourself if your thinking is internally consistent and supported by real evidence.

GIGO – garbage in, garbage out. Statisticians are sometimes brought in as metaphorical relief pitchers when it looks like the game is getting out of hand. Sometimes they can strike out the side with the bases loaded, but a better strategy is to involve a statistician in the design of the research. A professional statistician is also able to design research, not just analyze data they've been handed, and is often more skilled at this than most marketing researchers.

Primary versus secondary research, experimental versus non-experimental research and causal analysis are topics very important to marketing research. In primary research the data are collected with specific objectives in mind, whereas in secondary research we analyze data that have been collected for other purposes. Most consumer surveys are primary studies and most big data analytics by nature are secondary. Primary research, generally speaking, permits us to dig more deeply and develop richer insights than secondary research. It’s not either/or, though, and secondary research is often used to design primary research and help us interpret its results.

There are more than two dozen types of experimental designs. Put very simply, in most experiments subjects (e.g., consumers) are randomly assigned to groups before being “treated,” and the outcomes of these groups are then compared. Studies using non-experimental designs do not randomly allocate subjects to groups, making causal inferences more problematic.
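Here is a bare-bones sketch of that logic in Python, using simulated data and an assumed treatment effect; a real experiment would of course involve far more design work:

```python
# Simplest experimental design: randomize subjects to two groups,
# "treat" one, compare outcomes. Data and effect size are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
outcome = rng.normal(size=n)            # baseline outcomes
treated = rng.permutation(n) < n // 2   # random assignment to groups
outcome[treated] += 0.3                 # assumed treatment effect

t_stat, p_value = stats.ttest_ind(outcome[treated], outcome[~treated])
print(f"treated mean = {outcome[treated].mean():.2f}, "
      f"control mean = {outcome[~treated].mean():.2f}, p = {p_value:.3f}")
```

Random assignment is what lets us attribute the difference between the groups to the treatment rather than to pre-existing differences.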

Taste tests typically employ experimental designs. Most consumer surveys, however, are non-experimental, and there may be important differences among respondent groups we define after the fact that haven't been "randomized away." It's much harder to figure out what's causing what. Correlation is not causation, as many humorous illustrations of spurious correlations demonstrate. Whether or not we're conscious of it, many business decisions involve notions about causation and we need to tread carefully.
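To see how easily a hidden common cause can manufacture a correlation, here is a small simulation in which everything is made up:

```python
# A lurking common cause creates correlation with no causal link
# between the two measured variables. Everything below is simulated.
import numpy as np

rng = np.random.default_rng(2)
season = rng.normal(size=5000)             # hidden common cause
ice_cream = season + rng.normal(size=5000)
sunburn = season + rng.normal(size=5000)   # does not depend on ice_cream

print(np.corrcoef(ice_cream, sunburn)[0, 1])  # clearly positive anyway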

Regression to the mean is another phenomenon that can lead us astray. An example is when we test a large number of product concepts. Some of those scoring exceptionally well would do less well if tested again and some of those that performed very poorly would do better the next time around. There are many examples of regression to the mean in marketing research and data science, another being that heavy consumers this year may consume much less next year and vice versa.
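A quick simulation makes the effect easy to see; the scores and noise levels below are invented:

```python
# Regression to the mean in a concept test. Each concept's observed
# score is its true appeal plus measurement noise; retesting the top
# scorers gives lower scores on average. All data are simulated.
import numpy as np

rng = np.random.default_rng(3)
true_appeal = rng.normal(50, 5, size=1000)
test1 = true_appeal + rng.normal(0, 5, size=1000)
test2 = true_appeal + rng.normal(0, 5, size=1000)  # independent retest

top = test1 >= np.percentile(test1, 95)  # the "exceptional" concepts
print(f"top 5% on first test:     {test1[top].mean():.1f}")
print(f"same concepts, retested:  {test2[top].mean():.1f}")  # lower
```

Nothing about the concepts changed between the two tests; the extremes simply owed part of their first scores to luck.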

Related to this, please do not assume bigger data means better data or that statistics is no longer relevant in the age of big data. Bigger data is often dirtier data and, if anything, statistics is now more needed than ever. Following fashion, quite a few statisticians these days refer to themselves as data scientists (I often do) but many describing themselves as such seem to have had little or no formal education in statistics. Their main skills are in programming and data management and, unfortunately, analytics blunders are not uncommon in data science.

There has been a great deal of hoopla in recent years about big data, machine learning and artificial intelligence. Actually, data scientists frequently use statistics and computer algorithms rooted in statistics. For example, many artificial neural networks essentially are forms of regression (though often more complex than the regression we typically use in marketing research).
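For instance, here is a minimal numpy sketch – simulated data, made-up coefficients – of a "network" with no hidden layer and a sigmoid output, which is simply logistic regression fitted by gradient descent:

```python
# A neural network with no hidden layer and a sigmoid output unit
# is logistic regression; only the fitting style differs.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3))
logits = X @ np.array([1.0, -2.0, 0.5])          # made-up coefficients
y = (rng.uniform(size=500) < 1 / (1 + np.exp(-logits))).astype(float)

# "Train the network": plain gradient descent on the logistic loss.
w, b = np.zeros(3), 0.0
for _ in range(3000):
    p = 1 / (1 + np.exp(-(X @ w + b)))           # sigmoid activation
    w -= 0.5 * (X.T @ (p - y)) / 500
    b -= 0.5 * (p - y).mean()

print(w, b)  # approximately the logistic regression coefficients
```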

Some kinds of statistics, such as the t-test, are used to make inferences about a population from a sample. Other types, such as factor analysis, are used to explore interrelationships among variables. Potential cause-and-effect relationships can be examined with methods such as ANOVA and regression. We need to be careful not to mix up methods intended to do different things!
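As a rough illustration of those three purposes in Python (simulated data throughout; scipy, scikit-learn and statsmodels are assumed to be available):

```python
# Three tools, three different jobs. The data are simulated and the
# point is only what each method is *for*, not the output itself.
import numpy as np
import statsmodels.api as sm
from scipy import stats
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(5)

# Inference about a population difference from samples:
group_a, group_b = rng.normal(size=50), rng.normal(0.4, 1.0, size=50)
print(stats.ttest_ind(group_a, group_b).pvalue)

# Exploring interrelationships among a set of variables:
ratings = rng.normal(size=(200, 6))
loadings = FactorAnalysis(n_components=2).fit(ratings).components_
print(loadings.shape)

# Examining a potential cause-and-effect relationship:
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)
print(sm.OLS(y, sm.add_constant(x)).fit().params)
```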

Significance testing is overused and misused in marketing research and many other fields. See the American Statistical Association’s 2016 Statement on Statistical Significance and P-Values for authoritative commentary about this. I’ll just say here that statistically significant does not necessarily mean important to decision makers and that many statistically insignificant findings, such as no differences in evaluations of a current product compared with a reformulation, are important.
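One reason for the misuse is easy to demonstrate: with a big enough sample, even a trivial difference is "statistically significant." A simulation with invented product ratings:

```python
# With a large enough sample, a trivially small difference becomes
# "statistically significant." Ratings below are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
current = rng.normal(7.00, 1.5, size=200_000)
reformulated = rng.normal(7.02, 1.5, size=200_000)  # 0.02-point shift

t_stat, p_value = stats.ttest_ind(current, reformulated)
print(f"p = {p_value:.4f}")  # typically well under 0.05, yet who cares?
```

A 0.02-point shift on a 10-point scale is unlikely to matter to anyone, significant or not.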

A descriptive finding is not an insight. Discovering, to our surprise, that older men drink our beer brand about as frequently as younger men do is a potentially interesting finding but, by itself, not an insight. If we discover that our brand is benefiting from a competitor's marketing aimed at older beer drinkers, in my view this is an insight because we've understood the why, not just the what.

How Brands Grow (Sharp) and How Brands Grow: Part 2 (Romaniuk and Sharp), in my opinion, are must-reads for marketers. Regardless of how much you agree with these authors, I think one can make a strong case that segmentation and targeting are frequently misunderstood and misused. Brand loyalty is also much weaker for most brands than many marketers may believe.

It’s also important to understand that segments are not real – they are convenient ways to summarize data. There is no such thing as an Affluent Variety Seeker or a Struggling Thrifty. While there may be empirical evidence that targeting consumers with certain characteristics is more profitable than targeting other consumers or not targeting at all, we should not merely assume this.
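One way to see this is that clustering algorithms will return tidy segments even when no real segments exist. A small sketch, using k-means on deliberately structureless simulated data:

```python
# k-means happily returns "segments" even when the data have no real
# cluster structure. The data below are simulated and unclustered.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 4))  # one homogeneous blob, no true segments

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # four tidy "segments" of a seamless whole
```

The segments are a summary device, which is fine, as long as we remember that is all they are.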

Brand and user mapping and key driver analysis are two other "traditional" methods that are underutilized or not always conducted very professionally, to be honest. Many marketing researchers do not seem to appreciate the full potential of quantitative research. See the Quirk’s article “Are you limiting the value of quantitative research?” for suggestions on how to get more out of your quantitative research.

Five years ago, I had 25 years of experience as a statistician. In the five years since, I’ve probably learned more about statistics and how to use it than at any other period in my career. The law of diminishing returns does not apply. It takes time.

I hope these tips will come in handy soon!