Why marketing researchers use experiments
Editor’s note: Kevin Gray is president of Cannon Gray, a marketing science and analytics consultancy.
In causal analysis, randomized experiments are generally seen as the methodological gold standard. "No causation without manipulation" is a statistical adage implying that only through randomized experiments can we establish the existence of a causal effect.
Random assignment of research participants to control or treatment groups reduces the risk that these groups were different in significant ways prior to administration of the treatment. The “treatment” could be a new medication, therapy, training program, digital advertisement or any number of interventions.
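For readers who like to see the mechanics, a bare-bones sketch of simple random assignment might look like the following; the participant IDs, sample size and 50/50 split are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed only so the example is reproducible

# 100 hypothetical participant IDs
participants = [f"P{i:03d}" for i in range(1, 101)]

# Shuffle, then split: first half to treatment, second half to control
shuffled = rng.permutation(participants)
treatment, control = shuffled[:50], shuffled[50:]
```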
British statistician R. A. Fisher was a pioneer in the design and analysis of experiments in the 1920s and ’30s, and randomized experiments are now an important tool in many disciplines.
Note that research “participants” need not be human, or even alive; machines are one example. In some randomized experiments there are multiple controls and treatments – two standard medications and three new pharmaceuticals, for instance. Participants can also be followed over time after being randomly assigned to two or more experimental groups. Latent growth curve analysis is one example and event history analysis another.
Experimentation is a complex subject and involves a lot more than tossing a coin. “Experimental Design: Procedures for the Behavioral Sciences” (Kirk) and “Design and Analysis of Experiments” (Montgomery) are two standard reference books I can recommend.
One caveat is that randomization does not ensure that the treatment and control groups are adequately balanced with respect to important variables. In some cases, we can statistically adjust for imbalance after the fact so it does not bias our conclusions. When designing our experiment, if we have reason to believe certain variables are especially consequential, we can make use of block designs.
Neither approach guarantees that our experimental groups were equivalent before treatment, however. We will never be aware of all variables that may potentially affect our outcome and bias our effect size estimates.
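To make the block-design idea more concrete, here is a minimal sketch of blocked randomization; the participant file and the age-band blocking variable are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical participant file; the age bands serve as blocks
df = pd.DataFrame({
    "id": range(1, 21),
    "age_band": ["18-34"] * 8 + ["35-54"] * 8 + ["55+"] * 4,
})

pieces = []
for band, block in df.groupby("age_band"):
    # Shuffle within the block, then alternate treatment/control so each block stays balanced
    block = block.sample(frac=1, random_state=int(rng.integers(1_000_000))).copy()
    block["group"] = np.where(np.arange(len(block)) % 2 == 0, "treatment", "control")
    pieces.append(block)

assigned = pd.concat(pieces)
print(assigned.groupby(["age_band", "group"]).size())
```

Because assignment is balanced within each block, the blocking variable cannot, by construction, differ much between treatment and control; unmeasured variables are another matter.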
Experiments and data analysis
When there are many variables felt to be related to the outcome, respondents can be statistically matched according to how similar they are with respect to variables known or suspected to be related to the outcome. One member of each matched pair would then be randomly assigned to the treatment and the other to the control.
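A toy illustration of the matched-pairs idea (greedy nearest-neighbor matching on standardized covariates, with invented variable names) might look like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Hypothetical respondent data; the covariates are invented for illustration
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=20),
    "past_spend": rng.normal(100, 30, size=20),
})

# Standardize covariates so distances are comparable across variables
z = (df - df.mean()) / df.std()

unmatched = list(df.index)
pairs = []
while len(unmatched) >= 2:
    i = unmatched.pop(0)
    # Find the remaining respondent closest to respondent i (Euclidean distance)
    dists = ((z.loc[unmatched] - z.loc[i]) ** 2).sum(axis=1)
    j = dists.idxmin()
    unmatched.remove(j)
    # Randomly send one member of the pair to treatment and the other to control
    t, c = (i, j) if rng.random() < 0.5 else (j, i)
    pairs.append({"treatment_id": t, "control_id": c})

print(pd.DataFrame(pairs))
```

Real matching would normally use dedicated software and more covariates, but the logic is the same: pair similar respondents first, randomize second.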
After the experiment, we may wish to dig deeper and try to understand why the treatment was more (or less) effective than the control. There are numerous multivariate statistical methods that can be utilized to address this question.
Experiments, unfortunately, can be quite artificial and fail to generalize to real-world conditions. Reality is a different sort of laboratory, and for some studies field experiments are better suited. The subjects may also be “WEIRD” (Western, educated, industrialized, rich and democratic) – small samples of American undergraduate psychology majors, for instance.
Experiments can go wrong in many ways. Placebos may be unmasked. Many patients may not take their medications as directed or drop out of studies for reasons related to the treatment, such as side effects. The outcome may be a surrogate variable (e.g., a test result) and not reliably related to the distal outcome (e.g., developing a specific medical condition).
Randomized experiments may also be infeasible or unethical in many circumstances. Assigning expectant mothers to smoking and non-smoking groups, for example, would be both infeasible and unethical.
In these cases, we must rely on observational data, meaning data researchers can observe but cannot manipulate experimentally. Without manipulation – random assignment – it is much harder to estimate effect sizes accurately, however. People who exercise more may be healthier, for example, but this may be because they were healthier to begin with, not because they exercise.
There are also quasi-experiments and natural experiments, which fall in between randomized experiments and purely observational studies. Conjoint analysis is yet another wrinkle. In conjoint, experimental designs are used to determine the combinations of product features shown to respondents. Respondents themselves are not normally assigned to experimental groups, though this is sometimes done.
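As a rough sketch of that design step (the attributes and levels below are invented), the full-factorial set of conjoint profiles can be enumerated as follows; in practice a fractional or optimized design shows respondents only a carefully chosen subset.

```python
from itertools import product

# Invented attributes and levels for a hypothetical product
attributes = {
    "brand": ["A", "B", "C"],
    "price": ["$9", "$12", "$15"],
    "warranty": ["1 year", "2 years"],
}

# Full factorial: every combination of levels (3 x 3 x 2 = 18 profiles)
profiles = [dict(zip(attributes, combo)) for combo in product(*attributes.values())]

print(len(profiles))  # 18
print(profiles[0])    # {'brand': 'A', 'price': '$9', 'warranty': '1 year'}
```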
Causal analysis is a challenging topic and experts are not in agreement on many fundamental issues. Except when there is some clear deterministic mechanism involved – a machine that has malfunctioned, for instance – the best we can do is approach causation probabilistically. Statistics is a systematic means of dealing with uncertainty, thus its prominent role in research. Statistical models are probabilistic, not deterministic.
Meta-analysis is an important topic related to the analysis of causation, and I have written a brief overview of it here. Propensity score analysis is another popular causal analytics tool. My short interview with Harvard epidemiologist Tyler VanderWeele may be helpful.
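For the curious, here is a stripped-down sketch of the propensity-score idea on simulated data (not a full analysis; in this toy example exercise is the “treatment” and baseline health is the only confounder):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500

# Simulated observational data: whether people exercise depends on baseline health
baseline_health = rng.normal(size=n)
exercise = (baseline_health + rng.normal(size=n) > 0).astype(int)
outcome = 1.0 * baseline_health + 0.5 * exercise + rng.normal(size=n)
df = pd.DataFrame({"baseline_health": baseline_health, "exercise": exercise, "outcome": outcome})

# Step 1: estimate each person's probability of "treatment" (the propensity score)
X = sm.add_constant(df[["baseline_health"]])
ps = sm.Logit(df["exercise"], X).fit(disp=0).predict(X)

# Step 2: inverse-probability weights to balance the two groups on baseline health
w = np.where(df["exercise"] == 1, 1 / ps, 1 / (1 - ps))

# Step 3: weighted difference in mean outcomes as a crude effect estimate
treated = df["exercise"] == 1
effect = (np.average(df.loc[treated, "outcome"], weights=w[treated])
          - np.average(df.loc[~treated, "outcome"], weights=w[~treated]))
print(round(effect, 2))  # should land roughly near the simulated effect of 0.5
```

Weighting or matching of this kind can only adjust for confounders we have actually measured, which is one reason observational estimates remain less trustworthy than randomized ones.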
In addition, three books I can recommend are “Experimental and Quasi-Experimental Designs” (Shadish et al.), “Observation and Experiment” (Rosenbaum) and “Mastering Metrics” (Angrist and Pischke). “Modern Epidemiology” (Lash et al.), “Cause and Correlation in Biology” (Shipley) and “Epidemiology by Design” (Westreich) may also be of interest for those looking for a deep dive into this subject. “Getting Real About Research” may also be helpful.
Causal analysis – part science, part art
The notion that big data has removed the need for theory and experimentation made a splash in the business media a decade or so ago and resurfaces now and again. This is a serious misunderstanding. It implicitly assumes more data necessarily means more useful information.
Many large data files, in fact, are heavily imputed and error-ridden and, in general, the risk of errors in our data tends to increase with the size of our data. Moreover, even with high-quality data it’s quite easy to find something that isn’t really there, as explained in “Stuff Happens.”
I’ve also occasionally heard it claimed or suggested that AI and machine learning are now able to automatically identify the causal mechanism underlying any data. This is dubious for many reasons. One is that many of these claims confuse predictive analytics with causal analysis.
Causal analysis is part science and part art and does not lend itself well to automation. There is no numerical criterion that can reveal the true causal model. AIC, BIC, MDL, cross-validation and other benchmarks can be useful in model building but typically disagree with respect to which is the best model. Furthermore, alternative “best” models may suggest divergent courses of action to decision makers or have contrasting implications for other researchers.
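As a simple illustration on simulated data (statsmodels assumed; variable names invented), such criteria are easy to compute but need not point to the same model:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 200

# Simulated data: y depends strongly on x1, weakly on x2 and not at all on x3
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 * x1 + 0.2 * x2 + rng.normal(size=n)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2, "x3": x3})

candidates = {
    "x1 only": ["x1"],
    "x1 + x2": ["x1", "x2"],
    "x1 + x2 + x3": ["x1", "x2", "x3"],
}

for name, cols in candidates.items():
    fit = sm.OLS(df["y"], sm.add_constant(df[cols])).fit()
    # AIC penalizes extra parameters less heavily than BIC, so the two can disagree
    print(f"{name:14s}  AIC={fit.aic:7.1f}  BIC={fit.bic:7.1f}")
```

Even when the criteria happen to agree, they only rank the candidate models we thought to fit; none of them certifies that the chosen model is causally correct.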
Two books on AI and machine learning I can recommend are “Artificial Intelligence” (Russell and Norvig) and “Machine Learning: A Probabilistic Perspective” (Murphy). “How to Stay Smart in a Smart World” (Gigerenzer) also skewers many popular myths.
Though somewhat tarnished, randomized experiments remain the gold standard in causal analytics.