Editor’s note: Ken Faro is VP of research and Elie Ohana is a researcher at Hill Holliday's Decision Science practice, Boston.
In part one of this two-part article, we looked at a number of cases of misconduct around data practices, primarily from academic psychology. These serve as a reminder of what is not acceptable in market research, as well as of the best practices we should follow. In part two of this series, we focus on guidelines and standards for market researchers to implement in the research process.
Some might say that market research practitioners are at a higher risk of engaging in scientific misconduct than researchers in academic disciplines. After all, market research practice is not regulated by scientific entities like the American Psychological Association or the Association for Psychological Science. Similarly, market research practitioners don’t have to submit research to peer-reviewed journals or lab-based data audits, and each market research firm has its own set of best practices and standards. The argument could be made that what is deemed scientifically acceptable is a function of a particular market research firm and not the industry at large.
Regardless of which population is more at risk of engaging in scientific misconduct, we believe that a clear set of guidelines on how to approach the research process can serve as a first step toward preventing scientific misconduct in the market research industry. Hopefully the guidelines suggested in this article will benefit market researchers as they design, execute, analyze and report on their research.
1. Have a clear hypothesis of what results are expected. Having a set of hypotheses structures our analytical plan in a way that minimizes the number of tests run, thereby reducing the risk of a false positive (commonly referred to as a Type 1 error).
2. Design and document an analysis plan based on your hypotheses. The plan can include details such as the following (a short code sketch after this list illustrates several of these decisions):
- How many observations will be collected? Sample size should be determined at the beginning of the study using a power analysis. If a sample is not big enough, we might get a non-significant test result when in reality there is a real difference (commonly referred to as a Type 2 error, or false negative). This occurs because small samples have low power. A power analysis provides the sample size required to detect a given effect size with a given level of confidence and a given level of power.
- How will the data be cleaned? Setting cleaning rules in advance is important so you do not systematically discard cases whose response patterns oppose the relationship you want to see in the data. This could mean creating rules around certain data transformations (e.g., using a log transformation) or criteria for case exclusion (e.g., excluding cases that are more than two standard deviations away from the mean).
- What tests or models will be used? Flexibility around model selection is another opening for p-hacking. Given the number of models available, a researcher will eventually find one whose results fit the desired story. Specifying the appropriate model ahead of time commits the researcher to the results of the model that was deemed appropriate.
- What criteria will be used for assessing statistical significance? Researchers are most familiar with this step: setting the criteria for rejecting the null hypothesis. While we’ve seen the significance level set at .10, for most research it should be set at .05 or .01, making Type 1 errors, or false positives, less likely to occur.
- What criteria will be used for assessing model fit? If a general linear model (GLM; e.g., ANOVA) is fit and reveals omnibus, or overall, significance, there are additional measures researchers can use to assess model fit, such as r-squared, log-loss, AIC and BIC. These fit measures provide additional criteria for judging how well a model predicts.
- What criteria will be used for assessing practical significance? Every statistical test has an associated effect-size calculation (e.g., Cohen’s d for a t-test). One must determine whether the results show a small, medium or large effect.
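To make these decisions concrete, here is a minimal sketch in Python of what a codified analysis plan might look like. It assumes a hypothetical two-group study on a metric column named purchase_intent; the constants and cutoffs are illustrative choices, not prescriptions.

```python
# A minimal sketch of codifying an analysis plan, assuming a hypothetical
# two-group study on a metric column named "purchase_intent". The constants
# below are illustrative choices, not prescriptions.
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.power import TTestIndPower

ALPHA = 0.05        # pre-specified criterion for statistical significance
POWER = 0.80        # desired power, i.e., 1 minus the Type 2 error rate
EXPECTED_D = 0.30   # smallest effect size (Cohen's d) worth detecting

# Power analysis: sample size per group needed to detect EXPECTED_D.
n_per_group = TTestIndPower().solve_power(
    effect_size=EXPECTED_D, alpha=ALPHA, power=POWER
)
print(f"Required respondents per group: {np.ceil(n_per_group):.0f}")

# Pre-specified cleaning rule: exclude cases more than two standard
# deviations from the mean of the outcome variable.
def apply_exclusion_rule(df: pd.DataFrame, col: str, cutoff: float = 2.0) -> pd.DataFrame:
    z = (df[col] - df[col].mean()) / df[col].std()
    return df[z.abs() <= cutoff]

# Pre-specified test, reported with an effect size (Cohen's d) so that
# practical significance is assessed alongside statistical significance.
def run_planned_test(group_a: np.ndarray, group_b: np.ndarray):
    t_stat, p_value = stats.ttest_ind(group_a, group_b)
    pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
    cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
    return t_stat, p_value, cohens_d

# Model-fit criteria for a GLM can be read off a fitted statsmodels model,
# e.g., results = sm.OLS(y, X).fit(); results.rsquared, results.aic, results.bic
```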
3. Follow your analysis plan. Many of us have witnessed colleagues endlessly exploring a data set until a desired finding magically appears. By sticking to the analysis plan, the researcher takes steps to prevent their biases from interfering with the analysis. This also helps guard against reporting Type 1 errors, or false-positive results.
4. Take steps to ensure your results are reproducible. Writing a well-commented analysis script that documents all analysis steps and procedures ensures that other researchers can check your work, encourages consistent analysis practices and makes it easier to catch errors. Even if co-workers have access to your method and can replicate the results on the initial data set, a second data set must be used to validate what you observed. Always show replication on a data set different from the one on which you initially observed the result.
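As a rough illustration, here is a minimal sketch of such a script, assuming hypothetical files wave_1.csv and wave_2.csv with columns group and purchase_intent. The point is that the same documented steps run unchanged on both data sets.

```python
# A minimal sketch of a reproducible analysis script. The file names and
# column names are hypothetical placeholders for your own study.
import pandas as pd
from scipy import stats

def analyze(path: str) -> dict:
    """Run the pre-specified analysis on one data file, documenting each step."""
    df = pd.read_csv(path)  # raw data is read as-is, never edited by hand
    # Cleaning rule from the analysis plan: exclude cases beyond two SDs.
    z = (df["purchase_intent"] - df["purchase_intent"].mean()) / df["purchase_intent"].std()
    df = df[z.abs() <= 2.0]
    a = df.loc[df["group"] == "A", "purchase_intent"]
    b = df.loc[df["group"] == "B", "purchase_intent"]
    t_stat, p_value = stats.ttest_ind(a, b)
    return {"file": path, "t": t_stat, "p": p_value}

# The first data set establishes the finding; the second replicates it.
print(analyze("wave_1.csv"))
print(analyze("wave_2.csv"))
```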
5. Report all analyses conducted, not simply the ones that showed desirable results. As we’ve discussed, run enough tests or models and statistically significant results will appear simply due to chance. Providing context on which analyses were performed is critical for evaluating the importance of research findings. If you report that you ran 100 analyses and only five were significant, others will have enough context to be appropriately suspicious of those results.
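That arithmetic is easy to demonstrate. The small simulation below, purely illustrative, runs 100 t-tests on pure noise; at an alpha of .05, roughly five come out “significant” even though no real effect exists.

```python
# Illustrative simulation: 100 tests on pure noise yield ~5 false positives
# at alpha = .05, since both groups are drawn from the same distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
false_positives = 0
for _ in range(100):
    a = rng.normal(size=200)  # two groups from the same distribution,
    b = rng.normal(size=200)  # so any "effect" found is a Type 1 error
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1
print(f"Significant tests out of 100: {false_positives}")  # roughly 5 expected
```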
Quality research practices
Scientific misconduct isn’t something to shrug off. Many are upset that such misconduct sheds a terrible light on the discipline of psychological research and decreases confidence in the body of knowledge created since 1879. But while academics focus on building scientific knowledge, those of us in the applied world should be concerned about the damage we could be doing to our clients or companies. Brands turn to researchers for the sole purpose of helping them produce insight into their business and marketing challenges. If we can’t exercise proper scientific method, we could be hurting more than helping.