Editor’s note: Gary M. Mullet is president of Gary Mullet Associates, Inc., a suburban Atlanta-based consulting and statistical data processing firm working primarily with producers and users of marketing research.
From both Statistics 101 and Marketing Research 201, whether in formal classes or in on-the-job experience, we've all learned how to calculate sample sizes, answering such questions as, "How many respondents do I need to interview so that my 95% sampling error will be no more than 4% on the brand-preferred question in this upcoming survey?" In case we forget how to do the appropriate button pushing on our calculators, there are a number of computer programs, calculators, and slide charts around that will answer this and similar questions when given the parameters. (My calculator gives a sample size of 601, by the way, for the above question.)
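For those who would rather push buttons on a computer than on a calculator, here is a minimal sketch of that computation in Python, using the usual normal-approximation formula and the conservative 50/50 split; the function name is mine, not from any particular package:

```python
# Sample size for estimating a proportion to within a given sampling error.
# Uses n = (z / e)^2 * p * (1 - p) with the conservative p = 0.5.
import math
from scipy.stats import norm

def sample_size_for_margin(margin, confidence=0.95, p=0.5):
    """Smallest n so the sampling error at `confidence` is at most `margin`."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided critical value, ~1.96
    return math.ceil((z / margin) ** 2 * p * (1 - p))

print(sample_size_for_margin(0.04))  # 601, matching the calculator above
```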
Whether we do such computations ourselves or look up the answers in widely available tables, we are explicitly concerned with one of the two wrong conclusions that can occur in statistical analyses: the Type I or α-error. Specifically, we are saying that we don't want to reject a particular true null hypothesis more than 5% of the time, or, equivalently, that we want to be 95% confident that we don't reject that null hypothesis when it is true.
Without the statistical jargon, what are we saying? Let's consider a simple product test. We've come up with a new product formulation and want to introduce our "NEW!! IMPROVED!!" product if the new formulation does significantly better than the old in a paired comparison test. We interview, tabulate, process, and analyze, and come to the conclusion that, although 53% of the respondents prefer the new formulation (in a blindfold test, of course), this is not a significant preference and, thus, we will stick with the old formulation.
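In case it helps to see the arithmetic behind that "not significant" verdict, here is a sketch of the one-sided z-test involved; the n = 601 sample from the earlier question is borrowed purely for illustration:

```python
# One-sided z-test of the observed 53% preference against 50% parity.
from scipy.stats import norm

n, p_hat, p0 = 601, 0.53, 0.50
z = (p_hat - p0) / (p0 * (1 - p0) / n) ** 0.5  # test statistic, about 1.47
critical = norm.ppf(0.95)                      # one-sided .05 cutoff, about 1.645
print(z > critical)                            # False: stick with the old formulation
```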
Implicit in our decision is our confidence/significance level. These have to do with the risk of introducing the "NEW!! IMPROVED!!" version of the product when it is not really preferred in the target population; that is, the 53%, which certainly exceeds the 50% parity level, is due to sampling error and does not indicate a real preference in the sampled population. What is frequently overlooked in such situations is the β-risk, or Type II error probability. We also need to ask ourselves in such circumstances, "How likely is it that we have developed a 'NEW!! IMPROVED!!' version of our product that will fail to get enough votes in the sample for us to introduce it when it truly is preferred in the population of interest?" β-risk, then, has to do with failing to introduce the "NEW!! IMPROVED!!" product version when it really is preferred by the target population. Our sample may not indicate this true, but obviously unknown, preference, again due to sampling error. (By the way, just as 1 − α = confidence level, there is a term for 1 − β. It's called power.) In what follows, we will look at the effect of sample size on both types of potential errors (although you can make, at most, one of them in any given decision) for a variety of marketing research situations. First, however, a brief review of some statistical terms.
Review/preview
In statistical significance tests, be they from marketing research, engineering, biomedical research, or whatever, there are two competing conjectures or hypotheses of interest. Both have to do with the target population, such as female heads-of-household (FHHs). Frequently, neither of these is explicitly stated. One of them is the null hypothesis, and the alternative to that is, without a great deal of imagination, the alternative hypothesis. For example, you may be conducting a taste test to see how your new microwave chocolate cake is going to fare against the leading mix, which requires a conventional oven. If you are interested in whether or not more than 50% of all FHHs prefer your chocolate cake, generally we would take the null hypothesis to be that 50% or fewer of all FHHs prefer your cake, against the alternative that more than 50% prefer the new one. Since you obviously cannot interview all FHHs in your marketing area, you use a sample to help you decide which of these hypotheses is more tenable.
Now, several things can happen in your test, depending on which hypothesis is really true (and if we could ever know the answer to this for sure, we'd have no reason whatsoever to take a sample of opinions) and what our sample says. These are:
1) Microwave cake is not really preferred by the population of interest, and the sample confirms this. We'd always like this to be the case whenever we come up with an idea that really isn't going to fare well in the target population.
2) Microwave cake is not really preferred, but the sample indicates that it is significantly so. Here, due to sampling error, we have committed a Type I or α-error. The probability of such an occurrence is α. Clearly, we don't want this to happen because of, among other things, the severe monetary consequences of a product introduction/replacement. Thus, we'd like α to be small.
3) Microwave cake is really preferred by the target population, but the sample says that it is not significantly so. Here, due to sampling error, we have committed a Type II or β-error. The probability of such an occurrence is β. Clearly, we don't want this to happen because of, among other things, potential lost revenue/profit from not introducing a product which would do well in the marketplace. Thus, we'd like β to be small as well.
4) Microwave cake is really preferred, and the sample says that it is significantly so. Another good thing that can happen, and we'd like it to happen as often as possible.
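If you like to see these four outcomes in action, here is a small simulation sketch; the "truly preferred" figure of 55% and the sample size of 200 are illustrative assumptions, not recommendations:

```python
# Simulate the taste test many times to estimate the alpha- and beta-risks.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha, reps = 200, 0.05, 100_000
z_crit = norm.ppf(1 - alpha)  # one-sided cutoff

def reject_rate(true_p):
    """Share of samples declaring a significant preference over parity."""
    wins = rng.binomial(n, true_p, reps)
    z = (wins / n - 0.5) / (0.25 / n) ** 0.5
    return (z > z_crit).mean()

print(reject_rate(0.50))      # Type I rate: should hover near alpha = .05
print(1 - reject_rate(0.55))  # Type II rate: near .59 here (more on this below)
```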
Redundantly, perhaps: many data tabulation packages only allow you to concentrate on the α-risk, the rejecting of a true null hypothesis. You specify α and the program tells you whether or not the sample result is statistically significant. What can be done if you want to look at potential lost opportunities? In what follows, we will look at our cake example in more detail and attempt to answer this question.
Incorporating β-risk
The easiest way to incorporate the notion of opportunity loss and β-error into a study is to do what you frequently do anyway: explicitly state a difference that would be too good to miss. A statement about our taste test might now be, "We certainly won't introduce the microwave chocolate cake if it looks like half or less of the FHHs prefer it, and if 55% or more like it we definitely want to go to the grocers' shelves." Thus, you are saying that 5% above parity is really too good an opportunity to miss. Now you need to answer two other questions: what α and what β are you willing to live with? (These are really tough to answer and require a look at a lot of economic information. Fortunately, you'll probably not introduce your microwave chocolate cake based totally on just the results of our taste test.) So, how many FHHs do we need to put through our taste test?
Let's assume a variety of α-β combinations and see how they affect sample size, always with our "critical difference" of 55% − 50% (parity) = 5%. This difference, here 5%, is sometimes called the effect size. (For comparison, we'll also show the sample sizes for a 10% effect size, i.e., 60% versus parity.) Taking the easy way out and using Kraemer and Thiemann (1987), we find:
Table 1: Effect size = 5% (55% vs. 50% parity)

| α | β | n |
|-----|-----|------|
| .05 | .20 | 615 |
| .05 | .10 | 851 |
| .05 | .05 | 1075 |
| .05 | .01 | 1567 |
| .01 | .20 | 997 |
| .01 | .10 | 1294 |
| .01 | .05 | 1567 |
| .01 | .01 | 2151 |
Table 2: Effect size = 10% (60% vs. 50% parity)

| α | β | n |
|-----|-----|-----|
| .05 | .20 | 151 |
| .05 | .10 | 209 |
| .05 | .05 | 264 |
| .05 | .01 | 384 |
| .01 | .20 | 245 |
| .01 | .10 | 317 |
| .01 | .05 | 384 |
| .01 | .01 | 527 |
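If you don't have Kraemer and Thiemann handy, a normal-approximation sketch like the following gets you close; their tables rest on a slightly different approximation, so expect answers within a few respondents of the values above:

```python
# Approximate n for a one-tailed test of a proportion against 50% parity.
import math
from scipy.stats import norm

def n_for_power(alpha, beta, p1, p0=0.5):
    """n so a one-tailed alpha-level test of p0 vs. p1 has Type II risk beta."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    num = za * (p0 * (1 - p0)) ** 0.5 + zb * (p1 * (1 - p1)) ** 0.5
    return math.ceil((num / (p1 - p0)) ** 2)

print(n_for_power(0.05, 0.20, 0.55))  # about 617 vs. the table's 615
print(n_for_power(0.05, 0.20, 0.60))  # about 153 vs. the table's 151
```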
There are some obvious and correct conclusions from the above tables:
- For a given level of α, a relative decrease in β results in a larger sample size.
- For a given level of β, a relative decrease in α results in a larger sample size.
- For fixed α and β, a smaller effect size results in a larger sample size (it's easier to tell the difference between 50% and 60% than it is to distinguish between 50% and 55%, so this makes sense).
Now let's do things a bit differently. Assume that you are doing the test using a sample of 200 FHHs and a significance level of α = .05. What's β? In other words, given that you are willing to conclude that your new microwave chocolate cake is superior, when it really isn't, 5% of the time, what's the chance of failing to detect the superiority of a product that really is above parity? Again, we'll take both the 5% and 10% effect sizes.
For the former case, it's probably easiest to again use Kraemer and Thiemann and extend the first table above, then do a linear interpolation to find that β = .592. For the 10% effect size, merely interpolate between the sample sizes of 151 and 209 in the second table and find β = .116. While not exactly correct to the fourth or fifth decimal place, these interpolated values will generally suffice. Personally, I find the first number startling. Running a test with the "traditional" significance level of .05 (or, as many of us are more accustomed to, a confidence level of .95) and a quite reasonable sample size of 200, the β-risk is nearly 60% for an effect size of 5% above parity! This result clearly demonstrates how ignoring β can lead to missed opportunities.
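Rather than interpolating, you can also compute β directly from the same normal approximation; a minimal sketch:

```python
# Beta-risk for a one-tailed .05-level test with a fixed sample size.
from scipy.stats import norm

def beta_risk(n, p1, p0=0.5, alpha=0.05):
    """Probability of failing to detect a true preference of p1."""
    za = norm.ppf(1 - alpha)
    z = (n ** 0.5 * (p1 - p0) - za * (p0 * (1 - p0)) ** 0.5) / (p1 * (1 - p1)) ** 0.5
    return 1 - norm.cdf(z)

print(beta_risk(200, 0.55))  # about .59, close to the interpolated .592
print(beta_risk(200, 0.60))  # about .11, close to the interpolated .116
```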
Other research situations and conclusions
First, we are not limited to tests like the one above, where we are only looking for superiority (the one-tailed alternative). The same type of analysis can and should be applied in cases where you want to look for any difference in preference at all (the two-tailed or two-sided alternative).
Also, such opportunity loss or β-risk analysis can be used in the case of two independent samples, e.g., brand ever used versus brand never used, as well as two dependent samples, such as the MHH and FHH in the same household. We can do these analyses, as well, for rating scale means, correlation coefficients, and contingency tables (generally, a subset of a cross-tabulation table run according to your tab plan).
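As one illustration, here is the same β-risk idea sketched for two independent samples of equal size, using the standard normal approximation for comparing two proportions; the group sizes and percentages below are purely hypothetical:

```python
# Beta-risk for a two-tailed test that two independent proportions differ.
from scipy.stats import norm

def beta_two_props(n_per_group, p1, p2, alpha=0.05):
    """Probability of missing a true difference between two group proportions."""
    za = norm.ppf(1 - alpha / 2)
    se = (p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group) ** 0.5
    z = abs(p1 - p2) / se - za
    return 1 - norm.cdf(z)

# e.g., 200 "ever used" and 200 "never used" respondents, 50% vs. 60% preferring
print(beta_two_props(200, 0.50, 0.60))  # about .48
```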
The important point is that many of us overly concern ourselves with only Type I errors as measured by α. That's the way we were taught, or the way "we've always done it." This is not to say that one should always be concerned with β. However, an occasional look should improve your research decisions and, more importantly, your marketing decisions. And that's what it's all about in this game.
Reference
Kraemer, Helena Chmura, and Sue Thiemann (1987; 5th printing, 1989). "How Many Subjects?" Sage Publications, Inc., Newbury Park, CA.