Editor's note: Gang Xu is a senior research consultant in statistics at Brintnall & Nicolini, Inc., a Philadelphia, Pa.-based health care consulting and marketing research firm.
In quantitative marketing research, we frequently collect data for analytic purposes. That is, the data will be analyzed primarily for inferential tests such as t-tests or z-proportion tests. The methods of estimating sample size for these inferential tests are different from those for a descriptive study. (See the June 1999 QMRR for the method of estimating sample size for a descriptive study.) Before I elaborate on the procedure of calculating the sample size for these inferential tests, let me first briefly review some key concepts which are very important in the calculation of sample size.
1. Hypothesis. A statistical hypothesis is a statement of belief about a population parameter. For instance, the statement that Drug A is safer than Drug B is a hypothesis. There are two types of hypotheses: a null hypothesis and an alternative hypothesis. The former states that there is no statistical difference between or among the population parameters. The statement that there is no difference in weekly working hours between primary care physicians and specialists is an example. An alternative hypothesis, on the other hand, is a statement that disagrees with the null hypothesis. For instance, the statement that the mean weekly working hours of specialists are higher than those of primary care physicians is an alternative hypothesis. To a large extent, an analytic research study is designed to test whether the null hypothesis can be rejected.
An alternative hypothesis can indicate whether a test is directional or non-directional. If the alternative hypothesis is directional, such as the one mentioned above that specialists would work more hours than primary care physicians, the test is one-tailed. However, if the researcher doesn't know much about the working hours of physicians, he or she may simply hypothesize that there is a difference in weekly working hours between the two specialty groups. In such a case, the alternative hypothesis is non-directional, and the test is two-tailed. Holding all other factors constant, a two-tailed test requires a larger sample size than a one-tailed test.
2. P-value. Associated with a hypothesis test is a p-value. Put simply, the p-value is the probability of obtaining results at least as extreme as those observed by chance alone, that is, if the null hypothesis were true. We usually use a p-value of .05 or .01 in estimating sample size. Used as a cutoff in this way (the significance level), a p-value of .05 means that, when the null hypothesis of no difference between the two groups is in fact true, there is a 5 percent chance that we will reject it anyway. P-values are obtained after the statistical tests are performed. If a p-value is smaller than .05, for instance, we can say that the null hypothesis is rejected at the significance level of .05. Holding all other factors constant, a smaller p-value requires a larger sample size for a statistical test.
3. Power. Power is defined as the probability of rejecting the null hypothesis when it is false. In other words, power is the probability of detecting a significant difference when such a difference really exists. Obviously, a high power is a good thing to have in a quantitative analytic study. We usually use a power of .80 or .90 in estimating sample size. Holding all other factors constant, a higher power requires a larger sample size.
In this article, I'll briefly introduce the methods of calculating sample size for the two most commonly used inferential tests: t-tests and z-proportion tests. We'll take up each type of test in the following two sections.
A. T-test
A t-test is commonly used to compare the mean value of a continuous variable between two groups, or within the same group over a period of time. It assumes that the populations from which the samples are drawn are approximately normally distributed and that their variances are about equal.
Case study one: A pharmaceutical company is about to launch a new drug, Drug A. Before the launch, the company introduces a launch program in which a group of physicians is exposed to messages about the product for a period of three weeks. The company wants to assess the effectiveness of the program, specifically, the difference between the physicians' mean ratings of Drug A's efficacy level prior to the program (at the beginning of the first week) and afterwards (at the end of the third week). Based on a previous study of a similar launch program, the expected change in physicians' mean rating of the drug is 1.5 (on a 10-point scale), with a standard deviation of 3.0. How many physicians do you need to be able to assess the effectiveness of the program at the p-value of .05 and power of .80?
The formula used in calculating the sample size of the t-test is as follows:
$$n = \frac{2\,(Z_{pvalue} + Z_{power})^2}{D^2}$$
Where:
n is the sample size.
Zpvalue is the standard normal deviate for p-value.
Zpower is the standard normal deviate for power.
D is the standardized effect size.
Zpvalue is a fixed value set by you, the researcher. If the alternative hypothesis is two-tailed, Zpvalue = 1.96 when p-value = .05, and Zpvalue = 2.58 when p-value = .01. If the alternative hypothesis is one-tailed, Zpvalue = 1.65 when p-value = .05, and Zpvalue = 2.33 when p-value = .01. In the example shown above, we have a Zpvalue of 1.96 for the two-tailed test. Note that the Zpvalue at .05 for the two-tailed test is equivalent to the Zpvalue at .025 (.05/2) for the one-tailed test. This rule applies to other p-values as well: if you know the Zpvalue for one type of test, you can find it for the other.
Zpower is also a fixed value. When power = .80, Zpower = .84; when power = .90, Zpower = 1.28. In the example, we have Zpower of .84.
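As a rough check on these fixed values, the short Python sketch below recovers them from the standard normal distribution using the standard library's statistics.NormalDist. The function names z_for_p_value and z_for_power are illustrative labels, not part of any package. Note that the exact one-tailed value at .05 is about 1.645, which is conventionally rounded to the 1.65 used in this article.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_p_value(p_value, two_tailed=True):
    """Standard normal deviate for a given p-value (significance level)."""
    # A two-tailed test splits the p-value between the two tails.
    tail_area = p_value / 2 if two_tailed else p_value
    return STANDARD_NORMAL.inv_cdf(1 - tail_area)


def z_for_power(power):
    """Standard normal deviate for a given power."""
    return STANDARD_NORMAL.inv_cdf(power)


# Print the deviates for the p-values and powers discussed in the text.
for p in (0.05, 0.01):
    print("p-value", p,
          "two-tailed:", round(z_for_p_value(p, two_tailed=True), 3),
          "one-tailed:", round(z_for_p_value(p, two_tailed=False), 3))
for power in (0.80, 0.90):
    print("power", power, "Zpower:", round(z_for_power(power), 3))
```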
The standardized effect size is the estimated effect size divided by the standard deviation. In this example, the effect size refers to the estimated change in mean ratings from the beginning to the end of the launch program. Therefore, the standardized effect size, the estimated mean difference of 1.5 divided by the standard deviation of 3.0, is .50.
Putting the numbers into the equation, we have
$$n = \frac{2\,(Z_{pvalue} + Z_{power})^2}{D^2} = \frac{2\,(1.96 + .84)^2}{0.5^2} = 62.72$$
Rounding up, we need about 63 physicians in our study to participate in both pre- and post-launch programs.
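The arithmetic for case study one can be reproduced with a few lines of Python. This is only a sketch of the formula given above; the function name t_test_sample_size and its parameter names are illustrative, and the rounded deviates 1.96 and .84 are taken from the text.

```python
import math


def t_test_sample_size(z_p_value, z_power, effect_size, std_dev):
    """Sample size from n = 2 * (Zpvalue + Zpower)^2 / D^2."""
    d = effect_size / std_dev  # standardized effect size D
    return 2 * (z_p_value + z_power) ** 2 / d ** 2


# Case study one: two-tailed p-value of .05, power of .80,
# expected change of 1.5 points with a standard deviation of 3.0.
n = t_test_sample_size(z_p_value=1.96, z_power=0.84,
                       effect_size=1.5, std_dev=3.0)
print(round(n, 2))   # unrounded result of the formula
print(math.ceil(n))  # round up to a whole respondent
```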
It should be noted that both the effect size and standard deviation may be derived from previous research, a pilot study, or from our educated guess. A larger effect size, or a smaller standard deviation, will require a smaller sample size, holding all other variables constant. Also, the decision to use a one-tailed or a two-tailed test is based on our knowledge about the study and the hypothesis we are making. In case study one, if we knew that the launch program could only increase physicians' ratings of the drug, we might use a one-tailed test. We would then use a Zpvalue of 1.65 instead of 1.96 and would need a sample size of 50.
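To see how these choices move the required sample size, the sketch below reruns the t-test formula under a few scenarios at a power of .80. The first two rows are the two-tailed and one-tailed versions of case study one; the last two rows use hypothetical values (a smaller change of 1.0 and a smaller standard deviation of 2.0) chosen purely for illustration. The function name is likewise illustrative.

```python
import math


def t_test_sample_size(z_p_value, z_power, effect_size, std_dev):
    """Rounded-up sample size from n = 2 * (Zpvalue + Zpower)^2 / D^2."""
    d = effect_size / std_dev  # standardized effect size D
    return math.ceil(2 * (z_p_value + z_power) ** 2 / d ** 2)


Z_POWER_80 = 0.84  # Zpower for a power of .80

scenarios = [
    # (label, Zpvalue, effect size, standard deviation)
    ("two-tailed, change 1.5, SD 3.0", 1.96, 1.5, 3.0),
    ("one-tailed, change 1.5, SD 3.0", 1.65, 1.5, 3.0),
    ("two-tailed, change 1.0, SD 3.0 (hypothetical)", 1.96, 1.0, 3.0),
    ("two-tailed, change 1.5, SD 2.0 (hypothetical)", 1.96, 1.5, 2.0),
]

results = {
    label: t_test_sample_size(z, Z_POWER_80, effect, sd)
    for label, z, effect, sd in scenarios
}
for label, n in results.items():
    print(f"{label}: n = {n}")
```

A smaller expected change pushes the sample size up sharply, while a smaller standard deviation or a one-tailed test brings it down.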
B. Z-proportion test
This is the test used to compare the difference between two proportions.
Case study two: A company is interested in knowing whether pulmonary specialists would prescribe Drug X for the treatment of asthmatic children more often than family physicians would. A review of previous research indicates that 5 percent of child patients with asthma were prescribed Drug X by family physicians. At a p-value of .05 and power = .90, how many family physicians and pulmonary specialists will need to be studied to determine whether at least 10 percent of such patients are prescribed the drug by the specialists?
The formula for calculating the sample size is:
$$n = \frac{2\,(Z_{pvalue} + Z_{power})^2}{h^2}$$
Where
n is the sample size for each group.
Zpvalue is the standard normal deviate for p-value, as defined earlier. In this example, Zpvalue = 1.65 for one-tailed test.
Zpower is the standard normal deviate for power, as defined earlier. In this example, Zpower = 1.28
h is the effect size. h = φ1 - φ2, where φ1 = 2 times the arcsine of the square root of P1 and φ2 = 2 times the arcsine of the square root of P2.
P1 is the proportion of subjects in group 1. In the example, P1 = .05, referring to the 5 percent of patients who were prescribed Drug X by family physicians. Similarly, P2 is the proportion of subjects in group 2. In the example, P2 = .10, referring to the 10 percent of patients who were prescribed Drug X by the specialists.
Table 1 lists the value of P and its corresponding value of φ. Looking in the table for the P1 of .05, we have a φ1 of .4510. For the P2 of .10, the corresponding value of φ2 is .6435.
Therefore, h = φ1 - φ2 = .4510 - .6435 = -.1925
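If the table is not at hand, the transformed values can be computed directly. The following Python sketch applies the arcsine transformation described above (2 times the arcsine of the square root of P, in radians); the function name arcsine_transform is an illustrative choice.

```python
import math


def arcsine_transform(p):
    """Return 2 * arcsin(sqrt(p)) in radians for a proportion p in [0, 1]."""
    return 2 * math.asin(math.sqrt(p))


phi_1 = arcsine_transform(0.05)  # family physicians, P1 = .05
phi_2 = arcsine_transform(0.10)  # specialists, P2 = .10
h = phi_1 - phi_2                # effect size h

print(round(phi_1, 4), round(phi_2, 4), round(h, 4))
```

The sign of h does not matter for the sample size, since h is squared in the formula.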
Putting these numbers into the formula, we need a sample size of:
$$n = \frac{2\,(Z_{pvalue} + Z_{power})^2}{h^2} = \frac{2\,(1.65 + 1.28)^2}{(-.1925)^2} = 463.34$$
Rounding up, we need about 464 physicians for each group.
This sample size seems large. You may compromise and recalculate the sample size with less power. For instance, if you choose a power of .80 rather than .90, you would use a Zpower of .84 rather than 1.28. You would then need a sample size of 335 physicians for each group.
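Both versions of the calculation for case study two can be checked with the sketch below. It assumes the one-tailed Zpvalue of 1.65 and the Zpower values of 1.28 and .84 given in the text; the function names are illustrative. Because h is computed here without rounding to four decimals, the unrounded results differ slightly from a hand calculation, but they round up to the same whole numbers.

```python
import math


def arcsine_transform(p):
    """Return 2 * arcsin(sqrt(p)) in radians for a proportion p."""
    return 2 * math.asin(math.sqrt(p))


def proportion_sample_size(z_p_value, z_power, p1, p2):
    """Per-group sample size from n = 2 * (Zpvalue + Zpower)^2 / h^2."""
    h = arcsine_transform(p1) - arcsine_transform(p2)  # effect size h
    return 2 * (z_p_value + z_power) ** 2 / h ** 2


# Case study two: P1 = .05 (family physicians), P2 = .10 (specialists),
# one-tailed p-value of .05.
n_power_90 = proportion_sample_size(1.65, 1.28, 0.05, 0.10)
n_power_80 = proportion_sample_size(1.65, 0.84, 0.05, 0.10)

print("power .90:", math.ceil(n_power_90), "per group")
print("power .80:", math.ceil(n_power_80), "per group")
```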
It should be noted that, in this article, sample size is calculated for two commonly used statistical tests. For studies that may involve other statistical tests such as correlation or regression, separate estimations of sample size are needed.
Summary
- To calculate a sample size for inferential statistical tests such as t-tests or z-proportion tests, you have to specify your alternative hypothesis (one-tailed or two-tailed) and decide on your accepted level of p-value (e.g., .05 or .01) and power (e.g., .80 or .90).
- The numerators in both formulae for calculating sample size are the same. The differences lie in the calculation of the denominator (i.e., the effect size).
- For estimating the sample size for t-tests, you need to give an estimate of the effect size and standard deviation. The values may be derived from previous studies, your personal experience, or your educated guess.
- For estimating the sample size for z-proportion tests, you need to give an estimate of proportion of subjects in each group and calculate the effect size accordingly.
- For other tests such as correlation or regression, separate estimates of sample size are needed.