Editor's note: Hank Zucker is president of Creative Research Systems, Petaluma, Calif.
"Level of significance" is a misleading term that many researchers do not fully understand. This article may help you understand the concept of statistical significance and the meaning of significance numbers.
In normal English, "significant" means important, while in statistics "significant" means probably true. A research finding may be true without being important. When statisticians say a result is "highly significant" they mean it is very probably true. They do not necessarily mean it is highly important.
Take a look at the table below. The chi-square (pronounced "kie square," like pie) section at the bottom shows two rows of figures. The figures 0.07 and 24.37 in the first row are the chi-square statistics themselves. The meaning of a chi-square statistic depends on the exact numbers of rows and columns and on the sample size, and it may be ignored for the purposes of this article; interested readers may wish to consult a statistics text for a complete explanation. The second row contains the values p = .795 and p = .001. These are the significance levels.
TABLE 1: Do you buy Brand X gasoline?
                             Area                    Type of Vehicle Driven
                 Total   Center City   Suburb     Car   Truck    Van   Compact
Unweighted base    713           361      352     247     150     44       180
                               50.6%    49.4%   34.6%   21.0%   6.2%     25.2%
Yes                428           215      213     131      74     29       131
                 60.0%         59.6%    60.5%   53.0%   49.3%  65.9%     72.8%
No                 285           146      139     116      76     15        49
                 40.0%         40.4%    39.5%   47.0%   50.7%  34.1%     27.2%
Chi-square                        0.07                     24.37
                                p = .795                 p = .001
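For readers who want to see where such figures come from, here is a minimal sketch in Python using scipy.stats.chi2_contingency on the "Yes"/"No" counts above. This is not the Survey System's own calculation, just a generic chi-square test; the printed values should come out close to the table (the Area figure matches best without the Yates continuity correction, and the Vehicle p-value comes out even smaller than the .001 shown).

```python
# Sketch: reproduce the two chi-square tests in Table 1 with scipy.
# Requires scipy (pip install scipy). Counts are the Yes/No rows above.
from scipy.stats import chi2_contingency

# Area: Center City vs. Suburb
area = [[215, 213],   # Yes
        [146, 139]]   # No

# Type of Vehicle Driven: Car, Truck, Van, Compact
vehicle = [[131, 74, 29, 131],   # Yes
           [116, 76, 15,  49]]   # No

# correction=False skips the Yates continuity correction so the Area
# statistic lands near the 0.07 shown in the table.
chi2_area, p_area, dof_area, _ = chi2_contingency(area, correction=False)
chi2_veh, p_veh, dof_veh, _ = chi2_contingency(vehicle)

print(f"Area:    chi-square = {chi2_area:.2f}, p = {p_area:.3g}")
print(f"Vehicle: chi-square = {chi2_veh:.2f}, p = {p_veh:.3g}")
```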
Significance levels show you how probably true a result is. The most common level, used to mean something is likely enough to be believed, is 95 percent. This means that the finding has a 95 percent chance of being true. However, this value is displayed in a misleading way. No statistical package will show you "95 percent" or ".95" to indicate this level. Instead, it will show you ".05", meaning that the finding has a five percent (.05) chance of not being true, which is the same as a 95 percent chance of being true.
To find the significance level, subtract the number shown from one. For example, a value of ".01" means there is a 99 percent (1 - .01 = .99) chance of it being true. In this table, there is probably no difference in the purchase of Brand X gasoline by people in the center city and the suburbs, because p = .795 (i.e., there is only a 20.5 percent chance that the difference is genuine). In contrast, the high significance level (p = .001, or 99.9 percent) indicates there is very probably a genuine difference in purchasing Brand X gasoline by owners of different vehicles in the population from which this sample was drawn.
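As a quick illustration of the 1 - p rule, here is a tiny Python sketch using the p-values mentioned so far:

```python
# Convert a reported p-value into the "chance of being true" described above.
def significance_level(p):
    """Return the probability (as a percentage) that a finding is true,
    using the 1 - p rule of thumb from this article."""
    return (1 - p) * 100

for p in (.05, .01, .795, .001):
    print(f"p = {p:<5} -> {significance_level(p):.1f} percent chance of being true")
```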
My company's cross-tab program, the Survey System, uses significance levels in several statistical tests. In all cases, the p value tells you how likely something is not to be true. If a chi-square test shows p = .04, it means there is a 96 percent (1 - .04 = .96) chance that the answers given by different groups in a banner really are different. If a t-test reports a probability of .07, it means there is a 93 percent chance that the two means being compared would be truly different if you looked at the entire population.
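The t-test p-value reads the same way. Here is a hedged sketch (the ratings below are invented purely for illustration, not taken from the article) using scipy.stats.ttest_ind:

```python
# Hypothetical example: compare mean ratings from two made-up groups and
# read the p-value with the same 1 - p rule of thumb.
from scipy.stats import ttest_ind

city_ratings   = [7, 8, 6, 9, 7, 8, 5, 7, 8, 6]
suburb_ratings = [6, 7, 5, 8, 6, 7, 6, 5, 7, 6]

result = ttest_ind(city_ratings, suburb_ratings)
print(f"p = {result.pvalue:.3f} -> "
      f"{(1 - result.pvalue) * 100:.0f} percent chance the means really differ")
```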
People sometimes think that the 95 percent significance level is sacred. If a test shows a .06 probability, it means that the finding has a 94 percent chance of being true. You can't be quite as sure about it as if it had a 95 percent chance of being true, but the odds still are that it is true. The 95 percent level comes from academic publications, where a theory usually has to have at least a 95 percent chance of being true to be considered worth reporting. In the business world, if something has a 90 percent chance of being true (p = .1), it certainly can't be considered proven, but it may be better to act as if it were true rather than false.
If you do a large number of tests, false positive results become a problem. Remember that a 95 percent chance of something being true means there is a 5 percent chance of it being false. This means that of every 100 tests that show results significant at the 95 percent level, the odds are that five of them do so falsely. If you took a totally random, meaningless set of data and did 100 significance tests, the odds are that five tests would be falsely reported significant. As you can see, the more tests you do, the more of a problem these false positives become. You cannot tell which the false results are -- you just know they are there.
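You can see this effect in a small simulation. The sketch below runs 100 t-tests on purely random, meaningless data and counts how many happen to come out "significant" at the 95 percent level; on average about five will.

```python
# Simulate the false-positive problem: 100 significance tests on random noise.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
false_positives = 0
for _ in range(100):
    # Two groups drawn from the SAME population, so any "difference" is noise.
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    if ttest_ind(a, b).pvalue < .05:
        false_positives += 1

print(f"{false_positives} of 100 tests came out significant at the 95 percent level")
```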
Limiting the number of tests to a small group chosen before the data is collected is one way to reduce the problem. If this isn't practical, there are other ways of solving it. The best approach from a statistical point of view is to repeat the study and see if you get the same results. If something is statistically significant in two separate studies, it is probably true. In real life it is not usually practical to repeat a survey, but you can use the "split halves" technique of dividing your sample randomly into two halves and doing the tests on each. If something is significant in both halves, it is probably true. The main drawback of this technique is that when you halve the sample size, a difference has to be larger to be statistically significant, because the margin of error in a sample increases as the sample size decreases.
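A split-halves check might look like the following sketch. The respondent-level data here is hypothetical (each record is simply an area code and a yes/no answer); the point is the mechanics of shuffling the sample, dividing it in two, and running the same test on each half.

```python
# Sketch of the split-halves technique on hypothetical respondent-level data.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)

# Hypothetical sample: column 0 = area (0 = center city, 1 = suburb),
# column 1 = answer (0 = no, 1 = yes).
sample = np.column_stack([rng.integers(0, 2, size=700),
                          rng.integers(0, 2, size=700)])

# Shuffle the respondents and split the sample into two random halves.
rng.shuffle(sample)
halves = np.array_split(sample, 2)

for i, half in enumerate(halves, start=1):
    # Build the 2x2 contingency table of area vs. answer for this half.
    table = np.zeros((2, 2), dtype=int)
    for area, answer in half:
        table[area, answer] += 1
    chi2, p, _, _ = chi2_contingency(table)
    print(f"Half {i}: chi-square = {chi2:.2f}, p = {p:.3f}")
```

With real survey data you would check whether the relationship is significant in both halves; with the random data above it usually will not be in either.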
One final common error is also important. Most significance tests assume you have a truly random sample. If your sample is not truly random, a significance test may overstate the accuracy of the results, because the test only considers random error. It cannot account for biases resulting from non-random error (for example, a badly selected sample).
To summarize:
- "Significant" need not mean "important."
- Probability values should be read in reverse (1 - p).
- Too many significance tests will show some falsely significant relationships.
- Check your sampling procedure to avoid bias.