All the statistical procedures developed so far were designed to help decide whether or not a set of observations is compatible with some hypothesis. These procedures yielded P values to estimate the chance of reporting that a treatment has an effect when it really does not and the power to estimate the chance that the test would detect a treatment effect of some specified size. This decision-making paradigm does not characterize the size of the difference or illuminate results that may not be statistically significant (i.e., not associated with a value of P below .05) but do nevertheless suggest an effect. In addition, since P depends not only on the magnitude of the treatment effect but also on the sample size, it is not unusual for experiments with large sample sizes to yield very small values of P (what investigators often call “highly significant” results) when the magnitude of the treatment effect is so small that it is clinically or scientifically unimportant. As Chapter 6 noted, it can be more informative to think not only in terms of the accept-reject approach of statistical hypothesis testing but also to estimate the size of the treatment effect together with some measure of the uncertainty in that estimate.
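The dependence of P on sample size can be illustrated with a minimal sketch. It is not a procedure from this book: the function name `two_sided_p`, the equal group sizes, the known common standard deviation, and the normal approximation to the test statistic are all assumptions made to keep the example short. The treatment effect is held fixed at a small value while the sample size grows.

```python
import math
from statistics import NormalDist


def two_sided_p(mean_difference, sd, n_per_group):
    """Approximate two-sided P for a difference between two group means.

    Illustrative assumptions: equal group sizes, a known common standard
    deviation, and a normal approximation to the test statistic.
    """
    # Standard error of the difference between two sample means.
    se_difference = sd * math.sqrt(2.0 / n_per_group)
    z = mean_difference / se_difference
    # Probability in both tails beyond the observed |z|.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# The same small effect (1 unit against a standard deviation of 10)
# evaluated at increasing sample sizes.
for n in (10, 100, 1000, 10000):
    p = two_sided_p(mean_difference=1.0, sd=10.0, n_per_group=n)
    print(f"n per group = {n:>6}: P = {p:.3g}")
```

Under these assumptions the effect never changes, yet P falls steadily as the samples get larger, which is why a very small P alone says little about whether an effect is large enough to matter.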
This approach is not new; we used it in Chapter 2 when we defined the standard error of the mean to quantify the certainty with which we could estimate the population mean from a sample. We observed that since the population of all sample means at least approximately follows a normal distribution, the true (and unobserved) population mean will lie within about 2 standard errors of the mean of the sample mean 95% of the time. We now develop the tools to make this statement more precise and generalize it to apply to other estimation problems, such as the size of the effect a treatment produces. The resulting estimates, called confidence intervals, can also be used to test hypotheses.* This approach yields exactly the same conclusions as the procedures we discussed earlier because it simply represents a different perspective on how to use concepts like the standard error, t, and normal distributions. Confidence intervals are also used to estimate the range of values that include a specified proportion of all members of a population, such as the “normal range” of values for a laboratory test.
* Some statisticians believe that confidence intervals provide a better way to think about the results of experiments than traditional hypothesis testing.
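The "about 2 standard errors" rule recalled above can be checked with a short simulation. This is a sketch under stated assumptions, not the more precise method developed in this chapter: the names `approx_95_interval` and `coverage`, the normally distributed population, and the particular mean, standard deviation, sample size, and number of trials are all hypothetical choices for illustration.

```python
import math
import random
import statistics


def approx_95_interval(sample):
    """Return (low, high) = sample mean -/+ 2 standard errors of the mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return mean - 2.0 * sem, mean + 2.0 * sem


def coverage(true_mean, true_sd, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mean.

    Samples are drawn from a normal population (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = approx_95_interval(sample)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


# Interval for one small made-up sample, then the simulated coverage.
print(approx_95_interval([4.1, 5.3, 4.8, 5.9, 5.0, 4.6]))
print(coverage(true_mean=50.0, true_sd=10.0, n=30, trials=2000))
```

The simulated fraction should come out close to, but not exactly, 95%. The factor 2 is only an approximation, and the appropriate multiplier depends on the sample size.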
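In Chapter 4, we defined the t statistic to be

$$t = \frac{\text{difference of sample means}}{\text{standard error of difference of sample means}}$$
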
then computed its value for the data observed in an experiment. Next, we compared the result with the value tα that defined the most extreme 100α percent of the possible values of t that would occur (in both tails) if the two samples were drawn from a single population. If the observed value of t exceeded tα...