# Chapter Two: The First Step: Understanding Simple Linear Regression

The purpose of this book is to teach readers how to formulate biological, clinical, and other problems in terms of equations containing parameters that can be estimated from data and then used to gain insight into the problem under study. The power of multiple regression, whether applied directly or used to formulate an analysis of variance, is that it permits you to consider the simultaneous effects of several variables that act together to determine the value of some outcome variable. Although multiple linear regression models can become quite complicated, they all rest on the same principles. In fact, one can present the key elements of any linear regression in the context of a simple two-variable linear regression.^{*} We will use a *regression line* to estimate how much one variable increases (or decreases), on average, as another variable changes, and a *correlation coefficient* to quantify the *strength of the association* between the two variables. We will generalize these concepts to the multiple variable case in Chapter 3. The remainder of the book will explore how to apply these powerful tools intelligently.
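The two quantities named above, the regression line and the correlation coefficient, can both be computed from sums of squares and cross-products about the means. Below is a minimal Python sketch. It assumes the standard least-squares formulas for the slope and intercept and the Pearson product-moment formula for the correlation coefficient. The helper names `fit_line` and `correlation` are hypothetical, not taken from any library. The data are a constructed toy set lying exactly on the line y = 1 + 2x, not measurements from the text.

```python
from math import sqrt


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and sum of cross-products, both about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope: average change in y per unit change in x
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def correlation(x, y):
    """Return the Pearson correlation coefficient r between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # r ranges from -1 to +1; |r| = 1 means every point lies on the line
    return sxy / sqrt(sxx * syy)


# Constructed toy data lying exactly on y = 1 + 2x (illustrative only)
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_line(x, y)
r = correlation(x, y)
print(f"intercept={b0:.2f}, slope={b1:.2f}, r={r:.2f}")
```

This prints `intercept=1.00, slope=2.00, r=1.00`. The slope says how much y changes, on average, per unit of x, and r = 1 says the association is perfect. In Python 3.10 and later, the standard library functions `statistics.linear_regression` and `statistics.correlation` compute the same quantities.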

^{*}This chapter is revised from Glantz S. *Primer of Biostatistics*. 7th ed. New York, NY: McGraw-Hill; 2012: chap 8, "How to Test for Trends." The notation has been revised to be consistent with that used in multiple regression, and detailed discussions of sums of squares as they apply to regression analysis and of the use of computer programs to perform the computations have been added.

Simple *linear regression* is a parametric statistical technique used to analyze experiments in which the samples are drawn from populations characterized by a mean response that varies *continuously* with the magnitude of the treatment. To understand the nature of this population and the associated random samples, we continue to explore Mars, where we now examine the entire *population* of Martians—all 200 of them.

Figure 2-1 shows a plot in which each point represents the height *x* and weight *y* of one Martian. Because we have observed the *entire population,* there is no question that tall Martians tend to be heavier than short Martians. (Because we are discussing simple linear regression, we will ignore the effects of differing water consumption.) For example, the four Martians who are 32 cm tall weigh 7.1, 7.9, 8.3, and 8.8 g, so the mean weight of Martians who are 32 cm tall is 8.0 g. The eight Martians who are 46 cm tall weigh 13.7, 14.5, 14.8, 15.0, 15.1, 15.2, 15.3, and 15.8 g, so the mean weight of Martians who are 46 cm tall is 14.9 g. Figure 2-2 shows that *the mean weight of Martians at each height increases linearly as height increases.*
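The mean weights quoted above can be checked directly from the listed values. The short Python sketch below uses only the twelve weights given in the text, not the full population of 200. It also computes the change in mean weight per centimeter between the two heights. If the means lie on a straight line, as Figure 2-2 indicates, that ratio gives a rough idea of the line's slope. The dictionary name `weights_by_height` is illustrative only.

```python
# Weights (g) of the Martians at two heights, as listed in the text
weights_by_height = {
    32: [7.1, 7.9, 8.3, 8.8],
    46: [13.7, 14.5, 14.8, 15.0, 15.1, 15.2, 15.3, 15.8],
}

# Mean weight at each height: sum of the weights divided by their count
mean_weight = {h: sum(w) / len(w) for h, w in weights_by_height.items()}
for h, m in sorted(mean_weight.items()):
    print(f"{h} cm: n={len(weights_by_height[h])}, mean weight={m:.1f} g")

# Change in mean weight per cm between the two heights. This uses only
# two heights, so it is a rough indication of the slope, not a fitted line.
slope = (mean_weight[46] - mean_weight[32]) / (46 - 32)
print(f"change in mean weight per cm of height = {slope:.2f} g/cm")
```

The sketch prints mean weights of 8.0 g at 32 cm and 14.9 g at 46 cm, matching the text. It also reports that mean weight rises by about 0.49 g for each additional centimeter of height between these two groups.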