# Chapter 8. Research Questions About Relationships among Variables

- Correlation and regression are statistical methods for examining the linear relationship between two numerical variables measured on the same subjects. Correlation describes a relationship; regression both describes a relationship and predicts an outcome.
- Correlation coefficients range from –1 to +1; both extremes indicate a perfect linear relationship between two variables. A correlation equal to 0 indicates no linear relationship.
- Scatterplots provide a visual display of the relationship between two numerical variables and are recommended to check for a linear relationship and extreme values.
- The coefficient of determination, or *r*², is simply the squared correlation; it is the preferred statistic to describe the strength of the relationship between two numerical variables.
- The *t* test can be used to test the hypothesis that the population correlation is zero.
- The Fisher *z* transformation is used to form confidence intervals for the correlation or to test any hypotheses about the value of the correlation.
- The Fisher *z* transformation can also be used to form confidence intervals for the difference between correlations in two independent groups.
- It is possible to test whether the correlation between one variable and a second is the same as the correlation between a third variable and the second variable.
- When one or both of the variables in correlation is skewed, the Spearman rho nonparametric correlation is advised.
- Linear regression is called *linear* because it measures only straight-line relationships.
- The least squares method is the one used in almost all regression examples in medicine. With one independent and one dependent variable, the regression equation can be given as a straight line.
- The standard error of the estimate is a statistic that can be used to test hypotheses or form confidence intervals about both the intercept and the regression coefficient (slope).
- One important use of regression is to be able to predict outcomes in a future group of subjects.
- When predicting outcomes, the confidence limits are called confidence bands about the regression line. The most accurate predictions are for values of the independent variable *X* close to its mean, and they become less precise as *X* departs from the mean.
- It is possible to test whether the regression line is the same (ie, has the same slope and intercept) in two different groups.
- A residual is the difference between the actual and the predicted outcome; looking at the distribution of residuals helps statisticians decide if the linear regression model is the best approach to analyzing the data.
- Regression toward the mean can result in a treatment or procedure appearing to be of value when it has had no actual effect; having a control group helps to guard against this problem.
- Correlation and regression should not be used unless observations are independent; it is not appropriate to include multiple measurements of the same subjects.
- Mixing two populations can also cause the correlation and regression coefficient to be larger than they should be.
- The use of correlation versus regression should be dictated by the purpose of the research—whether it is to establish ...
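
Several of the quantities listed above (the correlation coefficient, *r*², the *t* test of zero correlation, the Fisher *z* confidence interval, and the least squares line) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own procedure: the function names and the data are hypothetical, and the 1.96 multiplier assumes an approximate 95% interval based on the normal distribution.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for H0: population correlation = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for the correlation via Fisher's z.

    z_crit=1.96 is an assumed multiplier for a roughly 95% interval.
    """
    z = math.atanh(r)             # Fisher z transformation of r
    se = 1 / math.sqrt(n - 3)     # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def least_squares(x, y):
    """Intercept and slope of the least squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements on eight subjects (illustrative only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

r = pearson_r(x, y)
intercept, slope = least_squares(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("r =", r, " r squared =", r ** 2)
print("t =", t_statistic(r, len(x)))
print("approximate 95% CI for r:", fisher_ci(r, len(x)))
print("intercept =", intercept, " slope =", slope)
print("residuals:", residuals)
```

The residuals are the actual minus the predicted outcomes; with an intercept in the model they sum to zero, so it is their pattern, not their total, that indicates whether a straight line fits. The sketch assumes independent observations and roughly symmetric variables; for skewed data the same functions could be applied to ranks to obtain a Spearman-type correlation.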