2 - Correlation
from PART 1 - DESCRIPTION
Published online by Cambridge University Press: 05 June 2012
Summary
The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning.
Stephen Jay Gould
Introduction
This chapter begins the study of describing data that contain more than one variable. We will see how the correlation coefficient and scatter plot can be used to describe bivariate data.
Not only will you learn the meaning and usefulness of the correlation coefficient, but, just as important, we will stress that there are times when the correlation coefficient is a poor summary and should not be used. There is no such thing as a perfect summary measure of data.
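A small numerical sketch can make the "poor summary" point concrete. The data below are invented for illustration and do not come from the chapter's workbook, and `pearson_r` is a hypothetical helper written here, not a function from the book. It computes the correlation coefficient for a perfectly linear relationship and for a perfectly U-shaped one:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the averages.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative data only (an assumption, not taken from Correlation.xls).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_line = [2 * x + 1 for x in xs]   # exact linear relationship
ys_parabola = [x ** 2 for x in xs]  # exact but nonlinear relationship

r_line = pearson_r(xs, ys_line)
r_parabola = pearson_r(xs, ys_parabola)
print(f"linear:   r = {r_line:.3f}")
print(f"parabola: r = {r_parabola:.3f}")
```

In the second case y is completely determined by x, yet the coefficient is zero because the association is not linear. A scatter plot would reveal the pattern immediately, which is one reason to present the plot alongside the number.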
In addition, we emphasize that correlation merely indicates the level of linear association between two variables and should never be used to infer causation. It is tempting to suppose that a high correlation implies some kind of causal connection, but this is wrong.
Although much of this material may be familiar to students of statistics, we conclude the chapter with a discussion of ecological correlation, which is often omitted from introductory statistics courses. We show that the correlation coefficient computed from individual-level data may differ markedly from the one computed after the same data are grouped. In economics, this is called the aggregation problem, and it merits attention.
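The gap between individual-level and grouped correlations can be sketched with a toy data set. The nine observations, the three group labels, and the helper `pearson_r` are all assumptions made up for this illustration. They are not the chapter's data. Within each group the two variables move in opposite directions, while the group averages rise together:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical individuals as (group, x, y); values are invented.
individuals = [
    ("A", 1, 4), ("A", 2, 2), ("A", 3, 0),
    ("B", 4, 7), ("B", 5, 5), ("B", 6, 3),
    ("C", 7, 10), ("C", 8, 8), ("C", 9, 6),
]

# Correlation across all nine individuals.
r_individual = pearson_r([x for _, x, _ in individuals],
                         [y for _, _, y in individuals])

# Collapse each group to its averages, then correlate the averages.
groups = sorted({g for g, _, _ in individuals})
mean_xs, mean_ys = [], []
for g in groups:
    members = [(x, y) for grp, x, y in individuals if grp == g]
    mean_xs.append(sum(x for x, _ in members) / len(members))
    mean_ys.append(sum(y for _, y in members) / len(members))
r_group = pearson_r(mean_xs, mean_ys)

print(f"individual-level r = {r_individual:.3f}")
print(f"group-average r    = {r_group:.3f}")
```

Averaging removes the within-group scatter, so the grouped coefficient here is noticeably larger than the individual-level one. In this constructed example grouping inflates the correlation; other data could behave differently.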
Correlation Basics
Workbook: Correlation.xls
The basic message of this section is that a good, standard method for describing the relationship between two variables is to present a bivariate scatter diagram accompanied by summary statistics: the standard deviation (SD) and average of the two variables, the correlation coefficient, and the number of observations.
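The summary statistics listed above can be computed in a few lines. This is a minimal sketch with invented data: the function name `describe_bivariate` is hypothetical, and the SD is assumed to be the population version (dividing by n). A sample SD dividing by n - 1 would give slightly larger values but the same correlation coefficient.

```python
from statistics import mean, pstdev

def describe_bivariate(xs, ys):
    """Return average, SD, correlation coefficient, and n for paired data."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    # Population SDs (divide by n); this choice is an assumption.
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # r is the average product of deviations divided by the product of SDs.
    avg_product = sum((x - mean_x) * (y - mean_y)
                      for x, y in zip(xs, ys)) / n
    r = avg_product / (sd_x * sd_y)
    return {"n": n, "mean_x": mean_x, "mean_y": mean_y,
            "sd_x": sd_x, "sd_y": sd_y, "r": r}

# Illustrative data only (not taken from Correlation.xls).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
summary = describe_bivariate(xs, ys)
for name, value in summary.items():
    print(f"{name}: {value:.3f}" if isinstance(value, float)
          else f"{name}: {value}")
```

These five numbers, together with the count, are meant to accompany a scatter diagram rather than replace it.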
Type: Chapter
Information: Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel, pp. 33-52
Publisher: Cambridge University Press
Print publication year: 2005