Book contents
- Frontmatter
- Dedication
- Contents
- Preface and Acknowledgments
- Introduction
- Section I Thinking Like a Data Scientist
- 1 How the Rule of 72 Can Provide Guidance to Advance Your Wealth, Your Career, and Your Gas Mileage
- 2 Piano Virtuosos and the Four-Minute Mile
- 3 Happiness and Causal Inference
- 4 Causal Inference and Death
- 5 Using Experiments to Answer Four Vexing Questions
- 6 Causal Inferences from Observational Studies: Fracking, Injection Wells, Earthquakes, and Oklahoma
- 7 Life Follows Art: Gaming the Missing Data Algorithm
- Section II Communicating Like a Data Scientist
- Section III Applying the Tools of Data Science to Education
- Section IV Conclusion: Don't Try This at Home
- Bibliography
- Sources
- Index
7 - Life Follows Art: Gaming the Missing Data Algorithm
from Section I - Thinking Like a Data Scientist
Published online by Cambridge University Press: 05 December 2015
Summary
In 1969 Bowdoin College broke new ground when it changed its admissions policy to make college admissions tests optional. About one-third of each accepted class took advantage of this policy and did not submit SAT scores. I followed up on Bowdoin's class of 1999 and found that the 106 students who did not submit SAT scores did substantially worse in their first-year grades at Bowdoin than did their 273 classmates who did submit SAT scores (see Figure 7.1). Would their SAT scores, had they been available to Bowdoin's admissions office, have predicted their diminished academic performance?
As it turned out, all of the students who did not submit SAT scores actually took the test but decided not to send their scores to Bowdoin. Why? There are many plausible reasons, but one of the most likely is that they did not think their test scores were high enough to be of any help in getting them into Bowdoin. Of course, under ordinary circumstances, this speculative answer is not the beginning of an investigation, but its end. The SAT scores of students who did not submit them have to be treated as missing data – at least by Bowdoin's admissions office, but not by me. Through a special data-gathering effort at the Educational Testing Service, we retrieved those SAT scores and found that while the students who submitted SAT scores averaged 1323 (the sum of their verbal and quantitative scores), those who didn't submit them averaged only 1201 – more than a standard deviation lower! As it turned out, had the admissions office had access to these scores, it could have predicted the lower collegiate performance of these students (see Figure 7.2).
Why would a college opt for ignorance of useful information? Again, there is a long list of possible reasons, and your speculations are at least as valid as mine, so I will focus on just one: the consequences of treating missing data as missing at random (that is, assuming that the average missing score is equal to the average score that was reported, or that those who did not report their SAT scores did just as well as those who did). The average SAT score for Bowdoin's class of 1999 was observed to be 1323, but the true average, including all members of the class, was 1288.
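The gap between the observed and true averages can be checked with a weighted mean. The sketch below is only an illustration, not the author's calculation: it uses the rounded group sizes and means quoted above, and the function and variable names are my own. With these rounded inputs the pooled average comes out near 1289, within a point of the reported 1288; the small difference presumably reflects rounding in the published group means.

```python
# Group sizes and mean SAT scores as quoted in the summary (rounded values).
n_submitted, mean_submitted = 273, 1323   # students who sent scores to Bowdoin
n_withheld, mean_withheld = 106, 1201     # students who took the SAT but withheld scores


def pooled_mean(groups):
    """Weighted mean over (group_size, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Treating the withheld scores as missing at random amounts to reporting
# the submitters' mean as if it described the whole class.
observed_mean = mean_submitted

# Including the withheld scores gives the class-wide mean.
true_mean = pooled_mean([(n_submitted, mean_submitted),
                         (n_withheld, mean_withheld)])

overstatement = observed_mean - true_mean

print(f"observed mean (submitters only): {observed_mean}")
print(f"pooled mean (whole class):       {true_mean:.1f}")
print(f"overstatement:                   {overstatement:.1f} points")
```

The overstatement grows with both the share of students who withhold scores and the size of the gap between the two groups, which is why the missing-at-random assumption matters here.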
- Type: Chapter
- Information: Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist, pp. 72-78
- Publisher: Cambridge University Press
- Print publication year: 2015