Book contents
- Frontmatter
- Dedication
- Contents
- Preface and Acknowledgments
- Introduction
- Section I Thinking Like a Data Scientist
- 1 How the Rule of 72 Can Provide Guidance to Advance Your Wealth, Your Career, and Your Gas Mileage
- 2 Piano Virtuosos and the Four-Minute Mile
- 3 Happiness and Causal Inference
- 4 Causal Inference and Death
- 5 Using Experiments to Answer Four Vexing Questions
- 6 Causal Inferences from Observational Studies: Fracking, Injection Wells, Earthquakes, and Oklahoma
- 7 Life Follows Art: Gaming the Missing Data Algorithm
- Section II Communicating Like a Data Scientist
- Section III Applying the Tools of Data Science to Education
- Section IV Conclusion: Don't Try This at Home
- Bibliography
- Sources
- Index
3 - Happiness and Causal Inference
from Section I - Thinking Like a Data Scientist
Published online by Cambridge University Press: 05 December 2015
Summary
Introduction
My old, and very dear, friend Henry Braun describes a data scientist as someone who's pretty good with numbers but hasn't got the personality to be an accountant. I like the ambiguity of the description, vaguely reminiscent of a sign next to a new housing development near me, “Never so much for so little.” But although ambiguity has an honored place in humor, it is less suitable within science. I believe that although some ambiguity is irreducible, some could be avoided if we could just teach others to think more like data scientists. Let me provide one illustration.
Issues of causality have haunted human thinkers for centuries, with the modern view usually ascribed to the Scot David Hume. Statisticians Ronald Fisher and Jerzy Neyman began to offer new insights into the topic in the 1920s, but the last forty years, beginning with Don Rubin's unlikely-sourced 1974 paper, have witnessed an explosion in clarity and explicitness on the connections between science and causal inference. A signal event in statisticians’ modern exploration of this ancient topic was Paul Holland's comprehensive 1986 paper “Statistics and Causal Inference,” which laid out the foundations of what he referred to as “Rubin's Model for Causal Inference.”
Causa latet: vis est notissima (The cause is hidden; its force is very well known.)
Ovid, Metamorphoses, IV c. 5
A key idea in Rubin's model is that finding the cause of an effect is a task of insuperable difficulty, and so science can make itself most valuable by measuring the effects of causes. What is the effect of a cause? It is the difference between what happens when some unit is exposed to a treatment and what would have happened had it not been. This latter condition is a counterfactual and hence impossible to observe. Stated more generally, the causal effect is the difference between the actual outcome and some unobserved potential outcome.
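In symbols, the definition can be written with the potential-outcome notation popularized by Holland's exposition of Rubin's model. The symbols below are an illustration, not quoted from this chapter: for a unit u, Y_t(u) is the outcome if u is exposed to the treatment t, Y_c(u) is the outcome if it is left under the control condition c, and τ(u) is our label for the unit-level causal effect.

```latex
\begin{aligned}
\tau(u) &= Y_t(u) - Y_c(u)
  && \text{(causal effect of } t \text{ relative to } c \text{ on unit } u\text{)}\\[4pt]
Y_{\text{obs}}(u) &=
  \begin{cases}
    Y_t(u) & \text{if } u \text{ is exposed to } t,\\
    Y_c(u) & \text{otherwise.}
  \end{cases}
\end{aligned}
```

Because only Y_obs(u) is ever recorded, one of the two terms in τ(u) is always the unobserved counterfactual, which is why τ(u) cannot be computed for any single unit.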
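Counterfactuals can never be observed; hence, for an individual, we can never calculate the size of a causal effect directly. What we can do is calculate the average causal effect for a group, and this can credibly be done through randomization, because random assignment makes the treated and untreated groups alike, on average, in every respect except the treatment itself.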
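A small simulation can make the argument concrete. The sketch below is hypothetical: the function name `simulate_experiment`, the normal distributions, and every parameter value are assumptions chosen for illustration. Because it is a simulation, it can generate both potential outcomes for every unit and so compute the true average causal effect, the mean of Y_t(u) − Y_c(u), directly. It then hides one outcome per unit, as nature does, and compares the difference in group means under random assignment with the same difference under a self-selected assignment.

```python
import random
from statistics import mean


def simulate_experiment(n_units=10_000, true_effect=2.0, randomize=True, seed=0):
    """Hypothetical sketch of Rubin's model using simulated potential outcomes.

    Returns (true average causal effect, difference-in-means estimate).
    Only a simulation can know both potential outcomes for every unit;
    in a real study one of them is always the unobserved counterfactual.
    """
    rng = random.Random(seed)

    # Assumed potential outcome under control, Y_c(u), varying across units.
    y_control = [rng.gauss(10.0, 3.0) for _ in range(n_units)]
    # Assumed potential outcome under treatment, Y_t(u); unit-level effects
    # scatter around true_effect.
    y_treat = [yc + true_effect + rng.gauss(0.0, 1.0) for yc in y_control]

    # True average causal effect: computable here only because we simulated
    # both potential outcomes for every unit.
    true_ace = mean(yt - yc for yt, yc in zip(y_treat, y_control))

    if randomize:
        # Random assignment: shuffle the units and treat half of them.
        units = list(range(n_units))
        rng.shuffle(units)
        treated = set(units[: n_units // 2])
    else:
        # Hypothetical self-selection: units that would do well anyway
        # (high Y_c) are the ones that take the treatment.
        ranked = sorted(range(n_units), key=lambda u: y_control[u])
        treated = set(ranked[n_units // 2 :])

    # What an investigator actually sees: one outcome per unit.
    observed_t = [y_treat[u] for u in range(n_units) if u in treated]
    observed_c = [y_control[u] for u in range(n_units) if u not in treated]

    estimate = mean(observed_t) - mean(observed_c)
    return true_ace, estimate


if __name__ == "__main__":
    ace, randomized_est = simulate_experiment(randomize=True)
    _, self_selected_est = simulate_experiment(randomize=False)
    print(f"true average causal effect: {ace:.3f}")
    print(f"randomized estimate:        {randomized_est:.3f}")
    print(f"self-selected estimate:     {self_selected_est:.3f}")
```

Under randomization the difference in means lands within sampling error of the true average effect; under the self-selected assignment the same arithmetic overstates the effect substantially, even though the potential outcomes themselves are identical in both runs.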
Type: Chapter
Information: Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist, pp. 22-28. Publisher: Cambridge University Press. Print publication year: 2015.