  • Print publication year: 2016
  • Online publication date: December 2016

6 - Medical Big Data Research: Privacy and Autonomy Concerns

from Part II


In May 1996, Massachusetts Governor William Weld was hospitalized after losing consciousness because of what turned out to be influenza. In 1997, he received a copy of his hospital records from a graduate student named Latanya Sweeney, who had reidentified them. Sweeney, who is now a Harvard professor, used “anonymized” state employee hospital records that the Massachusetts Group Insurance Commission (GIC) provided free of charge to researchers. The GIC had removed obvious identifiers such as name and street address, but it had retained the patients’ birth dates, sexes, and zip codes. Sweeney then purchased the complete voter registration records for the city of Cambridge for twenty dollars. She was easily able to identify Governor Weld because “[o]nly six people in Cambridge shared his birth date, only three were men, and of the three, only he lived in his zip code.”
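The linkage described above can be illustrated with a minimal sketch. It is not Sweeney's actual method or data: every record, name, date, and zip code below is invented, and the function name `link_records` is hypothetical. The sketch only shows the general idea of joining a "deidentified" data set to a public, named data set on the three fields both retain (birth date, sex, and zip code).

```python
from datetime import date

# Fields present in both data sets (assumed for illustration).
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")

# Invented "deidentified" hospital records: names removed, quasi-identifiers kept.
hospital_records = [
    {"birth_date": date(1950, 1, 1), "sex": "M", "zip": "00001", "diagnosis": "influenza"},
    {"birth_date": date(1970, 3, 3), "sex": "F", "zip": "00002", "diagnosis": "fracture"},
]

# Invented voter roll: six people share one birth date, three of them are men,
# and only one of those men lives in zip code 00001.
voter_roll = [
    {"name": "Voter A", "birth_date": date(1950, 1, 1), "sex": "M", "zip": "00001"},
    {"name": "Voter B", "birth_date": date(1950, 1, 1), "sex": "M", "zip": "00002"},
    {"name": "Voter C", "birth_date": date(1950, 1, 1), "sex": "M", "zip": "00003"},
    {"name": "Voter D", "birth_date": date(1950, 1, 1), "sex": "F", "zip": "00001"},
    {"name": "Voter E", "birth_date": date(1950, 1, 1), "sex": "F", "zip": "00002"},
    {"name": "Voter F", "birth_date": date(1950, 1, 1), "sex": "F", "zip": "00003"},
    {"name": "Voter G", "birth_date": date(1960, 2, 2), "sex": "M", "zip": "00001"},
]


def link_records(deidentified, public):
    """Return (name, diagnosis) pairs for records that match exactly one named person."""
    # Index the public data set by its quasi-identifier combination.
    index = {}
    for person in public:
        key = tuple(person[field] for field in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(person["name"])

    matches = []
    for record in deidentified:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        # A record is reidentified only when the combination is unique.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(hospital_records, voter_roll)
print(reidentified)
```

In this toy data, only the first hospital record is linked to a name, because its combination of birth date, sex, and zip code belongs to exactly one voter; the second record has no counterpart in the voter roll and stays unlinked.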

Electronic health record (EHR)–based research holds great promise. However, it also raises new questions and concerns. Collecting patient information in large databases poses risks of privacy breaches that did not exist when paper files were simply locked away in file cabinets. To protect patient privacy, database operators generally deidentify information (i.e., they strip away identifying data elements). Yet what “deidentification” means is often a matter of controversy, and many experts agree that the risk of reidentification can never be fully eliminated.

Research risks such as privacy breaches are traditionally addressed by asking patients to make autonomous decisions about research participation through the informed consent process. Research involving only deidentified information, however, is exempted from informed consent requirements. But should it be?

This chapter focuses on privacy and autonomy concerns in the context of EHR database research. The chapter explores how the potential benefits of EHR database research can be reaped while the privacy and autonomy interests of data subjects are protected.


Ordinarily, biomedical research protocols require patient consent and institutional review board (IRB) approval, and patients must authorize the release of identifiable information to researchers under the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. By contrast, research using deidentified EHRs can be conducted with few regulatory constraints. Research involving solely deidentified records is not subject to informed consent requirements, does not require IRB approval, and is not covered by the HIPAA Privacy Rule.