High-profile privacy breaches have trained the spotlight of public attention on data privacy. Until recently, privacy was a relative laggard within computer security: when releasing aggregate statistics on sensitive data, available technologies could enhance privacy only weakly, and until the mid-2000s definitions of privacy were merely syntactic. Contrast this with the state of affairs within cryptography, which has long offered provably strong guarantees on maintaining the secrecy of encrypted information, based on the computational limitations of attackers. Proposed as an answer to the challenge of putting privacy on an equal footing with cryptography, differential privacy (Dwork et al. 2006; Dwork & Roth 2014) has quickly grown in stature due to its formal nature and guarantees against powerful attackers. This chapter continues the discussion begun in Section 3.7, including a case study on the release of trained support vector machine (SVM) classifiers while preserving training data privacy. This chapter builds on Rubinstein, Bartlett, Huang, & Taft (2012).
Privacy Breach Case Studies
We first review several high-profile privacy breaches achieved by privacy researchers. Together these have helped shape the discourse on privacy and in particular have led to important advancements in privacy-enhancing technologies. This section concludes with a discussion of lessons learned.
Massachusetts State Employees Health Records
An early privacy breach demonstrated the difficulty in defining the concept of personally identifiable information (PII) and led to the highly influential development of k-anonymity (Sweeney 2002).
In the mid-1990s the Massachusetts Group Insurance Commission released private health records of state employees, showing individual hospital visits, for the purpose of fostering health research. To mitigate privacy risks to state employees, the Commission scrubbed all suspected PII: names, addresses, and Social Security numbers. What was released was pure medical information together with seemingly innocuous demographics: birthdate, gender, and zipcode.
Security researcher Sweeney realized that the demographic information left unscrubbed was in fact partial PII. To demonstrate her idea, Sweeney obtained readily available public voter information for the city of Cambridge, Massachusetts, which included birthdates, zipcodes, and names. She then linked this public data to the “anonymized” released hospital records, thereby re-identifying many of the employees, including her target, then-Governor William Weld, who had originally overseen the release of the health data.
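The linkage step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are entirely synthetic, and the helper name `link_records` and the field names are hypothetical rather than taken from the original study. The sketch joins a public table that carries names to a released table that does not, using the shared demographic fields as a key, and reports a re-identification only when the key matches exactly one named individual.

```python
from collections import defaultdict

# Entirely synthetic records, invented for illustration only.
voter_roll = [
    {"name": "Alice Example", "birthdate": "1950-01-02", "zipcode": "02138"},
    {"name": "Bob Sample", "birthdate": "1961-07-15", "zipcode": "02139"},
    {"name": "Carol Placeholder", "birthdate": "1961-07-15", "zipcode": "02139"},
]

# "Anonymized" release: names removed, demographics retained.
health_records = [
    {"birthdate": "1950-01-02", "gender": "F", "zipcode": "02138",
     "diagnosis": "condition A"},
    {"birthdate": "1961-07-15", "gender": "M", "zipcode": "02139",
     "diagnosis": "condition B"},
    {"birthdate": "1975-03-30", "gender": "F", "zipcode": "02140",
     "diagnosis": "condition C"},
]


def link_records(public, released, keys=("birthdate", "zipcode")):
    """Join released records to named public records on shared fields.

    Hypothetical helper: returns (name, diagnosis) pairs for released
    records whose key matches exactly one person in the public data.
    """
    # Index the public data by the shared demographic fields.
    index = defaultdict(list)
    for person in public:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = []
    for record in released:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link_records(voter_roll, health_records)
```

In this toy data only the first released record is re-identified. The second shares its birthdate and zipcode with two voters and so remains ambiguous, and the third has no counterpart in the public table. The ambiguity in the second case is the kind of protection that k-anonymity aims to guarantee for every released record.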