I recently logged onto a website called the “Personal Genome Project” and looked at the “Participant Profiles” section. To my surprise, several profiles disclosed the name of the participant along with his or her date of birth, sex, weight, height, blood type, race, health conditions, medications, allergies, procedures, and more. Anyone with a computer could therefore view all these details. Other profiles excluded the name of the participant but provided all the other details, which could potentially allow a clever and motivated viewer to identify the individual.
Earlier chapters discussed medical big data use by professional researchers. This chapter focuses on the phenomenon of “open data.” Patient-related medical data can now be easily found on the Internet. With the help of such data, ordinary citizens interested in scientific research are taking matters into their own hands. This is the era of citizen science and do-it-yourself biology. “Citizen science” is “the practice of public participation and collaboration in scientific research” through data collection, monitoring, and analysis for purposes of scientific discovery, usually without compensation. “Do-it-yourself biology” is an international movement “spreading the use of biotechnology beyond traditional academics and industrial institutions and into the lay public.”
Increasingly, government and private-sector sources are supplying data collections to the public, and this supply stream will expand considerably in the future. I will call publicly available resources “public-use data” or “open data” in this chapter. Open data are a global phenomenon and are furnished by entities such as the World Health Organization, the UK government, the Canadian Institute for Health Information, Japan's National Institute of Genetics, and many more. The pages that follow will describe representative open data sources, analyze their benefits and risks, and formulate recommendations for responsible handling of open data.
PUBLICLY AVAILABLE BIG DATA SOURCES
Many large databases offer public access to patient-related health information. These databases have been established by federal and state governments as well as by private-sector enterprises. No comprehensive catalogue of these sources exists. This section focuses on a sample of US databases that feature public-use medical data.