The last two decades represent an unprecedented period in the history of data analysis. As the cost of technology has steadily decreased, access to sophisticated data tools has increased, expanding the audience for data-informed research and decision making. At the same time, the rapid growth of online data produced as a byproduct of digital commerce, file sharing and social media has made new areas of research and new research methodologies possible. This confluence of inexpensive computing, plentiful data and accessible tools has created a new interdisciplinary area of research that harnesses the traditional disciplinary expertise of statisticians and computer scientists to explore a wide range of data-related questions. As more researchers and companies embrace data-driven approaches, ‘data science’ has become an increasingly popular term for this growing area of research.
Defining data science
Strict disciplinary boundaries for ‘data science’ remain elusive. While many practitioners have attempted to craft conceptual definitions of this research space, others have challenged the notion that data science is a new field (Conway, 2013; Boykis, 2019). Donoho (2017) makes a compelling argument that statisticians have engaged in data science since the 1960s, and others, such as Press (2013), note that many disciplines have explored data science concepts for decades.
Despite these disagreements over the disciplinary origins of data science, its methodologies and scope, and the professional role of data scientists, the persistence of data science as a topic over the last few decades indicates that it is not a passing trend. Indeed, an increasing focus on data science by learned societies (National Academies of Sciences, Engineering, and Medicine, 2018), the growing number of data science courses and centers (http://msdse.org/environments) and the proliferation of data scientist job postings suggest that the data science movement is a powerful force in academia.
In its current form, the term ‘data science’ is easier to define by its applications than by theory. It is widely understood to include a diverse range of computational, data-driven approaches to research and business analytics. In academia, this interdisciplinary space comprises a broad range of methodologies, including machine learning, social media analysis, spatial analytics, text analysis and web analytics.