‘Data! data! data!’ he cried impatiently. ‘I can't make bricks without clay.’ (Sherlock Holmes in The Adventure of the Copper Beeches)
It is well known that we live in an age of unprecedented access to vast quantities of data, and that analysis of this data can produce extremely valuable insights. Less well known is how to analyse the data to gain those insights. That is the concern of data science, and the topic of this book. This first chapter explores what data science is, some of the drivers behind the rapid increase in data, and how data science can be applied within the library and beyond.
By the end of this chapter, the reader will be able to see the value of data science beyond the hype, and its widespread applicability within the library and information sector.
Data, information, knowledge, wisdom
Definitions of data science invariably incorporate terms such as data, information or knowledge:
Data analysis and Data Science attempt to extract information from data. (Idris, 2014, 1)
Data science is a set of fundamental principles that guide the extraction of knowledge from data. (Provost and Fawcett, 2013, 2)
Data science – the ability to extract knowledge and insights from large and complex data sets. (Patil, 2015)
So before we attempt to answer the question ‘What is data science?’, we must first understand what data is, and how it relates to the data–information–knowledge–wisdom (DIKW) hierarchy that is so often discussed by the library and information science community.
The DIKW model, popularised by Ackoff (1989), represents the hierarchy as a pyramid, with data at the base and each stage building on the one below. In reality, however, the terms are often used with wide-ranging and overlapping definitions (Rowley, 2007; Zins, 2007), and such a bottom-up approach fails to reflect the complexity of how information, knowledge and wisdom are actually derived, because it relies too heavily on the data foundations (Frické, 2009). Of course, ‘all models are wrong, but some are useful’ (Box and Draper, 1987), and the fact that the DIKW model is wrong does not mean it is totally without value.