Introduction: supporting data in the libraries
While data has been a feature of academic research since the early 20th century, only in the last decade has it become so ubiquitous in academic life – and in modern experience generally – that it has taken root in every aspect of the educational environment. As awareness of ‘data’ has grown, both as a concept for debate and as an object of study and analysis, so has the idea of ‘data literacy’, or the ability to understand and interpret data. Institutions have responded to the growth of data in different ways, and their choices often reflect the circumstances of the moment as much as institutional traditions and organizational structures. This chapter discusses how support for data has developed and grown at Columbia University Libraries, specifically within the Digital Social Science Center (hereafter, DSSC). While much of the development of data services at Columbia can be traced to a specific set of circumstances at a particular point in time, the overall trajectory of these services suggests several general principles that may offer insights for other institutions.
Defining data literacy
Before delving into the specifics of data services at Columbia, it is worthwhile to discuss some of the basic elements of data literacy. At its most basic level, being data-literate means understanding what data is and how it can be used. Data as a concept is rather broad, and I would argue that nearly anything could be considered data in certain circumstances. In the simplest terms, data is information with defined parameters that give it some degree of structure. That degree of structure varies widely, and many people are likely familiar with data sources that produce rather ‘dirty’ results, such as data scraped from government agencies’ websites or extracted from a Twitter feed. Nevertheless, even ‘messy’ or ‘dirty’ data has a fundamental element of structure, in the sense that there are parameters defining the information contained in the dataset. These parameters include the unit of observation (i.e., the individual elements that make up the collected data), the variables or measures (i.e., the information recorded about each unit of observation), and the time period in which the observations were collected and/or created.
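For readers who find a concrete case helpful, the three parameters can be written down explicitly. The following is only an illustrative sketch in Python: the `DatasetDescription` class, the `missing_variables` helper, and the sample records are all invented for this example and do not describe any actual dataset or DSSC tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetDescription:
    """Hypothetical container for the three parameters named in the text."""
    unit_of_observation: str  # what each individual record represents
    variables: list           # names of the measures recorded per unit
    period_start: date        # start of the collection period
    period_end: date          # end of the collection period


def missing_variables(description, record):
    """Return the variables the description expects but the record lacks."""
    return [name for name in description.variables if name not in record]


# Invented example: a small, 'messy' collection of scraped posts.
description = DatasetDescription(
    unit_of_observation="one social media post",
    variables=["author", "text", "posted_on"],
    period_start=date(2015, 1, 1),
    period_end=date(2015, 12, 31),
)

records = [
    {"author": "a", "text": "first post", "posted_on": date(2015, 3, 2)},
    {"author": "b", "text": "second post"},  # 'dirty': no posting date
]

# For each record, list which expected variables are absent.
gaps = [missing_variables(description, r) for r in records]
```

In this sketch, `gaps` is empty for the first record and names `posted_on` for the second. The point is that even the incomplete record can still be read against the same unit of observation, variables, and time period.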