  • Print publication year: 2011
  • Online publication date: June 2018

3 - Data silos

Summary

Introduction

In this book the web of data has been defined as data that is structured in a machine-readable format and that has been published openly on the web. The web of data may be defined more narrowly, for example by referring only to data published according to Linked Data principles (see Chapter 4), or more broadly, by including data that is kept within organizations. The working definition used in this book, however, recognizes the importance of open data, the role of the librarian in providing access to it, and the multitude of ways in which data is being made available. Although there is currently significant interest in Linked Data, a lot of data is published with little thought to the wider information landscape, and data published in these ways is already being widely used: from data published in Excel spreadsheets and Google Docs to data made available through APIs. These individualistic approaches to publishing data may be thought of as data silos, in which the data is isolated from the wider environment and held in its own idiosyncratic format. Nonetheless, libraries and other organizations are increasingly making use of these technologies to publish a plethora of data. Whilst publishing data in some formats makes it more useful to the user community than publishing it in others, data has value whatever format it is published in, and it is important that this data is not ignored as we increasingly look towards data being made available according to preferred principles.

As with all data published online, the data in silos is only of value to those with the skills to make use of it. Although the level of skill necessary to make use of data in silos is often lower than that needed for the more distributed formats discussed in the following chapters, it nonetheless varies considerably, as will the help that users need from library and information professionals. Where the data has been stored in a single document, a user may need little more than help in finding the document, whereas if it can only be interacted with via an application programming interface (API), users may need help not only with identifying the data source but also with collecting and manipulating the data.
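
The difference in effort between a single document and an API can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the loan figures, the library names, the field names and the paging scheme are hypothetical, and the function fake_api_get merely stands in for a real web request so that the example runs without a network connection. It is not a description of any particular service.

```python
import csv
import io
import json

# Hypothetical single-document silo: a small table exported as CSV text.
CSV_DOCUMENT = "library,loans\nCentral,1200\nNorth,450\n"


def read_single_document(text):
    """Parse a whole dataset that is held in one document."""
    return [
        {"library": row["library"], "loans": int(row["loans"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


# Hypothetical API silo: the same data, served one page at a time as JSON.
_PAGES = {
    1: {"results": [{"library": "Central", "loans": 1200}], "next_page": 2},
    2: {"results": [{"library": "North", "loans": 450}], "next_page": None},
}


def fake_api_get(page):
    """Stand-in for an HTTP request; returns the JSON body as text."""
    return json.dumps(_PAGES[page])


def collect_from_api(get_page):
    """Request page after page until the API reports no further pages."""
    records = []
    page = 1
    while page is not None:
        body = json.loads(get_page(page))
        records.extend(body["results"])
        page = body["next_page"]
    return records


document_records = read_single_document(CSV_DOCUMENT)
api_records = collect_from_api(fake_api_get)
```

In this sketch both routes end with the same list of records, but the API route requires the user to know how the service pages its results and to assemble the pieces, which is the kind of collecting and manipulating with which users may need help.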