13 - Very large data collections
from PART III - MANAGING METADATA
Published online by Cambridge University Press: 08 June 2018
Summary
Overview
This chapter concentrates on aspects of retrieval and management that are particular to big data. This book originally set out to consider metadata about documents and document collections, using a wide definition of documents to include images, sound, museum objects and broadcast material, as well as text-based resources such as books, journal articles and web pages. Social media activity has been included in this, because it involves a permanent (usually text-based) record of social interactions or online behaviour. The metadata associated with each of these kinds of big data varies considerably, as does the use to which it is put. Transactional data has largely been excluded from this scope, unless those transactions relate to documents. This chapter also describes linked data, an approach that expands the scope of data sets enormously, because it provides a mechanism for combining data sets from different repositories or collections, mediated by the internet.
The move towards big data
The move towards big data has been driven by increasing storage and processing capacity, the establishment of standards for the exchange of data and the requirement of funders to make research data more widely available. This last factor is based on the idea that publicly funded researchers should make their data available for further exploitation. It is also driven by regulatory factors such as those that apply to the pharmaceutical industry. Criticism of clinical trials data focuses on the selective nature of publication: some pharmaceutical research companies tend to publish only data that favours their products, the phenomenon of ‘missing trials data’ documented by Ben Goldacre (2013) in his book Bad Pharma. The US government now requires all clinical trials to be registered according to Section 801 of the Food and Drug Administration Amendments Act (FDAAA 801), the final rule for which came into force in 2017. The registration includes details of documents and data sets arising from the clinical trial, including:
Type
Definition: The type of data set or document being shared.
• Individual Participant Data Set
• Study Protocol
• Statistical Analysis Plan
• Informed Consent Form
• Clinical Study Report
• Analytic Code
• Other (specify)
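The Type element above is in effect a controlled vocabulary, and one way to picture how such a field might be enforced in a metadata record is the minimal Python sketch below. The names `SharedItemType`, `SharedItem`, `other_description` and `parse_type` are hypothetical, invented for illustration; they are not part of any registry's actual schema or API, and the rule that "Other" needs a free-text description is an assumption drawn from the "(specify)" hint in the list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SharedItemType(Enum):
    """Permitted values for the Type element, as listed above."""
    INDIVIDUAL_PARTICIPANT_DATA_SET = "Individual Participant Data Set"
    STUDY_PROTOCOL = "Study Protocol"
    STATISTICAL_ANALYSIS_PLAN = "Statistical Analysis Plan"
    INFORMED_CONSENT_FORM = "Informed Consent Form"
    CLINICAL_STUDY_REPORT = "Clinical Study Report"
    ANALYTIC_CODE = "Analytic Code"
    OTHER = "Other"  # "(specify)" is modelled by other_description below


@dataclass(frozen=True)
class SharedItem:
    """A hypothetical record describing one shared document or data set."""
    item_type: SharedItemType
    other_description: Optional[str] = None

    def __post_init__(self) -> None:
        # Assumption: "Other" must be accompanied by a description,
        # and no other type may carry one.
        is_other = self.item_type is SharedItemType.OTHER
        if is_other and not self.other_description:
            raise ValueError("Type 'Other' requires a description")
        if not is_other and self.other_description:
            raise ValueError("Only type 'Other' may carry a description")


def parse_type(label: str) -> SharedItemType:
    """Map a label to the vocabulary; raises ValueError if it is not listed."""
    return SharedItemType(label.strip())
```

Restricting the field to an enumeration means that a misspelt or unlisted label is rejected when the record is created rather than discovered later at retrieval time, which is the usual argument for controlled vocabularies in metadata schemas.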
Type: Chapter
Information: Metadata for Information Management and Retrieval: Understanding metadata and its use, pp. 203-220
Publisher: Facet
Print publication year: 2018