Why cite data?
Creating datasets and other collections of evidence is an important research activity, and one that can involve significant investment of time and effort. At Cambridge University Press we believe that researchers deserve credit and recognition for this work, and that data should be considered legitimate, citable products of research alongside other research objects. Where relevant and appropriate, we therefore encourage authors to cite data as they would cite other research objects, such as publications and books, in their reference lists.
Challenges of citing data
Citing data may not be as straightforward as citing other research objects, for reasons such as those noted by NISO. Infrastructure for storing and sharing data varies significantly across disciplines, and not all datasets are static. It is therefore not surprising that no single universal standard for citing data has emerged, although several organisations have created guiding principles for data citation. One example is FORCE11’s Joint Declaration of Data Citation Principles, which NISO has endorsed.
Finding guidance on data citation
Several of our journals have existing policies on data citation. When considering how to cite data, you should first check your journal's information pages to see if they provide guidance that is relevant to you and your field.
If you are citing data stored in a repository or archive, the repository itself may also provide guidance on data citation. If your journal does not have a policy on data citation, you should follow the repository's guidance on how to cite its data.
General principles for citing data
If you would like to cite data you have used in your work, and neither your journal nor the data source provides guidance on how to do so, we recommend including the following minimum elements necessary for dataset identification and retrieval. These are adapted from IASSIST's Quick Guide to Data Citation.
- Author: Name(s) of each individual or organisational entity responsible for the creation of the dataset.
- Date of Publication: The year the dataset was published or disseminated.
- Title: The complete title of the dataset, including the edition or version number, if applicable.
- Publisher and/or Distributor: The organisational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
- Electronic Location or Identifier: Whenever possible, this should be a unique, persistent, global identifier used to locate the dataset (such as a DOI). Otherwise, a web address and the date of access can be used.
As a general rule, you should include enough identifying elements to precisely specify which data you have used in your work. These elements can then be arranged following the punctuation and order used for other citations in your journal's style. If the dataset has a DOI, you can use DataCite's DOI Citation Formatter to generate a citation in your choice of many standard formatting styles.
We also recommend prefacing any data citation with [dataset], to help us recognise and identify it with appropriate metadata.
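As a rough illustration of the elements and the [dataset] prefix described above, the following Python sketch stores the minimum elements in a record and assembles them into one citation string. The class name, field names, and the order and punctuation of the output are assumptions made for this example only; they are not a Cambridge University Press or journal style, and your journal's own citation style should take precedence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical record holding the minimum elements listed above."""
    authors: List[str]                 # individuals or organisations who created the dataset
    year: int                          # year the dataset was published or disseminated
    title: str                         # complete title of the dataset
    publisher: str                     # publisher and/or distributor
    version: Optional[str] = None      # edition or version number, if applicable
    identifier: Optional[str] = None   # persistent identifier such as a DOI, if one exists
    url: Optional[str] = None          # fallback web address
    accessed: Optional[str] = None     # access date, used together with the web address


def format_citation(c: DatasetCitation) -> str:
    """Arrange the elements in one illustrative order, prefixed with [dataset]."""
    # Require either a persistent identifier or a web address plus access date.
    if c.identifier is None and not (c.url and c.accessed):
        raise ValueError("provide a persistent identifier, or a web address plus access date")
    title = f"{c.title}, {c.version}" if c.version else c.title
    if c.identifier:
        location = c.identifier
    else:
        location = f"Accessed {c.accessed}. {c.url}"
    authors = ", ".join(c.authors)
    return f"[dataset] {authors}. {c.year}. {title}. Distributed by {c.publisher}. {location}."


# Example using a dataset without a DOI, so a web address and access date are given.
rainfall = DatasetCitation(
    authors=["Bardia, Vikas"],
    year=2009,
    title="All India compiled monthly rainfall data (1901-2002)",
    publisher="India Water Portal",
    url="https://www.indiawaterportal.org/articles/meteorological-datasets-download-entire-datasets-various-meteorological-indicators-1901",
    accessed="March 1, 2019",
)
print(format_citation(rainfall))
```

Keeping the elements in a structured record rather than a pre-formatted string makes it easier to rearrange them later if a journal or repository asks for a different order.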
Example 1: The journal Political Analysis gives specific guidance for citing data in its Instructions for Contributors, including example citations of datasets in the form Author, "Title", Identifier, Version, Date. This format should be followed when citing data in this journal, for example:
- [dataset] Monogan, Jamie, "Replication data for: A Case for Registering Studies of Political Outcomes: An Application in the 2010 House Elections", V6 [Version], June 3, 2013.
Example 2: The U.S. Geological Survey's ScienceBase Catalog provides example citations for many of its datasets. If your journal does not specify otherwise, you should follow the citation format given, for example:
- [dataset] Adams, M.J., Pearl, C.A., McCreary, B., and Rowe, J.C., 2019, Oregon spotted frog (Rana pretiosa) observations in Oregon, 2016-2018: U.S. Geological Survey data release, /10.5066/P940A4DW.
Example 3: The India Water Portal hosts many datasets but does not provide specific guidance on how to cite them. If your journal does not specify how to cite data, you should follow your journal’s style for other citations, including all the elements necessary to identify the dataset. For example, if your journal uses the Chicago Manual of Style:
- [dataset] Bardia, Vikas. 2009. All India compiled monthly rainfall data (1901-2002). Distributed by India Water Portal. Accessed March 1, 2019. https://www.indiawaterportal.org/articles/meteorological-datasets-download-entire-datasets-various-meteorological-indicators-1901.
Still looking for advice?
If you have further questions about citing data, please feel free to contact us at any time.