INTRODUCTION
The generation of the IntCal, Marine, and SHCal radiocarbon (14C) age calibration curves (Heaton et al. 2020a; Hogg et al. 2020; Reimer et al. 2020) is only possible because of the research and care that go into generating and checking the underlying 14C datasets used in their construction. In particular, the assessment of data quality relative to set criteria and the collection of associated metadata (Reimer et al. 2002, 2013) are key elements in the ongoing compilation of these datasets. In addition to their value for 14C calibration, these 14C datasets are also important in their own right for a wide spectrum of different areas of research (Heaton et al. 2021) and have routinely been made available to the international research community through the IntCal database (previously available from the Queen’s University Belfast webserver). The developments reported here build upon that initiative with the intention of facilitating access to, and enhancing transparency of, the broad range of information used in the construction of the calibration curves.
At present, the data collected and discussed here relate only to the construction of the pre-1950 IntCal, Marine, and SHCal curves that span from 55,000–0 cal yr BP. However, the intention is to broaden this to include the data for calibration in the post-1950 period (Hua et al. 2022).
AIMS OF THE DEVELOPMENT WORK
The main aims of the new database structures are to facilitate the sharing of the primary, raw, 14C data used for calibration purposes, together with the related data (such as tree ring measurements) and metadata (including method statements and images), and to do so in a way that enables the data to be manipulated programmatically using tools such as R (R Core Team 2022), Python (e.g., Van Rossum and Drake 2009), and MATLAB (e.g., Higham and Higham 2016). The specific objectives are to:
1. Make all the primary data (the 14C measurements and accompanying calendar age information with their associated uncertainties) that are used in construction of the IntCal family of calibration curves available in a digitally readable form.
2. Organize these data, as far as possible, by primary record rather than measurement initiative, in order to facilitate comparison of results from multiple laboratories and to put the data into a geographic context.
3. Particularly in relation to dendrochronology, provide supporting data such as ring-width series, and metadata detailing the methods that have been applied (Reimer et al. 2013).
4. Include associated numerical data such as calendar age correlation or covariance matrices for any record where the approach used to generate the timescale introduces dependencies between the estimates of its calendar ages, for example, the annually resolved records such as the Lake Suigetsu record (Bronk Ramsey et al. 2020) with its modified varve-counting chronology, and ancient New Zealand kauri trees wiggle-matched onto 14C variability (Cooper et al. 2021). In a similar way, the Cariaco Basin (Hughen and Heaton 2020) and the Pakistan and Iberian margin records (Bard et al. 2013), which have calendar ages obtained by tuning to closely related climate markers (Heaton et al. 2013), use the proxy data and derived relationships in the construction of the calibration curve (Heaton et al. 2020b). The dating and proxy information for the speleothem records is also fundamental to the timescale for the curves.
5. Associate (or link) the data and metadata with the relevant publications, providing complete DOI information and URL links to the publications. This includes links to complementary data archives, particularly those relating to the supporting dendrochronological data.
6. Provide methods for visualization of the 14C and dendrochronological data.
7. Include tools for checking and assessing data.
8. Allow for the easy import and export of both data and references to a range of other software.
These aims are intended to help different groups of researchers working with the data: those wishing to use or independently evaluate the data, those working on the preparation of new datasets, and the members of the IntCal group working on curve construction. It is important to remember that research on the underlying records is an ongoing process and that, for example, the timescales for the speleothems and their derived chronologies will change in future iterations (Cheng et al. 2021).
OVERALL DATA MODEL
In order to achieve these aims, use has been made of the pre-existing IntChron framework (Bronk Ramsey et al. 2019). This is specifically designed for sharing linked data and includes elements relevant to chronological data, particularly the ability to handle different timescale units and tools for data visualization. The data schema and associated tools have been updated to include elements required for the IntCal datasets, but the overall data model has been found to cover all the core IntCal requirements.
Data Organization
The data structure is essentially hierarchical but with the ability to link and associate information by searches. At the top level there are three main classes of information, with all other information organized within this structure:
1. Records: These contain all information relating to individual sites or records; each record has a unique short name, or site code, which is used as a key for accessing information. Within the record, there are three types of information:
   a. Information on the location (latitude, longitude, elevation) and type (e.g., Marine, Terrestrial, Speleothem, …) of the record.
   b. Data series lists which are used to hold data for the record. For this application the main data series types are:
      i. IntCal data: the primary 14C calibration data (containing 14C measurements and quoted uncertainties, as well as the accompanying calendar age information).
      ii. Dendrochronological sample data: the ring-width or oxygen isotope data associated with the measurement series (if appropriate).
      iii. Metadata: other descriptive information about the record.
      iv. Attachments: files (typically images) cited in the metadata.
      v. Other data: ages and correlation or covariance files, for records with a level of dependence between the calendar age estimates, can be added with specific series types.
   c. Reference links for the record as a whole and for the series within it.
2. Project data: These are data series that do not have a specific link to a single record or site. In the case of the IntCal database, information held at this level includes the calibration curves themselves and an index of data series organized by the IntCal dataset number. This also includes information on the time-variant relationship between the calibration curve and other (e.g., ice-core) timescales (Adolphi and Muscheler 2016; Adolphi et al. 2018).
3. References: This class holds full bibliographic details for all references referred to in the database (typically listed under records or data series). These references are usually directly linked to the relevant journal articles via doi.org and can be exported in BibTeX format for use in bibliographic tools.
Data Storage
The underlying storage format within IntChron is JSON because it facilitates easy interfacing with programming tools (most easily using tools such as R, Python, and MATLAB, but also in principle with languages such as C++ and C#), allows static archiving/storing of data without the need for software installation (Bronk Ramsey et al. 2019), and is commonly used in web-based applications.
The data will be stored and used in three different ways:
i. There is an active database accessible to members of the IntCal working group which is intended to help develop new calibration curves. This allows for the addition of new records and the updating of existing ones.
ii. There is a static archive of the IntCal20 datasets at https://intchron.org/archive/IntCal/IntCal20/index.json, which holds all the information accumulated for IntCal20 as it was when the curve was constructed. This is open access.
iii. It is possible for users to make their own copies of this archive (or parts of it) for their research and to prepare new data for inclusion in IntCal. These can be stored on the IntChron server in users’ own areas, or on users’ own computers.
In the case of the IntCal database itself (point “i” above), a hybrid approach has been adopted. The user interface, archives and file transfer all use JSON, but for the primary IntCal data, an underlying MySQL data table has been retained to minimise the risks of unintended changes. In addition, the JSON data records are stored in a database rather than as files. This has some advantages for a multi-user system and enables the database maintainers to use an automated archiving system on the computer server which holds the database. However, from the perspective of a user of the database or of the derived archives, the data organization is effectively in the form of JSON objects.
Reflecting the data organization, there are only three types of JSON files that are used for data exchange and presentation. These are:
1. Project data: these files hold links to all of the relevant records and project-level data series. In addition, the project data file contains all of the relevant publication information and details of parameter characteristics.
2. Record data: these contain all the information relevant to a specific record, including the data series contained within it.
3. Series data: these are for project-level data series which do not relate to specific records (such as calibration curves).
The overall database model is based on linked data, so it can also include references to files held outside the JSON data structure via attachments (typically images). In general, the aim has however been to avoid putting key information in such attachments because it makes for more difficult data distribution.
As a variant of the above model, it is possible to have all record and series data embedded into a single project data file. We have used this option for the data archive because it means that all of the data (other than attachments) can be retrieved from a single file rather than having to make multiple requests for each record and series. This approach would not be efficient for very large projects but poses no problems for the present IntCal datasets in terms of data handling (the entire IntCal20 archive is < 10 MB).
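As an illustration of working with the single-file archive programmatically, the following minimal Python sketch retrieves the open-access IntCal20 archive from the URL given above and prints an outline of its top-level structure. The function names are illustrative only, and the sketch deliberately makes no assumption about the key names used in the file; these should be taken from the published schema.

```python
import json
from urllib.request import urlopen

# Open-access static archive of the IntCal20 datasets (URL as given in the text).
ARCHIVE_URL = "https://intchron.org/archive/IntCal/IntCal20/index.json"


def fetch_archive(url=ARCHIVE_URL, timeout=60):
    """Download and parse the single-file archive (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def outline(node, max_depth=2, _depth=0):
    """Return indented 'key: type' lines for nested JSON objects.

    Only dictionaries are descended into, and no key names are assumed,
    so this works on any parsed JSON document.
    """
    lines = []
    if isinstance(node, dict) and _depth < max_depth:
        for key, value in node.items():
            lines.append("  " * _depth + f"{key}: {type(value).__name__}")
            lines.extend(outline(value, max_depth, _depth + 1))
    return lines


if __name__ == "__main__":
    archive = fetch_archive()
    print("\n".join(outline(archive)))
```

Because the whole archive is below 10 MB, a single request of this kind is sufficient to hold all records and series in memory.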
Attachment files are organized in a hierarchical file structure based on record or series name. A full archive of the IntCal20 data files is included as supplementary online information in this publication, allowing the complete archive to be reproduced without access to the current site. This is an important element in the long-term availability of the data.
Software Overview
In principle, the archive can be worked with entirely using software tools such as R, Python, and MATLAB. However, the IntChron integration tool (Bronk Ramsey et al. 2019) has been further developed to facilitate work with the IntCal database and provide a user interface for some pre-defined data manipulations. This tool can be accessed and preloaded with the open access static IntCal20 data using the URL:
The integration tool has been specifically designed to work with data organized in the format described above, and the user is presented with a list of records and data series. The data can be explored by following links from this level. There are also search facilities built into the system which enable lists of relevant data series or primary data to be extracted. Records can be displayed on a map and the associated information retrieved by selecting the individual location points. Figure 1 shows a screenshot of the application in use.
This tool is available on the IntChron server and the full interface code will also be distributed with future releases of OxCal so it can be used without access to the IntChron site should that be necessary.
DATA ELEMENTS
The overall data schema for IntChron (including the parameters used for the IntCal dataset) is given at https://intchron.org/schema with the current version supplied in the supplementary information for this paper. Here we focus on how the data are used specifically for IntCal, concentrating on elements whose use might be less intuitive or might not otherwise be obvious. Each parameter has a formal name used in the JSON objects and for searches, and a display-orientated name (see Table 1). Parameter names, series names and record names are all defined so that they are valid JavaScript variable names and can also be used as URLs without escaping.
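The naming constraint can be checked with a simple pattern. The sketch below is a conservative approximation rather than the rule applied by IntChron itself: it accepts only ASCII letters, digits and underscores, not starting with a digit, which is a subset of what JavaScript permits as a variable name and needs no escaping in a URL. The function name is illustrative, and JavaScript reserved words are not screened out.

```python
import re

# Conservative subset of valid JavaScript identifiers that is also URL-safe:
# ASCII letters, digits and underscore, not starting with a digit.
# (Assumption: the actual IntChron rule may be more permissive.)
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_name(name):
    """Return True if `name` fits the conservative naming pattern."""
    return _SAFE_NAME.fullmatch(name) is not None


for candidate in ["SG06", "QL_DE_Oak", "IntCal_Data", "Lake Suigetsu"]:
    print(candidate, is_safe_name(candidate))
```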
Record Level Information
At the record level, the key information held is about the location. For dendrochronological samples, the genus and species of wood sampled (e.g., Quercus robur, Agathis australis) is also included (see Table 1 and Figure 2 for details). There is an optional comment field at this level which can consist of information not included in the standard parameters. The references at this record level should only be the key papers relevant to the sample set within IntCal. More specific references, for example to the dendrochronology, can be included with the dendrochronological series or metadata. The record name is ideally a formal site code (such as SG06) or failing that a short form of the site name (such as Maraa). In the case of records containing compilations of data (mostly older datasets) the lab code, country of origin of the samples, and taxon are used together (as in QL_DE_Oak).
Data Series Types
There are four main data series types included in the records. These are: IntCal_Data (the primary 14C calibration data used in IntCal), Dendro_Sample (dendrochronological data such as ring-widths for the samples), Data (a generic holder used primarily for metadata) and Files (attachments which are used principally for figures which cannot be included in any other way).
IntCal Data Series
For IntCal, these are the most important element of the data held in the database. The definitions of the parameters included are given in Table 1 and also in the online schema. For more recent data, each data series (with a set and division number) contains measurements from a single laboratory on a single sample set (environmental record or single tree sample). For some older datasets, multiple samples from a combined chronology are summarised within a single set/division. Ideally, for dendrochronological samples, the sample parameter in the series header will be the same as the series name for the associated dendrochronological data, allowing the two to be cross-checked especially where there are multiple-year block samples used.
Each sample has its midpoint age expressed in two ways. The first is in fractional astronomical year format (t), which takes account of the growing season for tree rings (see next section). For those samples/archives which have uncertain calendar ages when entering IntCal, these t values are the posterior mean calendar age of the sample (obtained after the calendar ages have been updated during IntCal construction, Heaton et al. 2020b). The second is “calage” which gives the calendar age of the mid-point. The parameter calage refers to the year from which the sample derives (in calBP), so for NH wood this would be 0 for wood from AD1950, which would have a t value of 1950.5. For SH wood, the calage parameter assumes the Schulman convention (Schulman 1956), and thus calage = 0 would imply wood which started to grow in the austral spring of AD1950 (ending in the austral autumn of 1951), so it should have a t value of 1951.0. For samples dated by means other than dendrochronology, the calage is simply the prior age estimate before AD1950 with an associated uncertainty. For those samples/archives which have uncertain calendar ages when entering IntCal, the calage values provide the prior calendar age estimate before curve construction. Consequently, the calage parameter is used as the input for curve construction, while the t parameter provides the posterior estimate of the age after curve construction has been completed and is the measure used for plotting purposes.
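For dendrochronologically dated wood, the relationship between the two parameters can be written out explicitly. The following Python sketch assumes that the two worked values given above (calage = 0 corresponding to t = 1950.5 for NH wood and t = 1951.0 for SH wood) extend linearly, with one unit of calage per calendar year. The function names are illustrative, and the conversion does not apply to records with uncertain calendar ages, where t is a posterior estimate.

```python
# t value corresponding to calage = 0 for each hemisphere, as stated in the text.
T_AT_CALAGE_ZERO = {"NH": 1950.5, "SH": 1951.0}


def dendro_t_from_calage(calage, hemisphere):
    """Fractional astronomical year t for dendro-dated wood.

    Assumption: t decreases by one for every unit increase in calage (cal BP).
    """
    return T_AT_CALAGE_ZERO[hemisphere] - calage


def dendro_calage_from_t(t, hemisphere):
    """Inverse of dendro_t_from_calage under the same assumption."""
    return T_AT_CALAGE_ZERO[hemisphere] - t


print(dendro_t_from_calage(0, "NH"))    # NH wood from AD1950
print(dendro_t_from_calage(0, "SH"))    # SH wood, Schulman convention
print(dendro_t_from_calage(100, "NH"))  # NH wood from AD1850
```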
Dendrochronological Sample Data Series
These can be imported from various dendrochronological formats directly, or from the NOAA International Tree Ring Data Bank (ITRDB: NOAA NCEI 2022) exported as Tucson .rwl files. The internal format of these files has some unique features needed for this global dataset. Ring numbering is typically from old to young but can be reversed where this is the case for the primary data. The date (t) parameter is a floating-point astronomical year and should reflect the growing season for the wood. As previously described, NH wood grown in 1950 will be stored as 1950.5, whereas SH wood that starts to grow in 1950 will be stored as 1951.0. In the interface you can choose whether to use the Schulman convention for display purposes. If this is selected, 1951.0 will be shown as AD1950⊣ showing that it is the end of this year, whereas if the convention is not applied it will show as AD1951⊢ indicating the start of the year. Alternatively, all dates can be shown in fractional format. The purpose of the internal format is that samples can be plotted on an absolute timescale that takes account of the different growing seasons between the two hemispheres.
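The two display options for a year boundary can be sketched as follows. This is not the interface code: the function name is hypothetical, and the sketch covers only whole-number t values in the AD period, since the text gives the display rule only for that case (1951.0).

```python
def format_year_boundary(t, schulman=True):
    """Format a whole-number astronomical year t as an AD year boundary.

    With the Schulman convention the boundary is labelled as the end of the
    preceding year; without it, as the start of the year itself.
    Assumption: only whole-number t values of 2 or more (AD years) are handled.
    """
    if t != int(t):
        raise ValueError("only whole-number t values lie on a year boundary")
    year = int(t)
    if year < 2:
        raise ValueError("this sketch handles AD years only")
    if schulman:
        return f"AD{year - 1}⊣"
    return f"AD{year}⊢"


print(format_year_boundary(1951.0, schulman=True))
print(format_year_boundary(1951.0, schulman=False))
```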
Metadata
In order to properly understand the background to the dendrochronology underpinning the datasets used in IntCal, the database includes metadata. Such metadata are of greatest importance when the associated data are not already published elsewhere. The data included have been selected to include key information needed as outlined in Reimer et al. (2013). The metadata are usually structured to address key questions (see Table 2) and are in the form of a plain text file. Where tables are required, these can be incorporated within the main notes field using tab characters. A “code” field can also be added if needed (for example, COFECHA output extracts). The metadata should be kept succinct and not include unnecessary detail published or deposited elsewhere. Some of the information in the metadata is also held in a more structured way in the main database structure.
Attachments
In addition to the text information included in the structured database, attached files can also be included. These are mostly intended to be figures referred to in the metadata but can consist of longer datafiles and pdf reports if essential. It is however preferable that such information is published elsewhere and only referenced/linked in the IntCal database itself. Such attachments should not be seen as an alternative to the provision of structured information within the database.
Other Data
The calendar age correlation or covariance matrices, for those records with timescales that have been obtained using approaches that introduce a level of dependence between the calendar age estimates, are the main other data required for the IntCal calibration curve generation. These are stored in a particular IntCal_Correlation dataset which holds the matrix as a simple tab-delimited text matrix that can be read by standard software packages. See for example the SG06 record. This Lake Suigetsu record has an adjusted varve-counted chronology (Bronk Ramsey et al. 2020) whereby calendar age uncertainties at any individual depth within the core are propagated to the other depths (due to both the necessary depth ordering and limitations on changing sedimentation rates).
Age-depth models, proxy data and other types of dating information can also be included as outlined in Bronk Ramsey et al. (2019).
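A tab-delimited matrix of this kind can be read with a few lines of code. The Python sketch below assumes a plain square matrix with no header row or row labels (the exact layout of the stored files should be checked against the archive), and shows how a correlation matrix R and a vector of calendar age uncertainties σ combine into a covariance matrix with elements Σ_ij = R_ij σ_i σ_j. The function names and the example values are illustrative only.

```python
def parse_matrix(text):
    """Parse a tab-delimited square matrix (assumed: no header or labels)."""
    rows = [
        [float(cell) for cell in line.split("\t")]
        for line in text.splitlines()
        if line.strip()
    ]
    size = len(rows)
    if any(len(row) != size for row in rows):
        raise ValueError("matrix is not square")
    return rows


def correlation_to_covariance(corr, sigmas):
    """Return the covariance matrix with elements corr[i][j] * sigma_i * sigma_j."""
    size = len(corr)
    if len(sigmas) != size:
        raise ValueError("one sigma is needed per matrix row")
    return [
        [corr[i][j] * sigmas[i] * sigmas[j] for j in range(size)]
        for i in range(size)
    ]


# Illustrative values only, not taken from any IntCal record.
example = "1.0\t0.5\n0.5\t1.0\n"
corr = parse_matrix(example)
print(correlation_to_covariance(corr, [10.0, 20.0]))
```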
TOOLS FOR DATA VISUALIZATION
In addition to allowing the data to be explored, the IntChron integration tool user interface has functions that enable the data to be plotted and visualised in several different ways.
Mapping of Records
The first of the display methods is a mapping interface. This allows the records (either as a whole, or a selected subset) to be shown on a map (as in Figure 1). The map is dynamically linked, so hovering over the points will give their site name, and clicking on them will bring up the relevant record and associated data. A single site can also be mapped to check its location.
Plotting Radiocarbon Data
The 14C calibration data can be plotted in several different ways. To select data to be plotted, you can either navigate to the records and add them to an accumulating plot, or you can use the top-level [Plot] function to select a period and the types of data that are to be included. The plotting can be against any of the main time-scale measures, overlay the appropriate calibration curves, and can use 14C age, F14C or age-corrected Δ14C as the plotted value. All errors are stored, displayed and plotted at 1σ. The plotting routines collate the associated publications, so selecting the [Cite] option for a plot will give a reference listing.
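The three plotted quantities are related by standard conventions, sketched below in Python. These formulae are the conventional definitions (the Libby mean life of 8033 yr for conventional 14C ages, and a mean life of 8267 yr, corresponding to the 5730 yr half-life, for the age correction) and are not taken from the IntChron code; the function names are illustrative, and which timescale value is supplied as the calendar age is left to the user.

```python
import math

LIBBY_MEAN_LIFE = 8033.0  # yr, used for conventional 14C ages
TRUE_MEAN_LIFE = 8267.0   # yr, from the 5730 yr half-life, used for age correction


def f14c_to_age(f14c):
    """Conventional 14C age (14C yr BP) from F14C."""
    return -LIBBY_MEAN_LIFE * math.log(f14c)


def age_to_f14c(age):
    """F14C from a conventional 14C age (14C yr BP)."""
    return math.exp(-age / LIBBY_MEAN_LIFE)


def age_corrected_delta14c(f14c, cal_bp):
    """Age-corrected Delta14C (per mil) for a sample of calendar age cal_bp."""
    return (f14c * math.exp(cal_bp / TRUE_MEAN_LIFE) - 1.0) * 1000.0


f = age_to_f14c(5000.0)
print(f, f14c_to_age(f), age_corrected_delta14c(f, 5730.0))
```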
Plotting of Dendrochronological Data
It is also possible to plot the tree ring data included in the database using either the raw ring widths or filtered values. This is most useful when there are multiple sets from the same chronology or the user imports additional series either from the ITRDB or by uploading files.
TOOLS FOR DATA PREPARATION AND ASSESSMENT
One of the main reasons for the choice of the underlying data model for the IntCal database was to make the task of preparing, adding, and assessing data easier for both submitters of data and those involved in the compilation. Clearly, the central database itself cannot be open for modification, but by enabling users to make their own copies of the database, this allows additions and changes to be tested with full access to the associated tools. It is also hoped that by enabling this, data providers (who normally understand their data best) will be able to get the data ready for submission themselves.
There is a help facility within the software, available through the [Help] menu, and this contains specific information for IntCal data which will be kept up-to-date with any developments in the interface. Dedicated functions to extract values from the intcal20.json file have also been added to the rintcal R package, available from the CRAN repository (through the R command install.packages("rintcal")).
This paper aims to indicate what is possible without being a complete guide as to how it is done. Workshops and recorded videos will be made available to explain this in further detail.
Creation of New Records
Within the IntChron framework, creating a new record is simply done by selecting the [+New record] option; this will prompt the user for a record name and then allow all the main record data fields to be filled in (see Table 1 or examples in the existing archive). The location of the site can be checked on a map using the [Map] option.
Creation of New Data Series
Once the record has been created new data series can be added by selecting “Add to data series”; again the user will be prompted for a series name and can then choose the type of data (typically IntCal_Data or Data for this purpose). An [Import/Export] function allows data to be imported from or exported to a spreadsheet (see Figure 2).
Working with Dendrochronological Data
Dendrochronological data can be most easily added in one of two ways. Within the record, using [File > Import] brings up the option of importing “Dendro” data (which allows Heidelberg, Tucson or some other formats to be imported) or “NOAA NCEI study”, which allows a study already available within the ITRDB to be imported. The latter option will first add a link to the study, and then it is possible to import the raw ring data for the relevant sample from that link.
The main advantages of having raw ring width data within the database are twofold. Firstly, it allows the 14C data to be directly related to the ring width data by the shared number of the rings; if the dendrochronology is ever revised, this will allow for the correction of the dataset. Secondly, it enables the chronology of the dendrochronological series to be directly related to the 14C dataset. If the dated tree-ring series is included in the database and the 14C samples have the ring numbers listed, it is possible to set the calage and t parameters of the 14C series directly without having to work these out independently. This should avoid problems arising with different conventions, timescales and growing seasons, all of which can be handled within the interface. Even if this is not used directly, it is still possible to check the dendrochronological and 14C series against one another, as in Figure 3. Such a plot has proved to be very useful for checking for internal consistency between dendrochronological reports and 14C datasets.
INTCAL WORKFLOW
Figure 4 shows the intended workflow for using the IntCal database. Users will typically start with the static archive, add their own data and check them using the visualization tools provided. Once checked, they would then send their data for possible inclusion in the main IntCal database and use in calibration curve generation.
Making Partial Copies of the Archive
Starting with the online archive (see above), users can save the whole archive or parts of it. Assuming they only wish to save part of it, they can select records either one by one or, most easily, by using the plotting function described above:
1. Use the [Plot] option to select the time range and sample type.
2. Use the [Edit > Deselect all] option to deselect the whole archive.
3. Use the [Select] option in the plotter to select all plotted records.
The user will then have the subset of records required. They can then save this in one of two ways. They can create their own project on the IntChron server by using [File > Save as] and then give a project name, or they can use [File > Download] to download the data (choosing only the selected data). The downloaded file can be uploaded into another project running on the server or the user’s computer.
Addition of New Data
Adding new data involves the creation of the record (as described above) and then the input of the key data. This will always include IntCal_Data and if relevant Dendro_Sample and metadata using the Data type to reproduce what is seen in the archived datasets. The intcal_set, intcal_division, and intcal_record_id can be left as null.
Submission of Data to the IntCal Group
The new record and dataset can then be sent to the relevant members of the IntCal group by using the [File > Download] option under the new record. This will download a JSON file for that record which can be considered for inclusion in the central database once all the appropriate checks have been made.
CONCLUSIONS
It is hoped that the provision of this comprehensive database for the IntCal project will facilitate the work of the 14C research community wishing to use the IntCal data and those wishing to contribute to it. The intention is to make the tools and all of the data used within the group open for all researchers. We consider it particularly important that all of the extensive work which underlies the datasets is fully referenced and that these references are easy to use for those accessing the data. Work is underway to prepare for the next update of the IntCal calibration curves, and loading data into this updatable format is intended to be the first stage in that process.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/RDC.2023.53
ACKNOWLEDGMENTS
The authors of this paper would like to acknowledge all of the work which has gone into generating the datasets included within this database, and all of those who have helped in their compilation for IntCal20 and earlier iterations of the calibration curves. In addition, we would like to acknowledge a number of young researchers who have tested and worked with the IntChron interface through its development over the last few years, particularly Rebecca Kearney, Lorena Becerra-Valdivia and Jakov Mlinaravic. Finally, we would like to acknowledge the support of the NERC NEIF facility (PR19008-NEIF-Oxford) for the underlying computing infrastructure and support for the IntChron interface development.