3 - Cleaning

Published online by Cambridge University Press:  10 September 2022

Summary

Learning outcomes of this chapter

  • Adopting a broader view of metadata quality
  • Why you need to clean your metadata in the context of linked data
  • Identification of most common metadata quality issues
  • Understanding the possibilities and limits of automated metadata cleaning
  • Case study: cleaning metadata of the Schoenberg Database of Manuscripts

Introduction

‘It is not a bug, it is a feature’ is one of the more interesting lines one of us learnt when working for a software company. When a customer noticed an inconsistency in one of the products, the challenge was to convince the client that the issue was not a shortcoming but actually a quality of the software. This line comes to mind when we think about the relation between linked data and metadata quality. The lack of consistent, formalized and well-structured data on the web is often presented as the Achilles’ heel of the semantic web and linked data vision. However, we prefer to see the same reality from another viewpoint. Even the most ardent critic of linked data must admit at least one positive outcome: linked data have put metadata quality in the spotlight, finally giving this topic the attention it deserves.

If you only remember one thing from this chapter, it should be this: all metadata is dirty, but you can do something about it. Recurrent metadata quality issues, such as duplicate records or inconsistent encoding of dates or names, have a negative impact not only on the use of your metadata but also on the implementation of linked data methodologies. As Chapters 4 and 5 will demonstrate, the success rate of methods such as reconciliation and enrichment depends to a large extent on how consistent and well structured your metadata are. Data profiling and cleaning techniques help you spot these issues and, where possible, mitigate them.
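To make the two issues named above more concrete, here is a minimal Python sketch of what a first profiling pass could look like. It is not the method used in this chapter. The sample records, the field names and the helper functions `normalise`, `date_pattern` and `profile` are all invented for illustration. The sketch groups names that differ only in case or spacing and reduces each date to a rough 'shape', so that mixed encodings become visible.

```python
import re
from collections import Counter

# Hypothetical sample records: field names and values are invented for illustration.
records = [
    {"id": "1", "name": "Smith, John", "date": "1450"},
    {"id": "2", "name": "smith,  john ", "date": "ca. 1450"},
    {"id": "3", "name": "Unknown scribe", "date": "15th century"},
    {"id": "4", "name": "Unknown scribe", "date": "1450-1475"},
]


def normalise(value):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(value.lower().split())


def date_pattern(value):
    """Reduce a date string to a shape: every digit becomes 9, every letter becomes a."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "a", shape)


def profile(rows):
    """Return candidate duplicate names and a count of date shapes."""
    name_counts = Counter(normalise(row["name"]) for row in rows)
    # A name seen more than once is only a candidate duplicate, not a proven one.
    duplicates = {name: count for name, count in name_counts.items() if count > 1}
    patterns = Counter(date_pattern(row["date"]) for row in rows)
    return duplicates, patterns


duplicate_names, date_patterns = profile(records)
print(duplicate_names)
print(date_patterns)
```

A profile like this only flags candidates. Whether two records with the same normalised name really describe the same entity, or which date encoding should become the standard, remains a human decision.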

The difficulty of combining theory and practice

Data quality has attracted a lot of attention recently within academic circles. A large number of papers and books describe data quality with the help of theoretical concepts, models and frameworks which often refer to and build upon one another.

Type: Chapter
Information: Linked Data for Libraries, Archives and Museums: How to Clean, Link and Publish your Metadata, pp. 71-108
Publisher: Facet
Print publication year: 2015

  • Cleaning
  • Seth van Hooland, Ruben Verborgh
  • Book: Linked Data for Libraries, Archives and Museums
  • Online publication: 10 September 2022
  • Chapter DOI: https://doi.org/10.29085/9781783300389.005