
Introduction

Published online by Cambridge University Press:  09 November 2021


Summary

Hello and welcome to Between the Spreadsheets: Classifying and Fixing Dirty Data.

This is not your typical data book, and that's because I’m not your typical data person. I have a wonderfully unique background that mixes corporate and data work, and it has brought me to the point where I’m able to share my specialist knowledge with you.

Regardless of whether you are completely intimidated by data, starting out in your data career, a seasoned procurement or data professional, or a decision maker within an organisation, there will be something in here for you.

Dirty data is a problem. In every single organisation, no matter how big or small or where it is located, you will hear talk of data quality issues. What you will rarely hear about are the consequences, because people and companies don't want to admit their failures. We could be talking about millions of pounds or dollars lost on new technology, weeks or months spent fixing mistakes caused by bad data, possible job losses or even worse.

It's not just that. We constantly hear about data scientists spending anywhere from 40% to 80% of their time cleaning or wrangling data. Why is this? Well, I believe it's because they are inefficient and inexperienced at it. ‘What? But they’re data scientists!’ I hear you cry. Unfortunately, that doesn't mean they know how to clean data. Data cleaning is rarely covered in academic studies or other courses; the focus is always on the technical aspects of the role, yet, ironically, they can't do any of that without clean data first.

While data cleaning is one of the most vital parts of working with data, it is often overlooked, either because people assume everyone already knows how to do it or because it's considered too menial or unimportant to spend time on or invest in.

It's not just in data science; I’ve seen this in other areas. I was managing a politics student and asked her to book a meeting and send invites. When it hadn't been done, I asked why, and she explained that she didn't know how to do it. I was blown away that someone studying politics (don't ask me why the politics thing made a difference, I have no idea) didn't know how to create a meeting invite. My assumptions were very wrong.

Type: Chapter
Information: Between the Spreadsheets: Classifying and Fixing Dirty Data, pp. xvii-xxii
Publisher: Facet
Print publication year: 2021



  • Introduction
  • Susan Walsh
  • Book: Between the Spreadsheets
  • Online publication: 09 November 2021
  • Chapter DOI: https://doi.org/10.29085/9781783305049.001