Summary
Hello and welcome to Between the Spreadsheets: Classifying and Fixing Dirty Data.
This is not your typical data book and that's because I’m not your typical data person. I have a wonderfully unique background that's a mix of corporate and data work, which has brought me to the point where I’m able to share my specialist knowledge with you.
Regardless of whether you are completely intimidated by data, starting out in your data career, a seasoned procurement or data professional, or a decision maker within an organisation, there will be something in here for you.
Dirty data is a problem. In every single organisation, no matter how big or small or where it's located, you will hear talk of data quality issues. What you will rarely hear about are the consequences, because people and companies don't want to admit their failures. We could be talking about millions of pounds or dollars lost on new technology, weeks or months spent fixing mistakes due to bad data, possible job losses or even worse.
It's not just that. We hear all the time of data scientists spending anything from 40 to 80% of their time cleaning or wrangling data. Why is this? Well, I believe it's because they are inefficient and inexperienced at it. ‘What? But they’re data scientists!’ I hear you cry. Unfortunately, that doesn't mean anything. Data cleaning is rarely covered in academic studies or other courses; the focus is always on the technical aspects of the role, yet ironically, they can't do any of that without clean data first.
While data cleaning is one of the most vital parts of the whole process when working with data, it is often overlooked, either because there is an assumption that people already know how to do it or because it's considered too menial or not important enough to spend time on or invest in.
It's not just in data science; I’ve seen this in other areas. I was managing a politics student and asked her to book a meeting and send invites. When it hadn't been done, I asked why and she explained she didn't know how to do it. I was blown away that someone studying politics (don't ask me why the politics thing made a difference, I have no idea) didn't know how to create a meeting invite. My assumptions were very wrong.
Between the Spreadsheets: Classifying and Fixing Dirty Data, pp. xvii-xxii. Publisher: Facet. Print publication year: 2021.