Cleaning your data without getting your hands dirty…

Cleaning your data is the job we all have to do but no one likes to do. Getting your hands dirty once in a while is actually not that bad. The basics of data cleaning are all about getting your hands dirty, and keeping them dirty. Once you know the basics, you also see the impact that quality data has after cleaning. Let's take a closer look at a simple three-step program that covers the basics of data cleaning.

3 Step Data Cleaning Program

Recently I found a very interesting blog post on Data Science Central about data cleaning. The author provided the community with a five-step plan to get your data clean. I turned this into a 3 step data cleaning program with the main points to take into consideration when data cleaning comes to the table.


Each statistical package has its own particular quirks, and if you know what they are you can arrange your data accordingly right from the beginning. This is what I mean by planning. It’s not just about collecting your data. It’s about collecting your data to the necessary degree of precision, in the correct format, and making sure that it is fit-for-purpose and capable of answering your research questions.

Collect & Clean

Making sure that your data is as clean as it can be even before you start data cleaning is the best and easiest way to hold on to your sanity. Data cleaning isn’t really about data cleaning. It’s about being organised. Anybody can clean data, but not everybody can clean data quickly and efficiently. Organising your Excel workbook or your software solution, for example, before you get started with your data collection or data entry is a skill that is worth learning.
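To make "clean before you start cleaning" a bit more concrete, here is a minimal Python sketch of checking each row at the moment it is entered. The column names (`visit_date`, `weight_kg`), the date format, and the one-decimal precision are all assumptions made up for this illustration; they do not come from the original plan, so swap in whatever your own study needs.

```python
from datetime import datetime

# Hypothetical rules for an illustrative data-entry sheet.
# Column names, formats and limits are assumptions, not from the article.

def is_iso_date(value):
    """True if the value parses as a year-month-day date, e.g. 2024-01-05."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_weight_kg(value):
    """True for a plausible weight with at most one decimal, e.g. 72.5."""
    try:
        number = float(value)
    except ValueError:
        return False
    return 0 < number < 500 and round(number, 1) == number

RULES = {
    "visit_date": is_iso_date,
    "weight_kg": is_weight_kg,
}

def check_row(row):
    """Return the names of the columns whose values break their rule."""
    problems = []
    for column, rule in RULES.items():
        value = row.get(column, "").strip()  # a missing column counts as empty
        if not rule(value):
            problems.append(column)
    return problems
```

Calling `check_row` on every new row and refusing (or flagging) any row with a non-empty result is one way to keep mistakes out of the sheet, instead of hunting for them afterwards.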


Finally, you should understand your data and make sure that it is fit-for-purpose and capable of testing your hypotheses. Of course, if you’ve planned your study carefully right back at the beginning, then all of this will just drop into place.

This Data Cleaning Program is based on the research of Lee Baker. He is an award-winning software creator with a passion for turning data into a story. 

Dirty hands?

Will your hands get dirty when cleaning your data? It is a figure of speech, of course, but yes, they will. You and your company have to put a lot of effort into data cleaning, over and over again. Getting it right the first time must be your goal, but remember that it is also a goal that is almost unreachable. In most cases companies reach 95%, and the top performers even 98%. But it is worth it. Clean data is the foundation of your data house, and it enables other types of data to be used successfully. So get your hands dirty multiple times a year, and ask your household help to make hygiene the highest priority.