You don’t wash your car once a year
Most of us know that keeping data accurate is an ongoing process. Data is constantly being moved and edited by different people and departments, so the chance of “dirty data”, and the risks that come with it, is always present. Data cleansing should therefore be an ongoing process too. However, data cleansing projects are time-consuming and far from simple. In my opinion the most fitting comparison is with washing your car: you have to do it often, but not every day.
The explanation of data cleansing
“Correction or removal of erroneous (dirty) data caused by contradictions, disparities, keying mistakes, missing bits, etc. It also includes validation of the changes made, and may require normalization.” (Source: Business Dictionary)
Data cleansing or First time right
Data governance structures and data stewards are the guardians of the “first time right” principle. They do their best to get data into the systems of record free of errors. But when a lot of people take a lot of actions on data, it is hard to guarantee that the data is 100% trustworthy. First time right should be the principle to stand by, but in this case I would say there is no “or”: organizations need data cleansing as well to get close to 100% data quality.
The method: Extract, Transform and Load (ETL)
As mentioned in the earlier chapter, the most common way to clean data is via ETL. This three-step method helps you clean your data in a logical way. Let's look at each step.
Extract: extract the data from the original database into a work environment where you can make the necessary adjustments, either in bulk or on specific details.
Transform: this is the step in which you adjust the data until it has the quality needed for business use. Transforming can be as simple as adding missing ZIP codes, but nowadays we also see complex algorithms applied to transform data.
Load: the last step of the method, in which you load the data back into the system of record. This step is a formality if you apply the business rules set in the system of record. If attributes were added to the work file during cleaning that are not yet built into the system, the system needs adjustments as well. Be careful with this action.
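To make the three steps a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not a recommended implementation: the customers table, the sample rows, the ZIP_BY_CITY lookup and the business rule "every customer needs a ZIP code" are all made up for this example, and an in-memory SQLite database stands in for the system of record.

```python
import sqlite3

# Hypothetical reference data for filling in missing ZIP codes; in practice
# this would come from a trusted source, not a hard-coded dictionary.
ZIP_BY_CITY = {"Amsterdam": "1012", "Utrecht": "3511"}


def extract(conn):
    """Extract: copy the rows out of the system of record into a work list."""
    rows = conn.execute(
        "SELECT id, name, city, zip FROM customers ORDER BY id"
    ).fetchall()
    return [dict(zip(("id", "name", "city", "zip"), row)) for row in rows]


def transform(records):
    """Transform: tidy up names and fill in missing ZIP codes where possible."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # work on a copy, never on the extracted original
        rec["name"] = " ".join(rec["name"].split())  # collapse stray spaces
        if not rec["zip"]:
            rec["zip"] = ZIP_BY_CITY.get(rec["city"])  # stays None if unknown
        cleaned.append(rec)
    return cleaned


def load(conn, records):
    """Load: write rows back, rejecting any that still break the assumed rule."""
    rejected = []
    for rec in records:
        if not rec["zip"]:  # assumed business rule: a ZIP code is mandatory
            rejected.append(rec["id"])
            continue
        conn.execute(
            "UPDATE customers SET name = ?, zip = ? WHERE id = ?",
            (rec["name"], rec["zip"], rec["id"]),
        )
    conn.commit()
    return rejected


def run_demo():
    """Build a tiny made-up system of record and run one cleansing pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [
            (1, "  Anna   de Vries ", "Amsterdam", None),
            (2, "Bram Jansen", "Utrecht", "3511"),
            (3, "Carla Smit", "Atlantis", ""),
        ],
    )
    rejected = load(conn, transform(extract(conn)))
    rows = conn.execute("SELECT id, name, zip FROM customers ORDER BY id").fetchall()
    return rows, rejected
```

Calling run_demo() returns the cleaned rows together with the ids that were rejected during loading. Records that cannot be repaired are left untouched in the system of record and reported back, so someone can look at them by hand instead of having a guess written into the data.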
Importance of Data Cleansing
The importance of data cleansing should not be underestimated, even when you have a great governance structure in place. Data cleansing is a serious project that takes serious time, but as an organization you should remember that “dirty data” will cost you even more. So wash your car not once a year but preferably every month or two. A clean car drives much better!