Data Quality Dimensions
Good-quality data is valuable: it helps you make the right decisions and supports the success of your business. Yet most companies don’t have good-quality data. Data received from various sources is often unstructured and needs cleaning on multiple parameters. To assess the health of the data, it is measured against a set of criteria called ‘data quality dimensions’. These dimensions are:
- Data Completeness
This dimension measures whether all the necessary data is present in the dataset, and so indicates whether there is enough information to draw insights and conclusions from it. An example of a data quality metric for completeness is the number of ‘missing values’.
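As a rough illustration of the missing-values metric, the sketch below counts empty fields per column and derives an overall completeness ratio. The function names, the sample rows, and the choice to treat `None` and the empty string as missing are all assumptions made for this example, not part of any particular tool.

```python
# Values treated as "missing" in this sketch (an assumption; adjust per dataset).
MISSING = (None, "")


def count_missing(rows, required_fields):
    """Return, for each required field, how many rows lack a value."""
    counts = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            # A field that is absent from the row also counts as missing.
            if row.get(field) in MISSING:
                counts[field] += 1
    return counts


def completeness_ratio(rows, required_fields):
    """Share of required cells that are filled, between 0.0 and 1.0."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 1.0
    missing = sum(count_missing(rows, required_fields).values())
    return 1 - missing / total


# Hypothetical sample data.
rows = [
    {"id": 1, "email": "a@example.com", "country": "DE"},
    {"id": 2, "email": "", "country": "FR"},
    {"id": 3, "email": "c@example.com", "country": None},
    {"id": 4, "email": "d@example.com"},
]

print(count_missing(rows, ["email", "country"]))
print(completeness_ratio(rows, ["email", "country"]))
```

In practice the definition of "missing" varies by dataset (placeholder strings such as "N/A", for example), so the `MISSING` tuple is the part most likely to need adjusting.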
- Data Consistency
This dimension measures whether two data values, derived from different datasets, conflict with each other. This becomes a concern especially when data is aggregated from multiple sources.
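One way to picture a consistency check is to compare the same record as it appears in two sources and list the fields that disagree. This is a minimal sketch: the source names `crm` and `billing`, the record layout, and the `find_conflicts` helper are hypothetical.

```python
def find_conflicts(source_a, source_b, fields):
    """Compare records that share a key and report differing field values.

    Both sources are dicts mapping a record key to a dict of fields.
    Returns a list of (key, field, value_in_a, value_in_b) tuples.
    """
    conflicts = []
    # Only keys present in both sources can conflict.
    for key in sorted(source_a.keys() & source_b.keys()):
        for field in fields:
            value_a = source_a[key].get(field)
            value_b = source_b[key].get(field)
            if value_a != value_b:
                conflicts.append((key, field, value_a, value_b))
    return conflicts


# Hypothetical data aggregated from two systems.
crm = {
    101: {"email": "ann@example.com", "city": "Berlin"},
    102: {"email": "bob@example.com", "city": "Paris"},
}
billing = {
    101: {"email": "ann@example.com", "city": "Munich"},
    102: {"email": "bob@example.com", "city": "Paris"},
    103: {"email": "cy@example.com", "city": "Rome"},
}

for conflict in find_conflicts(crm, billing, ["email", "city"]):
    print(conflict)
```

Records that exist in only one source (key 103 here) are ignored by this check; whether that counts as a consistency or a completeness problem is a design decision.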
- Data Timeliness
This dimension measures the ‘schedule vs. receipt’ time. It helps you keep track of when the data was received and when the next delivery is expected. It also measures the moment at which the received data becomes usable. A typical metric to measure timeliness is data ‘time-to-value’.
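The schedule-versus-receipt idea can be sketched with plain timestamps. The helper names below are hypothetical, and treating time-to-value as the gap between receipt and the moment the data is usable is an assumption made for this example.

```python
from datetime import datetime, timedelta


def delivery_delay(expected_at, received_at):
    """How late the data arrived; zero if it was on time or early."""
    return max(received_at - expected_at, timedelta(0))


def next_expected(expected_at, interval):
    """When the next delivery is due, given a fixed delivery interval."""
    return expected_at + interval


def time_to_value(received_at, usable_at):
    """Gap between receiving the data and being able to use it (assumed definition)."""
    return usable_at - received_at


# Hypothetical daily feed.
expected_at = datetime(2024, 1, 1, 6, 0)
received_at = datetime(2024, 1, 1, 6, 45)
usable_at = datetime(2024, 1, 1, 8, 15)

print(delivery_delay(expected_at, received_at))
print(next_expected(expected_at, timedelta(days=1)))
print(time_to_value(received_at, usable_at))
```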
- Data Integrity
This dimension measures the accuracy of the data. Moving data between multiple systems and databases can degrade its integrity. The goal of this dimension is to make sure that no errors are introduced along the way. Interventions that help ensure data integrity include mapping parameters and generating unique IDs (UIDs).
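To show how a generated UID can help, the sketch below tags each record before it leaves the source system and then uses the UID to detect records that were lost or altered in transit. The helper names and the record layout are hypothetical; real pipelines may rely on checksums or database constraints instead.

```python
import uuid


def tag_with_uid(records):
    """Return copies of the records, each with a freshly generated 'uid' field."""
    return [{**record, "uid": str(uuid.uuid4())} for record in records]


def compare_by_uid(sent, received):
    """Report UIDs that went missing or whose record changed after transfer."""
    sent_by_uid = {record["uid"]: record for record in sent}
    received_by_uid = {record["uid"]: record for record in received}
    missing = sorted(sent_by_uid.keys() - received_by_uid.keys())
    altered = sorted(
        uid
        for uid in sent_by_uid.keys() & received_by_uid.keys()
        if sent_by_uid[uid] != received_by_uid[uid]
    )
    return {"missing": missing, "altered": altered}


# Hypothetical transfer: the third record is lost, the second is changed.
sent = tag_with_uid([
    {"name": "a", "amount": 10},
    {"name": "b", "amount": 20},
    {"name": "c", "amount": 30},
])
received = [sent[0], {**sent[1], "amount": 25}]

print(compare_by_uid(sent, received))
```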
- Data Validity
This dimension measures whether the data complies with the required value attributes. It also checks that all the fields are in the proper format. For example, it checks that the day, month, and year in every date follow the same format.
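The date-format check mentioned above could look something like the following sketch, which flags values that do not parse under one expected format. The default format string and the function names are assumptions chosen for illustration.

```python
from datetime import datetime


def is_valid_date(value, fmt="%Y-%m-%d"):
    """True if the value parses as a real calendar date under the given format."""
    try:
        datetime.strptime(value, fmt)
    except (ValueError, TypeError):
        # ValueError: wrong layout or impossible date; TypeError: not a string.
        return False
    return True


def invalid_dates(values, fmt="%Y-%m-%d"):
    """Return the values that fail the date check, in their original order."""
    return [value for value in values if not is_valid_date(value, fmt)]


# Hypothetical column of date strings collected from several sources.
dates = ["2024-03-15", "15/03/2024", "2024-02-30", None]

print(invalid_dates(dates))
```

Parsing catches impossible dates such as 30 February as well as values written in a different layout, which a simple pattern match on digits and separators would miss.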
A good data management tool, like a data pipeline, can help organizations achieve these quality dimensions in a swift and effective manner.