Does Your Data Measure Up? How to Assess Data Quality
Businesses today are increasingly dependent on an ever-growing flood of information. Whether it is sales records, financial and accounting data, or sensitive customer information, the accuracy and adequacy of a company’s data are critical. If portions of that information are inaccurate or incomplete, the effect on the organization can range from embarrassing to catastrophic.
That’s why you, as an IT professional, should be committed to ensuring that the information your company relies on meets the highest data quality standards.
Measuring data quality: The data quality assessment
The term “data quality” refers to the suitability of data to serve its intended purpose. So, measuring data quality involves performing data quality assessments to determine the degree to which your data adequately supports the business needs of the company.
A data quality assessment is done by measuring particular features of the data to see if they meet defined standards. Each such feature is called a “data quality dimension,” and is rated according to a relevant metric that provides an objective assessment of quality.
The industry hasn’t yet settled on a standard set of data quality dimensions, but the following four are a representative group: completeness, validity, timeliness, and consistency.
Let’s take a brief look at each of these and at the metrics used in assessing them.
Completeness relates to whether all required information is present in the dataset. For example, if the customer information in a database is required to include both first and last names, any record in which the first name or last name field is not populated is marked as incomplete. The metric used in assessing this dimension is the percentage of records that are complete.
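To make the completeness metric concrete, here is a minimal sketch in Python. The field names, the sample records, and the choice to treat blank strings as missing are all assumptions made for illustration; your own rules for what counts as "populated" may differ.

```python
# Hypothetical required fields for a customer record (illustrative only).
REQUIRED_FIELDS = ("first_name", "last_name")


def is_complete(record, required=REQUIRED_FIELDS):
    """Return True when every required field holds a non-blank value."""
    for field in required:
        value = record.get(field)
        # Assumption: None and whitespace-only strings both count as missing.
        if value is None or not str(value).strip():
            return False
    return True


def completeness_pct(records, required=REQUIRED_FIELDS):
    """Percentage of records in which all required fields are populated."""
    if not records:
        return 100.0  # assumption: an empty dataset has nothing incomplete
    complete = sum(1 for r in records if is_complete(r, required))
    return 100.0 * complete / len(records)


# Made-up sample data: two of the four records are missing a name.
customers = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "", "last_name": "Hopper"},
    {"first_name": "Alan", "last_name": None},
    {"first_name": "Edsger", "last_name": "Dijkstra"},
]

print(completeness_pct(customers))  # 50.0
```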
Data is characterized as valid if it matches the rules specified for it. Those rules typically include specifications such as format (number of digits, etc.), allowable types (integer, floating-point, string, etc.), and range (minimum and maximum values). For example, a telephone number field that contains the string ‘1809 Oak Street’ is not valid. The metric for this dimension is the percentage of records in which all values are valid.
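A validity check can be sketched the same way. The rules below are assumptions chosen for illustration, not a standard: a phone number is taken to be ten digits with optional separators, and an age must be an integer between 0 and 120.

```python
import re

# Assumed format rule: ten digits, optional parentheses and separators.
PHONE_PATTERN = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}")


def valid_phone(value):
    """Format rule: the whole string must match the phone pattern."""
    return isinstance(value, str) and PHONE_PATTERN.fullmatch(value) is not None


def valid_age(value):
    """Type and range rule: an integer (not a bool) from 0 to 120."""
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and 0 <= value <= 120
    )


# Hypothetical rule set: one check per field.
RULES = {"phone": valid_phone, "age": valid_age}


def validity_pct(records, rules=RULES):
    """Percentage of records in which every checked value is valid."""
    if not records:
        return 100.0  # assumption: an empty dataset has nothing invalid
    valid = sum(
        1
        for r in records
        if all(check(r.get(field)) for field, check in rules.items())
    )
    return 100.0 * valid / len(records)


# Made-up sample data: only the first record passes both rules.
records = [
    {"phone": "555-867-5309", "age": 42},
    {"phone": "1809 Oak Street", "age": 35},   # fails the format rule
    {"phone": "(555) 123-4567", "age": 150},   # fails the range rule
]

print(round(validity_pct(records), 1))  # 33.3
```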
Timeliness relates to whether the information is up-to-date for the intended use. In other words, is the correct information available when needed? For example, if a customer has notified the company of an address change, but the new address is not in the database at the time billing statements are processed, that entry fails the timeliness test. The metric used to measure timeliness is the time difference between when data is needed and when it is available.
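The timeliness metric is simple arithmetic on two timestamps. In this sketch the dates are invented and the function name is hypothetical; a positive result means the data arrived after it was needed.

```python
from datetime import datetime, timedelta


def timeliness_lag(needed_at, available_at):
    """Time between when data was needed and when it became available.

    Positive means the data was late; zero or negative means it was on time.
    """
    return available_at - needed_at


# Made-up timestamps: billing ran before the new address reached the database.
billing_run = datetime(2024, 3, 1, 0, 0)
address_updated = datetime(2024, 3, 3, 9, 30)

lag = timeliness_lag(billing_run, address_updated)
is_timely = lag <= timedelta(0)

print(lag)        # 2 days, 9:30:00
print(is_timely)  # False
```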
A data item is consistent if all representations of that item across data stores match. If, for example, a birth date is entered in one system using the U.S. mm/dd/yyyy format and is then imported, unconverted, into another system that expects the European dd/mm/yyyy format, the two representations no longer match and the data lacks consistency.
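One way to check consistency is to parse each system's value using that system's own format and compare the results. The sketch below assumes two hypothetical stores keyed by customer ID; the store names and records are made up, and counting mismatched keys is an assumed measure for illustration rather than an established metric.

```python
from datetime import datetime


def parse_date(text, fmt):
    """Parse a date string using the format its source system declares."""
    return datetime.strptime(text, fmt).date()


def find_inconsistent(a, a_fmt, b, b_fmt):
    """Return the shared keys whose dates disagree after normalization."""
    mismatched = []
    for key in sorted(a.keys() & b.keys()):
        try:
            same = parse_date(a[key], a_fmt) == parse_date(b[key], b_fmt)
        except ValueError:
            # A value that cannot be parsed in its declared format
            # is treated as inconsistent.
            same = False
        if not same:
            mismatched.append(key)
    return mismatched


# Hypothetical stores: the CRM uses mm/dd/yyyy, billing uses dd/mm/yyyy.
crm = {"C001": "03/04/1985", "C002": "12/25/1990", "C003": "07/08/1979"}
billing = {"C001": "04/03/1985", "C002": "12/25/1990", "C003": "07/08/1979"}

# C001 was converted correctly; C002 and C003 were copied without conversion.
print(find_inconsistent(crm, "%m/%d/%Y", billing, "%d/%m/%Y"))
# ['C002', 'C003']
```

Note that C003 parses without error in both systems yet means July 8 in one and August 7 in the other, which is why comparing the parsed dates matters more than comparing the raw strings.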
Add data integrity to the mix to complete the picture
When critical linkages between data elements are missing, that data is said to lack integrity. The four pillars of data integrity are data integration, data quality, location intelligence, and data enrichment.
To see how integrity can be lost, consider a Sales Transactions table in which each customer ID points to a record in the Customer table. If a customer record is deleted without updating related tables, records in the Sales Transactions table that point to that customer become “orphans” because their parent record no longer exists. This represents a loss of referential integrity. An appropriate metric for data integrity would be the number of orphan records present in a database.
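Counting orphan records usually comes down to a single query. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table and column names are assumptions for illustration, and no foreign key constraint is declared, so the delete goes through and leaves orphans behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and data, with no foreign key constraint declared.
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE sales_transaction (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        amount REAL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO sales_transaction VALUES
        (100, 1, 25.0), (101, 2, 40.0), (102, 2, 15.5);
    """
)

# Delete a customer without touching the related transactions.
conn.execute("DELETE FROM customer WHERE customer_id = 2")

# Orphans are transactions whose customer_id matches no customer row.
orphan_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM sales_transaction AS t
    LEFT JOIN customer AS c ON c.customer_id = t.customer_id
    WHERE c.customer_id IS NULL
    """
).fetchone()[0]

print(orphan_count)  # 2
```

In a database that enforces foreign key constraints, the delete itself would be rejected or cascaded, so a query like this is mainly useful for auditing systems where such constraints are absent or were added late.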
How to get started with your data quality assessment
If you’ve never done a data quality assessment before, it can look a bit daunting. But it needn’t be. Sophisticated automated data quality solutions such as those provided by Precisely can make the process straightforward.
Check out our eBook, 4 Ways to Measure Data Quality, to learn more.