How to Improve Data Quality: 3 Real-World Examples
You hear a lot about data quality these days. But much of the discussion focuses on data quality at a high level, without much attention to what data quality looks like in a real-world context. This article aims to cut against that grain. Below, we take a look at three realistic examples of data quality issues that you might face in an everyday business environment. We also explain how to address them to improve data quality.
Inaccurate address data
Addresses are a crucial data point for many businesses. They are important for marketing purposes, for gaining insight into customers’ needs and wants, and sometimes simply for delivering products and services themselves.
Yet addresses are also a data point that can easily become a source of low data quality. That is true for several reasons.
First, address information is often entered by hand, either by employees who collect it from customers over the phone or on-site, or by customers themselves, who type it into forms or websites. (In some cases, customers fill out handwritten forms with their address, and your employees enter the information into a database later.)
Whenever you do something manually, you run a real risk of introducing errors. For example, if an employee hits the A key instead of the I key when entering an address in Washington, IL, you might end up sending marketing material to the ghost town of Washington, AL instead.
It can also be hard to maintain high-quality address data because address formats vary so widely. If you are like me, you were taught in school to use two-letter state abbreviations when writing out an address, and always to include a zip code. But not everyone adheres to these rules, so the same address can be entered in many different ways, leading to inconsistency.
To an extent, you can mitigate these risks by designing address entry processes that minimize the chance of error. For example, you could have software require your customers or employees to enter address data in a particular format.
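As a rough illustration of what such a format requirement could look like, here is a minimal Python sketch. The function name `validate_address`, the four fields, and the error messages are assumptions made for this example, not part of any particular product, and the rules cover US addresses only.

```python
import re

# Two-letter USPS abbreviations for the 50 states plus DC.
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA",
    "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY",
    "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX",
    "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}

# Five digits, optionally followed by a hyphen and four more (ZIP+4).
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def validate_address(street: str, city: str, state: str, zip_code: str) -> list[str]:
    """Return a list of format problems; an empty list means the entry passes."""
    errors = []
    if not street.strip():
        errors.append("street is required")
    if not city.strip():
        errors.append("city is required")
    if state.strip().upper() not in US_STATES:
        errors.append("state must be a two-letter abbreviation")
    if not ZIP_RE.fullmatch(zip_code.strip()):
        errors.append("zip code must be 5 digits or ZIP+4")
    return errors


# A form handler could refuse to save the record until this list is empty.
print(validate_address("100 Main St", "Washington", "Illinois", ""))
```

Note that a check like this only enforces format. An entry of "AL" where "IL" was meant still passes, because both are valid abbreviations.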
But you can’t prevent all errors at the time of data entry. That’s why it is also important to run data-quality checks after data has been entered. For instance, to go back to the example above involving towns named Washington in different states, you could use automated tools to check whether a given street address actually exists in the town of Washington, Alabama. If it doesn’t exist there, but does exist in Washington, Illinois, you’d know that you likely have a data entry problem, which you can fix easily enough.
Assessing data quality on an ongoing basis is also necessary to know how well your organization is doing at maximizing it, and there are a variety of metrics you can use to measure it.
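The Washington check could be sketched in Python roughly as follows. This assumes you have some reference set of known-good addresses to compare against; the `known_addresses` set, the sample address in it, and the function names are all hypothetical stand-ins for whatever address-verification data or tool you actually use.

```python
def normalize(text: str) -> str:
    """Uppercase and collapse whitespace so comparisons ignore trivial differences."""
    return " ".join(text.upper().split())


def check_address(street: str, city: str, state: str, known_addresses: set) -> tuple:
    """Compare an entry against reference data.

    Returns ("ok", None) if the address is known as entered,
    ("likely_wrong_state", STATE) if the same street and city exist in
    exactly one other state, and ("not_found", None) otherwise.
    """
    key = (normalize(street), normalize(city), normalize(state))
    if key in known_addresses:
        return ("ok", None)
    # Look for the same street and city recorded under a different state.
    candidates = sorted(
        s for (st, c, s) in known_addresses if st == key[0] and c == key[1]
    )
    if len(candidates) == 1:
        return ("likely_wrong_state", candidates[0])
    return ("not_found", None)


# Hypothetical reference data: (street, city, state), already normalized.
known_addresses = {("100 MAIN ST", "WASHINGTON", "IL")}

print(check_address("100 Main St", "Washington", "AL", known_addresses))
```

The function only suggests a correction. Whether to apply it automatically or queue it for human review is a separate decision.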
Incomplete phone numbers
Alongside addresses, phone numbers are another critical type of data for many businesses. And like addresses, phone numbers can easily be recorded in ways that make them difficult to work with.
Perhaps the most common data quality problem that arises with phone numbers is a lack of completeness. Ask someone in the United States to enter a phone number, and you might get anything from a basic seven-digit local number, to a ten-digit number that includes an area code, to a possibly much longer number that includes a country code as well.
Depending on how much detail you need in a phone number, a lack of completeness could be a problem. If all of your customers are local, you may not need more than a seven-digit number. But if you operate internationally, having numbers that are as complete as possible is likely to be important.
Although you can’t do much automatically to complete an incomplete phone number after the fact, you can at least use data-quality tools to identify incomplete numbers within a database, then correct them manually. Doing so routinely will ensure that you have the phone number data you need to reach customers, whenever you need it.
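To make the identification step concrete, here is a small Python sketch that sorts US phone numbers by how complete they are, based on digit count. The category labels and function names are assumptions for this example, and the logic covers US numbers only; international numbers would need different rules.

```python
import re


def classify_us_phone(raw: str) -> str:
    """Classify a US phone number by completeness, ignoring punctuation."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 7:
        return "local_only"          # no area code
    if len(digits) == 10:
        return "national"            # area code + local number
    if len(digits) == 11 and digits.startswith("1"):
        return "with_country_code"   # US country code 1 + ten digits
    return "invalid"


def flag_incomplete(phones_by_customer: dict) -> dict:
    """Return only the customers whose numbers need manual follow-up."""
    flagged = {}
    for customer_id, raw in phones_by_customer.items():
        status = classify_us_phone(raw)
        if status in ("local_only", "invalid"):
            flagged[customer_id] = status
    return flagged


# Hypothetical records keyed by customer ID.
records = {
    "c1": "555-0134",
    "c2": "(309) 555-0134",
    "c3": "+1 309 555 0134",
    "c4": "12345",
}
print(flag_incomplete(records))
```

Running a report like this on a schedule gives you a short list of records to correct by hand, rather than a full database to audit.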
Missing data entries
In a perfect world, end-users would always fill out online forms completely. In the real world, however, users may miss certain data entry fields, or be unsure what to enter into them. And while you can design digital forms so that users cannot continue until all fields are filled out, doing so may leave you with angry users, because no one likes having to scan through a long form trying to figure out what they have missed.
Because of all of this, it is common to end up with customer-entered data that is incomplete or contains missing fields. A customer might, for example, forget to include a zip code when entering an address, or refuse to enter an email address on an online form for privacy reasons.
While you can’t (typically) force users to enter data that they don’t want to enter, you can at least use data-quality tools to identify fields within a database that are missing, or that appear likely to be inaccurate. Depending on which types of data are missing, you can then either fill the gaps using other data sources (it’s usually easy enough to fill in a missing zip code, for instance), or mark the entry as a whole as incomplete and therefore unusable.
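A minimal Python sketch of that workflow might look like the following. The field names, the `status` and `missing` keys, and the one-entry `ZIP_LOOKUP` table are assumptions for illustration; a real lookup would come from postal reference data, and a city with several zip codes would need the street address to resolve. Email is deliberately left out of the required fields here, since customers may decline to provide it.

```python
REQUIRED_FIELDS = ("name", "street", "city", "state", "zip")

# Illustrative lookup only; in practice, load this from postal reference data.
ZIP_LOOKUP = {("WASHINGTON", "IL"): "61571"}


def missing_fields(record: dict, fields=REQUIRED_FIELDS) -> list:
    """List required fields that are absent, None, or blank."""
    return [f for f in fields if not (record.get(f) or "").strip()]


def repair_record(record: dict, zip_lookup: dict) -> dict:
    """Fill a missing zip code where possible, then label the record."""
    fixed = dict(record)  # leave the caller's record untouched
    if not (fixed.get("zip") or "").strip():
        key = (
            (fixed.get("city") or "").strip().upper(),
            (fixed.get("state") or "").strip().upper(),
        )
        if key in zip_lookup:
            fixed["zip"] = zip_lookup[key]
    gaps = missing_fields(fixed)
    fixed["missing"] = gaps
    fixed["status"] = "incomplete" if gaps else "complete"
    return fixed


# Hypothetical record with a blank zip code.
record = {"name": "A. Customer", "street": "100 Main St",
          "city": "Washington", "state": "IL", "zip": ""}
print(repair_record(record, ZIP_LOOKUP)["status"])
```

Records labeled incomplete can then be excluded from mailings or routed to someone who can follow up with the customer.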