Validation vs. Verification: What’s the Difference?
| | Data validation | Data verification |
|---|---|---|
| Purpose | Check whether data falls within the acceptable range of values | Check data to ensure it’s accurate and consistent |
| Usually performed | When data is created or updated | When data is migrated or merged |
| Example | Checking whether user-entered ZIP code can be found | Checking that all ZIP codes in dataset are in ZIP+4 format |
To a layperson, data verification and data validation may sound like the same thing. When you delve into the intricacies of data quality, however, the two are distinctly different. Knowing the distinction can help you better understand the bigger picture of data quality.
What is data validation?
In a nutshell, data validation is the process of determining whether a particular piece of information falls within the acceptable range of values for a given field.
In the United States, for example, every street address should include a distinct field for the state. Certain values such as NH, ND, AK, and TX conform to the list of state abbreviations as defined by the U.S. Postal Service. As you know, those abbreviations denote specific states.
There are also two-character abbreviations for U.S. territories, such as Guam (“GU”) and the Northern Mariana Islands (“MP”). If you were to enter “ZP” or “A7” in the state field, you would in essence be invalidating the entire address, because no such state or territory exists. Data validation checks the entered value against a list of existing values in a database to ensure that it falls within valid parameters.
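The list check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production validator: the set `VALID_STATE_CODES` and the function `is_valid_state` are hypothetical names, and the set holds only the handful of codes mentioned in this article rather than the full U.S. Postal Service list.

```python
# Illustrative subset only; a real check would load the complete USPS list.
VALID_STATE_CODES = {"NH", "ND", "AK", "TX", "GU", "MP"}


def is_valid_state(code: str) -> bool:
    """Return True if the code appears in the list of accepted values."""
    return code.strip().upper() in VALID_STATE_CODES


for entry in ["TX", "gu", "ZP", "A7"]:
    status = "valid" if is_valid_state(entry) else "invalid"
    print(f"{entry}: {status}")
```

Normalizing case and whitespace before the lookup is a design choice made here for convenience; a stricter system might reject “gu” outright.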
For a list of addresses that includes countries outside the U.S., the state/province/territory field would need to be validated against a significantly longer list of possible values, but the basic premise is the same: the values entered must fit within a list or range of acceptable values. (Precisely offers address validation solutions.)
In other cases, you might need to set limits around possible numeric values for a given field, albeit with a bit less precision than in the previous example. If you are recording a person’s height, you might want to prohibit values that fall outside the expected range. If a person is listed in your database as being 12 feet tall (about 3.7 meters), then you can probably assume the data is incorrect. Likewise, you would not want to allow negative numbers for that field.
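A range check of this kind might look like the following sketch. The bounds `MIN_HEIGHT_CM` and `MAX_HEIGHT_CM` are assumptions chosen for illustration, not values from any standard, and `validate_height_cm` is a hypothetical helper name.

```python
MIN_HEIGHT_CM = 30.0   # assumed lower bound, for illustration only
MAX_HEIGHT_CM = 275.0  # assumed upper bound, for illustration only


def validate_height_cm(value: float) -> float:
    """Return the value unchanged if it is in range; otherwise raise ValueError."""
    if not (MIN_HEIGHT_CM <= value <= MAX_HEIGHT_CM):
        raise ValueError(
            f"height {value} cm is outside the accepted range "
            f"{MIN_HEIGHT_CM}-{MAX_HEIGHT_CM} cm"
        )
    return value


# 12 feet is 365.76 cm, and negative heights make no sense; both are rejected.
for candidate in [180.0, 365.76, -5.0]:
    try:
        validate_height_cm(candidate)
        print(candidate, "accepted")
    except ValueError as err:
        print(candidate, "rejected:", err)
```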
Fortunately, these kinds of validation checks are typically performed at the application level or the database level. For example, if you’re entering a U.S.-based shipping address into an e-commerce website, it’s unlikely that you would be able to enter a state code that is invalid for the United States.
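As one possible illustration of validation at the database level, the sketch below uses Python’s built-in `sqlite3` module and a `CHECK` constraint. The table name `shipping_address`, its columns, and the helper `insert_address` are hypothetical, and the list of accepted codes is again only the subset mentioned in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shipping_address (
        id     INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        -- Illustrative subset of codes; a real schema would cover the full list.
        state  TEXT NOT NULL
               CHECK (state IN ('NH', 'ND', 'AK', 'TX', 'GU', 'MP'))
    )
    """
)


def insert_address(street: str, state: str) -> bool:
    """Try to insert a row; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO shipping_address (street, state) VALUES (?, ?)",
                (street, state),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint on the state column fails.
        return False


print(insert_address("1 Main St", "TX"))  # accepted by the constraint
print(insert_address("2 Main St", "ZP"))  # rejected by the constraint
```

Putting the rule in the schema means every application that writes to the table is held to the same check, whatever validation it does or does not perform itself.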
What is data verification, and how is it different?
Data verification, on the other hand, is actually quite different from data validation. Verification performs a check of the current data to ensure that it is accurate, consistent, and reflects its intended purpose.
Verification may also happen at any time. In other words, verification may take place as part of a recurring data quality process, whereas validation typically occurs when a record is initially created or updated.
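The consistency example from the comparison table, checking that every ZIP code in a dataset uses ZIP+4 format, could be run as a recurring check along these lines. The record layout, the `zip` field name, and the function `find_non_zip4` are assumptions made for this sketch.

```python
import re

# ZIP+4: five digits, a hyphen, then four digits (for example 03101-1234).
ZIP_PLUS_4 = re.compile(r"\d{5}-\d{4}")


def find_non_zip4(records, field="zip"):
    """Return the records whose ZIP code is not in ZIP+4 format."""
    return [
        record
        for record in records
        if not ZIP_PLUS_4.fullmatch(str(record.get(field, "")))
    ]


# Hypothetical sample data.
dataset = [
    {"id": 1, "zip": "03101-1234"},
    {"id": 2, "zip": "78701"},
    {"id": 3, "zip": "587021234"},
]

for record in find_non_zip4(dataset):
    print("needs attention:", record)
```

Note that this verifies format only; it says nothing about whether a given ZIP+4 code actually exists.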
Verification plays an especially critical role when data is migrated or merged from outside data sources. Consider the case of a company that has just acquired a small competitor. They have decided to merge the acquired competitor’s customer data into their own billing system. As part of the migration process, it is important to verify that records came over properly from the source system.
Small errors in preparing data for migration can sometimes result in big problems. If a key field in the customer master record is assigned incorrectly (for example, if a range of cells in a spreadsheet was inadvertently shifted up or down when the data was being prepared), it could result in shipping addresses or outstanding invoices being assigned to the wrong customer.
Therefore, it’s important to verify the information in the destination system matches the information from the source system. This can be done by sampling data from both the source and destination systems to manually verify accuracy, or it can involve automated processes that perform full verification of the imported data, matching all of the records and flagging exceptions.
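A simple automated version of this comparison might look like the sketch below. It assumes both systems can export records as dictionaries that share a unique key; the field names and the function `verify_migration` are hypothetical.

```python
def verify_migration(source, destination, key="customer_id"):
    """Compare records by key and return a list of (key, problem) exceptions."""
    src = {record[key]: record for record in source}
    dst = {record[key]: record for record in destination}

    exceptions = []
    for record_key in sorted(src.keys() | dst.keys()):
        if record_key not in dst:
            exceptions.append((record_key, "missing in destination"))
        elif record_key not in src:
            exceptions.append((record_key, "unexpected in destination"))
        elif src[record_key] != dst[record_key]:
            exceptions.append((record_key, "field mismatch"))
    return exceptions


# Hypothetical data: customer 2 received the wrong city and customer 3 was lost.
source_rows = [
    {"customer_id": 1, "name": "Ada", "city": "Boston"},
    {"customer_id": 2, "name": "Ben", "city": "Austin"},
    {"customer_id": 3, "name": "Cy", "city": "Fargo"},
]
destination_rows = [
    {"customer_id": 1, "name": "Ada", "city": "Boston"},
    {"customer_id": 2, "name": "Ben", "city": "Fargo"},
]

for customer_id, problem in verify_migration(source_rows, destination_rows):
    print(customer_id, problem)
```

For large datasets, the same idea is often applied to per-record checksums or to a random sample rather than to full records held in memory.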
Verification as an ongoing process
Verification is not limited to data migration. It also plays an important role in ensuring the accuracy and consistency of corporate data over time.
Imagine that you have an existing database of consumers who have purchased your product, and you want to mail them a promotion for a new accessory for that product. Some of that customer information might be out of date, so it is worthwhile to verify the data in advance of your mailing.
By checking customer addresses against a change of address database from the postal service, you can identify customer records with outdated addresses. In many cases, you can even update the customer information as part of that process.
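In outline, that check is a lookup followed by an optional update. The sketch below stands in for the postal service’s data with a plain dictionary; the mapping `CHANGE_OF_ADDRESS`, the record fields, and the function `verify_addresses` are all hypothetical, and a real change-of-address service would have its own access method and matching rules.

```python
# Hypothetical change-of-address data: old address -> new address.
CHANGE_OF_ADDRESS = {
    "12 Elm St, Austin, TX": "98 Oak Ave, Dallas, TX",
}


def verify_addresses(customers, changes):
    """Return (updated records, ids of changed records) without mutating input."""
    updated, changed_ids = [], []
    for customer in customers:
        new_address = changes.get(customer["address"])
        if new_address is not None:
            updated.append({**customer, "address": new_address})
            changed_ids.append(customer["customer_id"])
        else:
            updated.append(customer)
    return updated, changed_ids


customers = [
    {"customer_id": 1, "address": "12 Elm St, Austin, TX"},
    {"customer_id": 2, "address": "7 Pine Rd, Fargo, ND"},
]

current, outdated_ids = verify_addresses(customers, CHANGE_OF_ADDRESS)
print("records with outdated addresses:", outdated_ids)
```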
Identifying duplicate records is another important data verification activity. If your customer database lists the same customer three or four times, then you are likely to send them duplicate mailings. This not only costs you more money, it also results in a negative customer experience.
To make the deduplication process more challenging, multiple records for the same customer might have been created using slightly different variations on a person’s name. Tools that use fuzzy logic to identify possible and likely matches can make the process work better.
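As a rough illustration of approximate matching, the sketch below scores name pairs with `difflib.SequenceMatcher` from the Python standard library. This is a simple string-similarity measure standing in for the fuzzy-logic tools mentioned above, not a description of how any particular product works; the 0.85 threshold and the function names are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity meets the assumed threshold."""
    return [
        (a, b)
        for a, b in combinations(names, 2)
        if similarity(a, b) >= threshold
    ]


# Hypothetical customer names.
names = ["Jon Smith", "John Smith", "Maria Lopez"]

for a, b in likely_duplicates(names):
    print(f"possible duplicate: {a!r} and {b!r}")
```

Comparing every pair grows quadratically with the number of records, so real deduplication jobs usually group candidates first (for example by postal code) and compare only within each group.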
The data quality mandate
More and more business leaders are coming to understand the strategic value of data, and of the insights that can be extracted from it using artificial intelligence, machine learning, and modern business intelligence tools.
Unfortunately, however, the old saying “garbage in, garbage out” applies now more than ever. As the volume of data increases, it’s essential that data-driven companies put proactive measures in place to monitor and manage data quality on a routine basis. Otherwise, they risk acting on insights that are based on flawed information.