Data Quality

Solving Data Quality Problems Is Not (Only) Programmers’ Responsibility

June 12, 2020

Christopher Tozzi

Most software is of little use without data to feed into it. When the data is bad, the software performs poorly. Whose job is it to make sure that the data that applications use is of high quality? If you think solving data quality problems is a burden on programmers alone, think again.

It may be tempting to assume that developers bear primary responsibility for ensuring that the software they write works properly no matter which data is fed into it. After all, since they write the code, they alone have the power to control how an application will respond when it receives low quality data.

In fact, however, ensuring that software works properly no matter which data is fed into it is not the job of programmers alone. Everyone in the organization should play a role in ensuring data quality, because programmers’ ability to address this issue is quite limited.

Let’s explore this topic in a bit more detail.

Applications and data quality

Data quality can make or break applications. If an application receives data in an incorrect format, retrieves incomplete information from a database, or runs into another type of data quality issue, it often won’t be able to do its job.

Consider, for example, a website that looks up credential information in a database in order to authenticate users. If there are duplicate entries for the same username, the application might not let the user log in at all. Or maybe it will default to using the first entry to authenticate the user, which may or may not work. Either way, the application’s performance will be erratic and unpredictable at best.

A well-written application will include logic to handle data quality problems. In the example from the preceding paragraph, the application will ideally be “smart” enough to check whether duplicate entries exist in the database for the same username and react in an intelligent way in the event that a duplicate occurs. In that event, it might require the user to reset his or her account information, for example.
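As a rough illustration of that kind of check, here is a minimal Python sketch. The names `find_account` and `LookupResult`, and the shape of the account records, are hypothetical and not taken from any particular application. The idea is simply that the lookup reports a duplicate explicitly instead of silently picking the first entry.

```python
from enum import Enum


class LookupResult(Enum):
    OK = "ok"
    NOT_FOUND = "not_found"
    DUPLICATE = "duplicate"  # data quality problem: ask for an account reset


def find_account(accounts, username):
    """Return (result, record) for a username in a list of account dicts.

    `accounts` stands in for rows fetched from a credentials table
    (an assumption made for this sketch).
    """
    matches = [a for a in accounts if a["username"] == username]
    if not matches:
        return LookupResult.NOT_FOUND, None
    if len(matches) > 1:
        # Do not guess which entry is correct; flag the problem instead.
        return LookupResult.DUPLICATE, None
    return LookupResult.OK, matches[0]
```

The caller can then decide what a duplicate means for the user, for example by sending them to an account reset flow rather than failing the login with a cryptic error.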

But the fact is that not all applications are this “smart.” If a data quality problem occurs that the application was not designed to anticipate and handle, something random might happen. It could spew cryptic error messages that confuse users. It may continually restart itself, only to have the data quality problem recur each time. It might freeze and stop responding entirely.

In any case, unless the application was designed to handle a specific data quality problem, something bad will probably happen whenever that data quality issue occurs.

eBook

4 Ways to Measure Data Quality

There are lots of good strategies that you can use to improve the quality of your data and build data best practices into your company’s DNA. In this eBook, we review the data and metrics that you can use to measure the effectiveness of your data quality improvement efforts.


Programmers’ data quality responsibilities

In a perfect world, programmers would be able to see into the future and anticipate all possible scenarios in which data quality problems could disrupt the ability of the software they write to operate properly. They would also have the time and skills to include code within the applications that can address those problems.

In the real world, of course, the amount of time and effort that programmers can devote to handling potential data quality problems in their software is quite limited. They might, and should, include code to perform basic data validation, which ensures that data input into an application is complete, formatted as expected and so on. They might also take steps to validate data input for security reasons, in order to prevent “injection” attacks and the like.
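To make “basic data validation” concrete, here is a small Python sketch using the standard library’s `sqlite3` module. The `users` table, the username format rule, and the function names are assumptions chosen for illustration. The sketch checks that the input is present and well formed, then passes it to the database as a query parameter rather than building the SQL string by hand, which is one common way to guard against injection.

```python
import re
import sqlite3

# Assumed rule for this sketch: 3 to 32 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,32}")


def validate_username(raw):
    """Return the cleaned username, or raise ValueError if it is unusable."""
    if raw is None:
        raise ValueError("username is missing")
    cleaned = raw.strip()
    if not USERNAME_PATTERN.fullmatch(cleaned):
        raise ValueError("username has an unexpected format")
    return cleaned


def fetch_user_rows(conn, raw_username):
    """Look up a user with a parameterized query instead of string formatting."""
    username = validate_username(raw_username)
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()
```

Checks like these catch the predictable problems. They do nothing for issues nobody thought to write a rule for, which is the limitation discussed next.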

Yet even the best programmers can’t foresee every type of data quality issue that could occur within their applications. And most of them don’t have the time to write code for handling those issues anyway. Plus, even if they did, their applications might end up being quite bloated by functions that handle obscure data quality issues.

So, while programmers should ensure that their applications perform basic data validation and data security checks, it is hardly realistic to expect them to address every potential data quality issue that could affect their applications.

Solving data quality problems is everyone’s job

This is another reason why data quality is the responsibility of everyone within your organization. Not only data engineers but any employee who interacts with data has a part to play in ensuring that the information that powers your business is free of data quality problems.

The fact is that no single individual or group can totally prevent data quality errors. Your data governance strategy can include steps to mitigate data quality errors, but it won’t be able to prevent errors entirely. Your data engineers can run tools to check for data quality problems within existing databases, but they will almost certainly overlook some issues. And your programmers can design applications to respond intelligently to data quality problems, but again, they can’t solve every type of data quality issue that their applications might encounter.
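For a sense of what such a check on an existing database might look like, here is a hedged Python sketch, again using `sqlite3`. The `users` table and its `username` and `email` columns are hypothetical. It reports usernames stored more than once and counts rows with a missing email, and like any such audit it only finds the problems it was written to look for.

```python
import sqlite3


def find_duplicate_usernames(conn):
    """Return (username, count) pairs for usernames stored more than once."""
    query = """
        SELECT username, COUNT(*) AS n
        FROM users
        GROUP BY username
        HAVING COUNT(*) > 1
        ORDER BY username
    """
    return conn.execute(query).fetchall()


def count_missing_emails(conn):
    """Count rows whose email is NULL or blank."""
    query = """
        SELECT COUNT(*)
        FROM users
        WHERE email IS NULL OR TRIM(email) = ''
    """
    return conn.execute(query).fetchone()[0]
```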

By making data quality everyone’s job, you maximize your organization’s ability to solve data quality problems before they impact business productivity.

Perfect data quality is impossible in most cases, but when everyone takes responsibility for helping to ensure data quality, you can come close to perfection. Download our eBook today to learn how you can measure the effectiveness of your data quality improvement efforts.