Looking For a Data Catalog?
There’s a lot of buzz around data cataloging tools right now — and a growing number of solutions from more and more vendors. What exactly is a data catalog? And how do you make sure you are not getting lost in the process of selecting the right catalog to meet your needs?
Don’t get lost. Here’s your guide to getting started.
Read this eBook to learn the basics of what a catalog is and how it works, what business challenges it can help solve, and how to make sure you are avoiding common pitfalls and choosing the right one for your needs.
What is it and how does it work?
In a nutshell, a data catalog is a place that shows what data assets (e.g., reports, databases, and websites that contain data) you have and where they are located.
How does a data catalog work, and how does it help organizations get a handle on their data and, more importantly, use it to make decisions and drive business value? Pictured below is a simple graph that illustrates how a data catalog solution can work to deliver business outcomes.
How does an optimal data catalog work?
Here are the 5 stages showing how a data catalog can deliver on the business outcome: “I want to delight my customer.”
Data governance and stewardship
Across all stages, data definitions must be set up based on rules and standards so that you can find the available data throughout the enterprise, know where it is, and ensure it is trustworthy. Finding your data is only one step in the process. Connecting it to business outcomes provides the full solution.
Do I need a data catalog?
With the tremendous growth in the volume of data, increased access to multiple data sources, and new compliance regulations, organizations are working to “get a handle” on their enterprise-wide data. To do so, they must be able to answer these questions:
- What data do I have?
- Where is it?
- Is it trusted?
As a result, data catalog solutions have gone from being a “nice to have” to a “must have” in the arsenal of data governance capabilities. In the recent research report “Data Catalogs Are the New Black in Data Management and Analytics,” Gartner reports that demand for data catalogs is soaring as organizations struggle to inventory their distributed data assets to facilitate data monetization and conform to regulations.
How do you know if you need a data catalog?
If you find yourself saying the following, you may need a data catalog (or data catalog + governance) solution:
“I need better analytics!”
Many organizations are asking how to gain more value from analytics and get better visibility into their data. IoT and digital transformation have produced an abundance of data. Now organizations need to find the available data and confirm it’s trusted so it can be used for decision-making.
“I’ve invested in B.I., but is the reporting data correct?”
There has been a surge in the investment in B.I. software. Locating the right data for analysis and reporting is a challenge that must be solved when implementing B.I. While some organizations are able to locate their data, they cannot identify the source to confirm it’s valid. Still others are finding conflicting results between two different reports.
“My data lake has become a data swamp.”
Your data lake seemed to be the answer to all of your problems. But now, business stakeholders are unable to access the information they need from the data lake. No one is certain what data exists in the lake or how to access it.
“How do I prepare my organization for A.I.?”
As A.I. moves into the mainstream, organizations are finding that identifying the right data to inform the algorithm is critical. This applies to the input data along with the features of the data itself, including tagging the data, having the right metadata, user data, etc. The first step in this process, then, must be to discover and catalog the data.
In all of these cases, there is a common thread. Organizations must be able to answer, “What data do we have and where is it?” But they don’t only need to “find” their data; they also need to understand how it connects to their enterprise metadata and, more importantly, to their business outcomes.
As organizations start to flock to the most popular solutions, they should heed Gartner’s advice to take the time to find the “right” solution and make sure that it can be aligned with organizational initiatives. As stated in the recent Gartner research: “Data catalog projects will fall short of their full potential if data and analytics leaders don’t link them to broader data management needs.” (See Pitfall 2.)
What is the typical implementation timeline and how do I avoid pitfalls?
Data catalogs should be easy to implement, within a few weeks to a few months. However, there are a few reasons why companies might experience more painful, slower projects. If you have done your due diligence and selected a data catalog that is cloud-based, “on the stack,” and aligned with your Enterprise Information Management (EIM) system and enterprise metadata management strategies, then it should be smooth sailing. However, if you have decided on a catalog that requires up-front customization, specific hardware, or a team of specialized developers, then you might be looking at a costly project.
Pitfall 1: Don’t take a vendor’s word for it
Vendors want to sell their solution, so weaknesses and limitations are sometimes glossed over. It is your job to make sure that you aren’t falling for “market-tecture.” When deciding on a catalog, check popular review sites like Gartner Peer Insights, speak with analysts, and make sure you ask references about implementation.
Pitfall 2: Don’t be shortsighted
According to Gartner, companies should “Avoid data catalogs that do not have the ability to scale out beyond tactical use case requirements and connect to the broader enterprise metadata management and EIM initiatives.” Some companies are choosing data catalogs based on a single, tactical use case, such as inventorying the data in their data lakes. It’s important to understand that deploying a catalog for one tool or use will improve data usability, trust and shareability ONLY for that specific tool. This ultimately creates the need for a data catalog of all the data catalogs in your architecture. This is not the way to enable effective monetization in the long term. Before selecting a data catalog for one specific use case, make sure that you have evaluated options that span across use cases and are connected to your broader EIM needs.
Pitfall 3: Don’t assume that every catalog is usable by everybody
Some catalogs are built for a more technically minded user who works in SQL. These catalogs have some high-tech capabilities and provide a full picture of the technical lineage and provenance of every bit of data in the ecosystem. Others are built for business users who don’t care about SQL or technical lineage but want a user-friendly view of the data that matters for the initiative they care about. Who is going to be using your catalog, and for what reason? Make sure that you don’t try to force your business users into being IT coding experts. This could cause serious issues with adoption and ROI.
How do I choose the best data catalog?
It’s essential to spend the time up front to identify what functionality is important to your organization. You might find that different groups have different needs. Having this list defined when you start your search will help ensure you’re selecting the right solution. At a bare minimum, data catalogs should be able to:
- Discover what data is available
- Identify where it is located
- Provide information on whether that data can be trusted
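To make the three minimum capabilities concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular catalog product works: the class names, field names, sample assets, and the rule that “trusted” means “certified and has a named steward” are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One catalog entry. All field names are hypothetical."""
    name: str                 # what the asset is
    location: str             # where it lives
    steward: str = ""         # who vouches for it; empty means nobody yet
    certified: bool = False   # whether a steward has signed off
    tags: set = field(default_factory=set)


class Catalog:
    """A toy in-memory catalog covering the three minimum capabilities."""

    def __init__(self):
        self._assets = {}

    def register(self, asset):
        self._assets[asset.name] = asset

    def discover(self, keyword):
        """What data do I have? Match the keyword against names and tags."""
        kw = keyword.lower()
        return sorted(
            a.name for a in self._assets.values()
            if kw in a.name.lower() or kw in {t.lower() for t in a.tags}
        )

    def locate(self, name):
        """Where is it?"""
        return self._assets[name].location

    def is_trusted(self, name):
        """Is it trusted? Assumed rule: certified and has a named steward."""
        asset = self._assets[name]
        return asset.certified and bool(asset.steward)


# Example usage with made-up assets
catalog = Catalog()
catalog.register(DataAsset("customer_orders", "warehouse.sales.orders",
                           steward="j.doe", certified=True, tags={"customer"}))
catalog.register(DataAsset("web_clickstream", "s3://example-lake/raw/clicks/",
                           tags={"customer", "web"}))

print(catalog.discover("customer"))
print(catalog.locate("customer_orders"))
print(catalog.is_trusted("web_clickstream"))
```

A real catalog would populate these entries automatically from your data sources and apply governance rules far richer than a single flag, but the three questions it answers are the same.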
Once you’ve checked the box on that basic functionality, there are several other considerations to ensure your catalog can be used to add business value in the future:
- Will it provide real-time integration with your data sources so that they are continuously populating the data catalog with the data that is critical to you?
- Is it easy to use?
- Can it search all your databases, on-premises or in the cloud?
- Will you be able to connect your data assets directly to organizational goals and initiatives so that you can see and measure how data drives your business?