
Best Practices in Data Storage: What Types of Data Should be Retained?

Christopher Tozzi | July 17, 2020

Even if you analyze your data in real time, storing data for extended periods is important for compliance and other reasons. But what types of data should you retain, and how long should you keep them? Keep reading for some insights on data storage.

Data analytics requires data storage

These days, real-time data analytics should be the foundation of most organizations’ approach to working with data. But that doesn’t mean that you should interpret data as it streams in, then delete it forever.

On the contrary, keeping data around for a while, even after you’ve interpreted it, is important. It helps keep you compliant by ensuring that data remains available for audits or other reviews. It also gives you an opportunity to review historical data to identify long-term trends, or to investigate incidents that you may not discover until long after the related data has been generated and processed.

Read our eBook

Streaming Legacy Data for Real-Time Insights

Learn about the challenges to streaming legacy data. And, see how Precisely can help your business stream real-time application data from legacy systems, such as mainframes, to mission critical business applications and analytics platforms that demand the most up-to-date information for accurate insights.

Types of data to retain

The first step in building an effective data storage policy is to answer the question: Which types of data should I store for an extended period, and which can I delete right after processing?

The short answer is that, to the extent possible, you should retain as much data as your storage capacity can support.

But since most organizations must prioritize some data types for long-term data storage, here’s a general hierarchy that outlines which types of data to keep on hand. The data at the top of the list is the most important to store for as long as possible, while the data at the bottom is least important:

  1. Data that is required to be retained by compliance or regulatory policies. If you’re required by law to store a certain type of data, you should definitely keep that data around.
  2. Data that relates to your customers and helps you engage with them by building a 360-degree view of each customer (customer 360). Understanding your customers is hard, and you don’t want to give up the data that helps you with that challenge.
  3. Business documents, contracts and so on. These are important to store for as long as possible.
  4. Data that is generated by everyday business operations but is not regulated. This data can be helpful to have on hand for historical reviews or planning purposes, but it’s not essential.
  5. Machine data generated by your networking equipment, servers, sensors or other types of automated sources. Machine data tends to be the least useful type of data to store long term. It is sometimes useful to be able to review machine data when researching a technical incident or planning infrastructure expansions, but for the most part, machine data is only useful in real time, because the state of your infrastructure changes so quickly.

The exact types of data to prioritize for long-term storage will vary from organization to organization, of course. This hierarchy is just a general guide.
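
For readers who want to turn this hierarchy into something a script can act on, here is a minimal Python sketch. It is only an illustration, not a prescribed implementation: the tier names, the `Dataset` class, the `select_for_retention` function, and the sample dataset names and sizes are all hypothetical. It assumes regulated data is always kept, and that everything else is kept in priority order only while it fits in the remaining storage capacity.

```python
from dataclasses import dataclass
from enum import IntEnum


class RetentionTier(IntEnum):
    """Hypothetical tiers; a lower value means a higher retention priority."""
    REGULATED = 1           # required by compliance or regulatory policies
    CUSTOMER = 2            # customer-related data
    BUSINESS_DOCUMENTS = 3  # documents, contracts and so on
    OPERATIONAL = 4         # unregulated everyday business data
    MACHINE = 5             # logs and sensor data from automated sources


@dataclass(frozen=True)
class Dataset:
    name: str
    tier: RetentionTier
    size_gb: float


def select_for_retention(datasets, capacity_gb):
    """Keep datasets in tier order until the capacity runs out.

    Assumption: regulated data is always kept, even if that
    exceeds the stated capacity.
    """
    kept, dropped = [], []
    remaining = capacity_gb
    # Sort by tier first, then smallest first within a tier.
    for ds in sorted(datasets, key=lambda d: (d.tier, d.size_gb)):
        if ds.tier == RetentionTier.REGULATED or ds.size_gb <= remaining:
            kept.append(ds.name)
            remaining -= ds.size_gb
        else:
            dropped.append(ds.name)
    return kept, dropped


# Made-up example datasets, purely for illustration.
datasets = [
    Dataset("router_logs", RetentionTier.MACHINE, 500),
    Dataset("audit_records", RetentionTier.REGULATED, 200),
    Dataset("crm_profiles", RetentionTier.CUSTOMER, 150),
    Dataset("contracts", RetentionTier.BUSINESS_DOCUMENTS, 50),
    Dataset("daily_reports", RetentionTier.OPERATIONAL, 120),
]

kept, dropped = select_for_retention(datasets, capacity_gb=500)
print("keep:", kept)
print("drop:", dropped)
```

With 500 GB of capacity, this sketch keeps the audit records, customer profiles and contracts, and drops the daily reports and router logs. A real policy would also need retention periods per tier, which your own compliance requirements should drive.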
