
Big Data Disaster Recovery Preparation Tips

Christopher Tozzi | January 31, 2020

Preparing to recover big data workloads after an unexpected disaster requires more than just having data backups on hand. This article explains how to build an effective big data disaster recovery strategy.

Disaster recovery is the process of restoring normal operations after an unexpected event destroys part or all of your IT infrastructure.

All organizations should have a disaster recovery plan in place. However, disaster recovery matters even more for companies that rely heavily on data to drive their business and that need to restore data-based operations quickly following a disaster.

5 Must-haves for big data disaster recovery

An effective big data disaster recovery plan includes the following five elements.

1. Off-site data backups

Backing up your data to a remote location is the most obvious disaster recovery preparation step. Off-site backups ensure that data will remain unharmed in the event that a physical disaster, such as a fire or a major storm, destroys your production infrastructure.

Perhaps the most important item to keep in mind about off-site data backups is that they are not enough on their own to ensure reliable disaster recovery.

See also: Data Backup vs. Disaster Recovery: Yes, There’s a Big Difference

2. On-site backups

Off-site data backups are the best way to ensure that data will remain available, no matter what type of disaster may strike.

In some cases, however, it may also make sense to keep on-site backups. The advantage of on-site data backups is that data can often be restored more quickly from on-site servers than it can from remote sites – provided, of course, that some of your on-site infrastructure survives the disaster.
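
To make the speed difference concrete, here is a rough back-of-the-envelope sketch in Python. The function name, the 50 TB data size and the link speeds are all hypothetical figures chosen for illustration, and the calculation assumes the link sustains its full rated throughput, which real restores rarely do.

```python
def estimate_restore_hours(data_tb: float, throughput_gbps: float) -> float:
    """Rough time to copy data_tb terabytes over a link sustaining throughput_gbps."""
    if throughput_gbps <= 0:
        raise ValueError("throughput_gbps must be positive")
    gigabits = data_tb * 1000 * 8  # decimal units: 1 TB = 1000 GB, 1 byte = 8 bits
    seconds = gigabits / throughput_gbps
    return seconds / 3600


# Hypothetical figures for illustration only.
offsite_hours = estimate_restore_hours(50, 1)   # 50 TB over a 1 Gbps WAN link
onsite_hours = estimate_restore_hours(50, 10)   # 50 TB over a 10 Gbps local network
print(f"off-site: {offsite_hours:.1f} h, on-site: {onsite_hours:.1f} h")
```

Under these assumptions the off-site restore takes roughly 111 hours and the on-site restore roughly 11 hours. Plugging in your own data volume and measured throughput gives a first estimate of whether on-site copies are worth keeping.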

3. Big data recovery playbooks

When you’re dealing with an unexpected infrastructure failure, you need a plan in place for guiding all of your actions as you restore data. The last thing you should be doing is figuring things out as you go, or guessing what your next step should be.

This is why developing “playbooks” is so important. A playbook is a set of steps that you write out ahead of time – that is, before a disaster occurs – and follow when recovering from a disaster.

Your playbooks should be written to be somewhat adaptable, of course, because it’s impossible to predict every challenge you’ll face during disaster recovery. But having playbooks in place will do much to lay the groundwork for quick and efficient disaster recovery.
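
One way to keep a playbook both explicit and adaptable is to write it down as an ordered list of steps, some required and some optional. The Python sketch below is only an illustration of that idea: the `Step` class, the `run_playbook` function and the step names are hypothetical, and the lambda actions stand in for real checks or restore commands.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded
    required: bool = True       # optional steps may fail without halting recovery


def run_playbook(steps: List[Step]) -> List[Tuple[str, str]]:
    """Run steps in order; stop at the first required step that fails."""
    log = []
    for step in steps:
        try:
            ok = step.action()
        except Exception:
            ok = False  # treat an unexpected error as a failed step
        if ok:
            log.append((step.name, "ok"))
        elif step.required:
            log.append((step.name, "failed"))
            break
        else:
            log.append((step.name, "skipped"))
    return log


# Placeholder actions; a real playbook would call actual checks and restore jobs.
playbook = [
    Step("confirm backup is reachable", lambda: True),
    Step("notify stakeholders", lambda: False, required=False),
    Step("restore data to production", lambda: True),
]
result = run_playbook(playbook)
for name, status in result:
    print(f"{status:8} {name}")
```

Marking steps as optional is one simple way to build in the adaptability described above: a failed notification does not block the restore, while a failed required step stops the run so that someone can decide what to do next.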

4. Data transformation tools

Moving data from backup locations to production servers can be time-consuming when there is a lot of data involved. It is even more difficult if the data needs to undergo transformations – which is likely the case if, for example, your backup data is stored in one format but needs to be converted to a different format in production.

For this reason, it’s important to ensure that you’ll have good data transformation tools at your disposal during disaster recovery. This may require having backup instances of the tools available in case your production environments are destroyed.
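
As a small illustration of the kind of format conversion involved, the Python sketch below converts a CSV backup into JSON Lines, one record at a time, so that the whole dataset never has to fit in memory. The choice of CSV and JSON Lines is an assumption made for the example, and the `csv_to_jsonl` function name and sample records are hypothetical. Real big data pipelines typically rely on dedicated transformation tools rather than a script like this.

```python
import csv
import io
import json
from typing import TextIO


def csv_to_jsonl(src: TextIO, dst: TextIO) -> int:
    """Stream CSV rows from src to dst as JSON Lines; return the row count."""
    count = 0
    for row in csv.DictReader(src):  # the first CSV line supplies the field names
        dst.write(json.dumps(row) + "\n")
        count += 1
    return count


# In-memory stand-ins for a backup file and a production-side target file.
backup = io.StringIO("id,sensor,reading\n1,alpha,20.5\n2,beta,19.8\n")
restored = io.StringIO()
rows_converted = csv_to_jsonl(backup, restored)
print(rows_converted, "rows converted")
print(restored.getvalue(), end="")
```

Note that `csv.DictReader` returns every value as a string, so any type conversion (numbers, timestamps) would be an additional transformation step.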

5. Data capture continuity

A disaster may destroy your ability to continue capturing data, but it doesn’t stop the data from flowing. During disaster recovery, it’s important to ensure that you maintain continuous capture of data to the extent possible, even if your analytics operations are interrupted.

If possible, have backup storage locations that will become operable in the event that your main storage servers are disrupted. Ensure that your backup locations have enough capacity to handle the amount of new data that will be generated during the time it takes to restore operations – which could be hours, days or longer, depending on the type of disruption you suffer and the extent of your infrastructure.
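
The capacity question above comes down to simple arithmetic: ingest rate multiplied by expected outage duration, plus some headroom. The Python sketch below shows that calculation. The function name, the 200 GB per hour ingest rate, the 72-hour outage and the 1.5 safety factor are hypothetical values for illustration, not recommendations.

```python
def required_capacity_tb(ingest_gb_per_hour: float,
                         outage_hours: float,
                         safety_factor: float = 1.5) -> float:
    """Storage needed to keep capturing data for the duration of an outage."""
    if ingest_gb_per_hour < 0 or outage_hours < 0 or safety_factor < 1:
        raise ValueError("rates and durations must be non-negative; safety_factor >= 1")
    total_gb = ingest_gb_per_hour * outage_hours * safety_factor
    return total_gb / 1000  # decimal units: 1 TB = 1000 GB


# Hypothetical scenario: 200 GB of new data per hour, three-day recovery window.
needed_tb = required_capacity_tb(200, 72)
print(f"backup capture location needs about {needed_tb:.1f} TB")
```

With these example figures the backup location would need about 21.6 TB. Rerunning the calculation for the longest outage you consider plausible gives a more conservative target.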


Heeding the considerations discussed above helps ensure not just that you have backup data available when disaster strikes, but also that you can restore production data operations as quickly as possible. Backup data on its own is of little value if you can’t put it to use quickly as part of a broader big data disaster recovery strategy.

Make sure your data is protected when disaster strikes. Read our white paper: The Ultimate Buyers Guide to HA/DR Solutions