Why Disaster Recovery is Important Now

Disaster recovery (DR) is more important than ever for today’s organisations. You may view the disasters that cause IT outages as unlikely, but the law of averages suggests that most companies will suffer significant downtime at some point. Roughly 80% of those surveyed by Gartner reported an incident during the past two years that required an IT DR plan.

DR has become essential for satisfying growing regulatory requirements, but it’s also an invaluable tool for supporting business continuity in the event of a disaster, including outages, ransomware attacks and even user error. The experts at Probrand, a leading technology services provider, have compiled a complete guide to disaster recovery.

What is disaster recovery?

Disaster recovery is a means of minimising downtime following a catastrophic in-house event such as hardware failure or file corruption. DR typically provides a cloud-based copy of mission-critical files and systems that can be accessed in the event of on-site IT issues such as a server failure.

This allows business operations to continue via the cloud, so work can proceed as usual while the user’s on-site network is being repaired.

Why is disaster recovery important?

Physical media is often slow and can be unreliable because it deteriorates over time. The only sure way to prove that physical backups work is to test them, but this doesn’t happen often enough: in Forrester’s 2017 State of Disaster Recovery Preparedness survey, 38% of respondents tested less than once a year, if at all.

The other problem with backing up to physical media and to remote, customer-owned facilities is storage capacity. The Enterprise Storage Forum surveyed 374 IT and business professionals in 2018 and found that almost 50% of them had seen their data storage grow by between 1 and 99 TB over the previous two years.

Around 30% had seen growth of between 100 TB and more than 10 PB, and projected data storage growth over the following two years looked similar. Backing up that data becomes increasingly difficult for companies with roll-your-own DR, because it means buying more equipment to cope.

How does disaster recovery work?

Using cloud-based, off-site disaster recovery specialists is a popular approach to business continuity among medium-sized and large businesses that place a high value on keeping downtime to a minimum. Outsourcing in this way means that the service user does not need to employ permanent on-site technical staff or set aside space to house the necessary IT infrastructure.

The agreement between the provider and the user will typically specify the permitted number of daily uploads, along with expected recovery times and the order in which files and systems will be made available to the user. For example, email, client files and essential systems are normally treated as high priority.

Alternatively, a company may choose to host its off-site DR solution in a colocation facility, or ‘colo’. Here, the physical essentials of storing data, such as space, power and cooling, are provided, but technical staff are not. This means that when a disaster occurs, the service user must source and fund the specialists needed to carry out the duties associated with DR.

What is DRaaS?

Disaster Recovery as a Service (DRaaS) uses cloud infrastructure to offer businesses an alternative to traditional DR approaches.

DRaaS eliminates the need for physical backups by storing data on cloud-based media, which can itself be replicated to protect against data loss. Testing those backups is relatively easy, as customers can bring cloud-based virtual machines and data online at any time.

Using a mix of local appliances and online virtualised backups, hybrid DRaaS can cut recovery times to as little as 15 minutes in some cases. Businesses can recover their mission-critical data quickly and easily from the local appliance. In more serious failures, they can switch to the cloud and start their virtualised applications using recently replicated data (a short recovery point objective), minimising their downtime and losses.

How do I implement disaster recovery?

A modern disaster recovery plan needs to consider a variety of factors: where are your applications, where is your data, and what are your key priorities?

It’s crucial to prepare a robust plan that covers where your people will work and how they’ll access your disaster recovery environment.

Start with a risk assessment

In the risk assessment phase, identify your critical systems and measure their importance to your key business processes. Document the threats facing those systems and log the impact on those processes if the threats materialise. This assessment may turn up unexpected weaknesses that you can plan against.
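One common way to make the assessment concrete, though not the only one, is a simple risk register that scores each threat as likelihood multiplied by impact. The Python sketch below assumes a 1 to 5 scale for both factors; the systems, threats and scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    system: str
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x impact: a common, simple scoring convention
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score so the worst are planned for first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical register entries for illustration only
register = [
    Risk("file-server", "disk failure", likelihood=4, impact=3),
    Risk("order-system", "ransomware", likelihood=3, impact=5),
    Risk("intranet", "power outage", likelihood=2, impact=2),
]

for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.system:<13} {risk.threat}")
```

The multiplication is deliberately crude; some organisations weight impact more heavily, but any scheme that produces a consistent ranking serves the same purpose here.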

Define your objectives

Armed with this information, you can set your disaster recovery objectives. There are two main metrics to consider: the recovery time objective (RTO) and the recovery point objective (RPO).

The RTO defines how quickly you want to get operations up and running again. Some critical functions may need an instantaneous failover. For non-critical systems, a few hours might be sufficient.

The RPO defines the point from which you want to restore your data, and therefore how much of it you are willing to lose during a disaster. For example, an RPO of one hour means that you must be able to restore your data to its state no more than 60 minutes before the disaster, so you lose at most an hour of changes.
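The arithmetic behind both objectives is simple enough to check by hand, but a short Python sketch makes it explicit. The timestamps and targets below are hypothetical; the point is that data written after the last good backup is lost, and the time from the disaster to restored service is what the RTO constrains.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_backup: datetime, disaster: datetime) -> timedelta:
    """Everything written after the last good backup is lost."""
    return disaster - last_good_backup


def meets_rpo(last_good_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    return data_loss_window(last_good_backup, disaster) <= rpo


def meets_rto(disaster: datetime, service_restored: datetime, rto: timedelta) -> bool:
    return service_restored - disaster <= rto


# Hypothetical incident timeline
disaster = datetime(2024, 3, 1, 14, 30)
last_backup = datetime(2024, 3, 1, 13, 45)
restored = datetime(2024, 3, 1, 17, 0)

print(data_loss_window(last_backup, disaster))               # 0:45:00
print(meets_rpo(last_backup, disaster, timedelta(hours=1)))  # True: 45 min lost, 1 h allowed
print(meets_rto(disaster, restored, timedelta(hours=4)))     # True: 2.5 h outage, 4 h allowed
```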

Determine your response and recovery strategies

A disaster recovery plan extends beyond just your data. It includes the people who use that data, the facilities they work in, the equipment they use, and the processes they follow. A disaster may affect your suppliers and your customers. Your plan should include all of these elements and define how they interact when a disaster hits.

With this in mind, define a disaster recovery team responsible for tasks such as assessing the situation, choosing which plan to implement, and managing the different strands of that plan, such as site relocation and data recovery.

Data backup and restoration

From a data perspective, your disaster recovery plan will include a backup and restore strategy. Options here include real-time data replication or periodic ‘grandfather-father-son’ (GFS) backups, which keep multiple copies of your data (typically daily, weekly and monthly) for different retention periods.
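A grandfather-father-son rotation is easiest to understand as a retention rule. The Python sketch below assumes one backup per day and the common interpretation of the scheme: daily ‘sons’, weekly ‘fathers’ and monthly ‘grandfathers’. The retention counts (7, 4 and 12) are illustrative defaults, not a recommendation.

```python
from datetime import date, timedelta


def _newest_per_period(newest_first, period_key, count):
    """Pick the newest backup in each of the `count` most recent periods."""
    picked = {}
    for day in newest_first:
        key = period_key(day)
        if key not in picked:
            if len(picked) == count:
                break
            picked[key] = day  # first seen is newest, because the list is newest-first
    return set(picked.values())


def gfs_keep(backups, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates to retain under a GFS rotation."""
    newest_first = sorted(backups, reverse=True)
    sons = set(newest_first[:daily])
    # ISO (year, week) identifies each week; (year, month) identifies each month
    fathers = _newest_per_period(newest_first, lambda d: d.isocalendar()[:2], weekly)
    grandfathers = _newest_per_period(newest_first, lambda d: (d.year, d.month), monthly)
    return sons | fathers | grandfathers


# Hypothetical history: one backup per day for the first quarter of 2024
history = [date(2024, 1, 1) + timedelta(days=i) for i in range(91)]
keep = gfs_keep(history)
print(sorted(keep))
```

Any backup not in the returned set can be deleted, which is how the scheme keeps long history without storing every daily copy.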

You can manage your own backups between different sites, or take advantage of disaster recovery as a service (DRaaS), which uses cloud infrastructure to manage your backup and restoration.

Finally, make sure that you test your disaster recovery plan regularly. You don’t need to simulate a company-wide blackout every month; test small parts of your disaster recovery strategy to minimise disruption while ensuring that each works smoothly.

Map your assets

Begin by understanding what you need to preserve and restore in the event of a disaster. This will cover your files and applications, whether they’re running on-premises or in the cloud, but it will also include other aspects of your business. Consider your communications infrastructure and your facilities. How will people talk to each other, and where will they work?

Rank their importance

Evaluate every asset on your list to determine how important it is to the ongoing health of your business. Hopefully, everything will have some importance (and if it doesn’t, that’s a great opportunity to rethink it).

Some systems will be more critical than others; production systems, for example, usually matter more than software development servers. Consider grouping them into three tiers of high, medium and low importance. This should give you an idea of your risk level and therefore how to fold each system into your recovery plan.
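As a sketch of the three-tier idea, the Python below maps an importance score to a tier and groups assets accordingly. The 1 to 10 scale, the thresholds and the asset names are all assumptions to be replaced with your own.

```python
from collections import defaultdict


def assign_tier(score: int) -> str:
    """Map an importance score (assumed 1 to 10 scale) to a recovery tier."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score >= 8:  # assumed threshold for high importance
        return "high"
    if score >= 4:  # assumed threshold for medium importance
        return "medium"
    return "low"


def group_by_tier(assets: dict[str, int]) -> dict[str, list[str]]:
    """Group asset names by tier, most important first within each tier."""
    tiers = defaultdict(list)
    for name, score in sorted(assets.items(), key=lambda kv: kv[1], reverse=True):
        tiers[assign_tier(score)].append(name)
    return dict(tiers)


# Hypothetical assets and scores
assets = {"production-db": 10, "email": 8, "crm": 6, "dev-server": 2}
print(group_by_tier(assets))
# {'high': ['production-db', 'email'], 'medium': ['crm'], 'low': ['dev-server']}
```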

Identify third-party dependencies

Modern businesses are increasingly dependent on each other, and you will undoubtedly rely on several vendors and service providers for critical elements of your business. Identify these services and assess their importance, too. Contact the vendors and ask them about their recovery plans. How will they protect those services in the event of a disaster, and what can they do to bring them back online if they go down?
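Once you know which internal systems depend on which vendors, the recovery order follows from those dependencies: a service cannot usefully come back before the things it relies on. The sketch below uses the graphlib module from Python’s standard library (Python 3.9 or later) to group services into stages that can be restored in parallel; the service names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each service -> the services that must be running before it
dependencies = {
    "order-portal": {"orders-db", "payment-gateway"},
    "orders-db": {"storage"},
    "email": {"dns"},
    "payment-gateway": set(),  # third-party vendor service
    "dns": set(),
    "storage": set(),
}


def recovery_stages(deps):
    """Group services into stages; everything in a stage can be recovered in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, stage in enumerate(recovery_stages(dependencies), start=1):
    print(number, stage)
```

A vendor service that appears in an early stage is one whose own recovery plan directly limits how fast you can recover, which is a useful prompt for those vendor conversations.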

Define your recovery objectives

This is where you set out the goals for your disaster recovery plan. Consider your recovery time objectives (how quickly each service must be back online) and your recovery point objectives (how recent a point you need to be able to restore your data to).

It might be fine to back up some systems, such as contact management databases, every week or two. Others, such as product ordering systems, might need backups every day or even every hour. Define these parameters for each application and its data so that you can build backup and recovery processes to restore them.
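Writing these parameters down per application also makes them checkable. In the hypothetical Python sketch below, each application has an RPO and a configured backup interval; because a disaster could strike just before the next scheduled backup, an interval longer than the RPO means the objective can be missed. The application names and values are illustrative.

```python
from datetime import timedelta

# Hypothetical recovery point objectives per application
rpo = {
    "contact-db": timedelta(weeks=2),
    "order-system": timedelta(hours=1),
    "hr-records": timedelta(days=1),
}

# Hypothetical backup intervals currently configured
backup_interval = {
    "contact-db": timedelta(weeks=1),
    "order-system": timedelta(hours=6),
}


def rpo_gaps(rpo, backup_interval):
    """List applications whose backups are missing or too infrequent for their RPO."""
    gaps = []
    for app, target in sorted(rpo.items()):
        interval = backup_interval.get(app)
        # Worst case, almost a full interval of changes is lost
        if interval is None or interval > target:
            gaps.append(app)
    return gaps


print(rpo_gaps(rpo, backup_interval))
```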

Plan your setup

Create the technical infrastructure that you’ll use to back up your data. This includes not only the software to back up your systems, but also the backup frequency and the location and media that those files are stored on. Consider versioning options when creating your backup scheme. A single backup file will be no good if you back up corrupt data to it, so it’s best to create multiple versions that you can restore independently.
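A minimal sketch of versioning, assuming local files and a numbered naming scheme chosen purely for illustration: each run copies the file to a new numbered version and prunes the oldest, so only a fixed number remain. A corrupt file backed up today therefore does not overwrite yesterday’s good copy. The file names, naming scheme and retention count are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def _version_of(path: Path, source: Path) -> int:
    """Extract the version number from a name like 'ledger.v000003.csv'."""
    start = len(source.stem) + len(".v")
    end = len(path.name) - len(source.suffix)
    return int(path.name[start:end])


def backup_with_versions(source: Path, backup_dir: Path, keep: int = 5) -> Path:
    """Copy source into backup_dir as a new numbered version, keeping the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    versions = sorted(
        backup_dir.glob(f"{source.stem}.v*{source.suffix}"),
        key=lambda p: _version_of(p, source),
    )
    next_version = _version_of(versions[-1], source) + 1 if versions else 1
    target = backup_dir / f"{source.stem}.v{next_version:06d}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves metadata such as mtime
    versions.append(target)
    for old in versions[:-keep]:  # prune the oldest versions beyond the limit
        old.unlink()
    return target


# Usage: seven backups of a changing file leave only the five newest versions
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "ledger.csv"
    backups = Path(tmp) / "backups"
    for day in range(1, 8):
        source.write_text(f"balance on day {day}\n")
        latest = backup_with_versions(source, backups)
    remaining = sorted(p.name for p in backups.iterdir())
    latest_contents = latest.read_text()

print(remaining)
```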

Progressive (incremental) backups, which copy only the data that has changed since the previous backup, are the basis for most disaster recovery as a service (DRaaS) offerings, which store data in a cloud-based data centre. These enable companies to back up data off-site for security and resilience without having to handle physical media, and they also make it easy to configure backup options and restore data from a single web interface.
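To illustrate the idea behind progressive (incremental) backups, the Python sketch below copies only files whose modification time is later than the previous backup. Relying on file modification times is a simplifying assumption; commercial products may track changes at the block level instead, and this sketch does not handle deleted files.

```python
import os
import shutil
import tempfile
from pathlib import Path


def incremental_backup(source_dir: Path, dest_dir: Path, since: float) -> list[str]:
    """Copy only files modified after `since` (a Unix timestamp); return their relative paths."""
    copied = []
    for path in sorted(source_dir.rglob("*")):
        if path.is_file() and path.stat().st_mtime > since:
            relative = path.relative_to(source_dir)
            target = dest_dir / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            copied.append(relative.as_posix())
    return copied


# Usage with fixed modification times so the result is predictable
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "data", Path(tmp) / "backup"
    (src / "reports").mkdir(parents=True)
    unchanged = src / "invoices.csv"
    changed = src / "reports" / "q1.txt"
    unchanged.write_text("old")
    changed.write_text("new")
    os.utime(unchanged, (1_000, 1_000))  # modified before the last backup
    os.utime(changed, (2_000, 2_000))    # modified after the last backup
    last_backup_time = 1_500.0
    copied = incremental_backup(src, dst, since=last_backup_time)

print(copied)
```

Only the changed file is transferred, which is why incremental schemes keep upload volumes small even as total storage grows.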

Create your team

People are an important part of the disaster recovery process. Ensure that you have a list of personnel (and contingencies, where possible) who are responsible for getting the necessary assets back up and running. This includes internal IT people but also vendors, service providers, and facilities managers. Everyone should be able to respond quickly to a disaster by executing the recovery playbook.

Test your plan

A disaster recovery plan is only useful when it’s reliable. If you haven’t tested the plan and can’t be sure that it works, your company could suffer. As your IT infrastructure grows and changes over time, it might outgrow the capabilities of your original disaster recovery solution.

Build regular test drills into your schedule to be sure that things still operate as planned. This is far easier with DRaaS systems, which let you check the health of replication jobs on the fly and can automate the recovery process for you.
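Part of any test drill is confirming that restored data actually matches what was backed up. The Python sketch below compares SHA-256 checksums of every file in a source directory with its restored counterpart and reports anything missing or different. It assumes the source has not changed since the backup was taken; in practice you might compare against checksums recorded at backup time instead. The directory and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for original in sorted(original_dir.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(original) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems


# Usage: a test restore where one file is corrupted and another is missing
with tempfile.TemporaryDirectory() as tmp:
    original, restored = Path(tmp) / "live", Path(tmp) / "restored"
    for root in (original, restored):
        (root / "sub").mkdir(parents=True)
    (original / "a.txt").write_text("hello")
    (original / "c.txt").write_text("only in the original")
    (original / "sub" / "b.txt").write_text("world")
    (restored / "a.txt").write_text("hello")
    (restored / "sub" / "b.txt").write_text("w0rld")  # corrupted during restore
    problems = verify_restore(original, restored)

print(problems)
```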

The main benefit of outsourcing DR to a third-party host is that, after a catastrophic event such as a server failure or file corruption, expert help is available straight away. Staff at your third-party host will talk you through the recovery process, so downtime is minimised and your company does not have to employ full-time on-site IT technicians.