By Ned Bellavance, Director of Cloud Solutions for Anexinet
If I am being frank, as enterprise IT has expanded outside the walls of the data center, disaster recovery plans have largely failed to adapt. Overwhelmingly, there is still a very traditional, brick-and-mortar approach to DR, and that approach is rooted in technology rather than in the business. We talk to clients whose entire DR plan is a storage array with site-to-site replication. While that might be one component, there is a lot more to proper DR planning, including people and process.
The first step in building a DR plan for enterprise IT architecture is to understand which workloads and applications are most critical to your business. Most people on the business side can quickly tell you which application is most important to driving revenue. Likewise, the helpdesk team would be more than happy to let you know which application users will start calling about if it is down for even a moment.
Once you know which applications are most important to the company, you can look at the supporting systems required for those applications to function properly. Active Directory, DNS, and database systems all come to mind, and they need to be included in any comprehensive DR plan.
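One way to keep those dependencies honest is to capture them as data and derive the recovery order from it. Below is a minimal sketch, assuming a hypothetical order-entry application and its supporting systems, that uses Python's standard-library graphlib; none of the names come from a specific product.

```python
# A minimal sketch: map each system to the systems it depends on, then
# derive a safe recovery order. All names are hypothetical placeholders.
from graphlib import TopologicalSorter

dependencies = {
    "order-entry-app": {"sql-cluster", "active-directory", "dns"},
    "sql-cluster": {"active-directory", "dns"},
    "active-directory": {"dns"},
    "dns": set(),
}

# static_order() yields dependencies before dependents, so DNS and AD come
# up before the database, and the database before the application.
recovery_order = list(TopologicalSorter(dependencies).static_order())
print(recovery_order)
# ['dns', 'active-directory', 'sql-cluster', 'order-entry-app']
```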
Now the question becomes one of technology, process, and people.
From the technology side, you need to answer questions like the following (one way to record the answers is sketched after this list):
- Where will the application be recovered?
- How fast must it be recovered?
- How much data can be lost?
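Those questions map to the recovery site, the recovery time objective (RTO), and the recovery point objective (RPO). As a minimal sketch, the answers could be recorded per application like this; the names and numbers are hypothetical placeholders, not recommendations:

```python
# A minimal sketch of per-application recovery targets. Names and numbers
# are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class RecoveryTarget:
    app: str
    recovery_site: str  # where the application will be recovered
    rto_minutes: int    # how fast it must be recovered
    rpo_minutes: int    # how much data can be lost

targets = [
    RecoveryTarget("order-entry-app", "cloud-east", rto_minutes=60, rpo_minutes=15),
    RecoveryTarget("reporting", "cloud-east", rto_minutes=480, rpo_minutes=240),
]

for t in targets:
    print(f"{t.app}: recover to {t.recovery_site} within {t.rto_minutes} min, "
          f"losing at most {t.rpo_minutes} min of data")
```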
For the process, you need to know things like the following (a sketch of these steps follows the list):
- How is a disaster officially declared?
- Who needs to be notified?
- How do you execute the plan and validate that it was successful?
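As a minimal sketch of that process side, the fragment below models declaration, notification, and execute-and-validate as explicit steps. Every contact and function is a hypothetical placeholder; a real plan would wire these into your paging, alerting, and ticketing tools.

```python
# A minimal sketch of the process side: declare, notify, execute, validate.
# All contacts and functions are hypothetical placeholders.
from datetime import datetime, timezone

NOTIFY_ROSTER = ["dr-coordinator@example.com", "app-owner@example.com"]

def declare_disaster(declared_by: str, reason: str) -> dict:
    """Record who officially declared the disaster, when, and why."""
    return {
        "declared_by": declared_by,
        "reason": reason,
        "declared_at": datetime.now(timezone.utc).isoformat(),
    }

def notify(declaration: dict) -> None:
    """Stand-in for paging or email; real plans use an alerting tool."""
    for contact in NOTIFY_ROSTER:
        print(f"NOTIFY {contact}: disaster declared at {declaration['declared_at']}")

def execute_and_validate(steps) -> bool:
    """Run each recovery step in order; each returns True on success."""
    return all(step() for step in steps)
```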
And finally, you need roles assigned to the right people, with clearly defined responsibilities. A true disaster is a stressful time, and people need to know what is expected of them and how to do it.
It helps to think of DR as a single, unified logical layer supported by many physical tools underneath. On the technology side of disaster recovery, there is the software solution that gets the data where it needs to be for recovery. That could be array-based replication, virtualization-based replication, agent-based replication, or replication specific to a particular application, such as database replication in Oracle or Microsoft SQL Server.
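Because those replication technologies expose very different interfaces, one option is to wrap them behind a common one so the rest of the plan can treat them uniformly. The sketch below is hypothetical; the hard-coded lag values stand in for real calls to the storage array, the hypervisor, or the database.

```python
# A minimal sketch of a common interface over disparate replication
# technologies. Classes and lag values are hypothetical placeholders.
from abc import ABC, abstractmethod

class Replicator(ABC):
    @abstractmethod
    def lag_minutes(self) -> float:
        """How far the replica trails Production."""

class ArrayReplicator(Replicator):
    def lag_minutes(self) -> float:
        return 5.0  # stand-in for a storage-array API call

class DatabaseReplicator(Replicator):
    def lag_minutes(self) -> float:
        return 1.0  # stand-in for, e.g., a database replication lag query

def within_rpo(replicator: Replicator, rpo_minutes: float) -> bool:
    """Check whether the current replication lag satisfies the RPO."""
    return replicator.lag_minutes() <= rpo_minutes
```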
Above the replication layer sits an orchestration layer. That is where you tie all the disparate replication technologies together into a cohesive plan for recovery. Your orchestration layer could be something as simple as a library of scripts that fire off in order, or a COTS application like Azure Site Recovery or VMware Site Recovery Manager that can orchestrate the majority of the recovery process for you.
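For the script-library flavor, the orchestration can be as simple as the sketch below: run recovery scripts in dependency order and stop at the first failure. The script names are hypothetical placeholders, not part of any product.

```python
# A minimal sketch of a script-library orchestrator: fire recovery scripts
# in order and halt on the first failure. Script names are hypothetical.
import subprocess

RECOVERY_STEPS = [
    ["./restore_dns.sh"],
    ["./restore_active_directory.sh"],
    ["./failover_database.sh"],
    ["./start_app_tier.sh"],
]

def run_plan(steps) -> bool:
    for cmd in steps:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Step failed: {' '.join(cmd)}")
            return False
    return True
```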
A fully comprehensive DR plan is a beast to manage and keep up to date. Generally, by the time a plan is ready to go, the Production environment has already changed. How can you keep up with the pace of change in Production? The answer we have found is automation. If you build out your Production environment with automation, the same automation can be leveraged to update your disaster recovery environment as well. With the rise of the public cloud and infrastructure as code, it has never been easier to deploy environments programmatically. If you consistently use templates and automation to deploy Production, then DR is updated as a natural by-product.
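In the simplest terms, the idea looks like the sketch below: one parameterized definition drives both environments, with deploy() standing in as a hypothetical placeholder for a real infrastructure-as-code run, such as a Terraform apply or an ARM template deployment.

```python
# A minimal sketch: one template, two environments. deploy() is a
# hypothetical stand-in for an infrastructure-as-code tool run.
ENVIRONMENTS = {
    "production": {"region": "east", "vm_count": 8, "vm_size": "large"},
    "dr":         {"region": "west", "vm_count": 8, "vm_size": "large"},
}

def deploy(env_name: str) -> None:
    params = ENVIRONMENTS[env_name]
    print(f"Deploying {env_name}: {params['vm_count']} x {params['vm_size']} "
          f"VMs in {params['region']}")

# Because both environments come from the same definition, a change to
# Production updates DR as a natural by-product.
deploy("production")
deploy("dr")
```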
Embracing automation and leveraging the public cloud for DR has several cost benefits. The biggest is that you no longer need to buy and maintain the hardware to run your DR environment. Most companies have to maintain excess capacity in a managed data center or a secondary site, and the cost of that hardware, the space it occupies, and the ongoing maintenance to administer it is massive. The public cloud lets companies provision a full DR environment on demand, pay for it only while they need it, and remove it when they are done. There is also the benefit of being able to run DR tests more often in an automated manner. You no longer have to send a team of people to an isolated site to perform DR exercises for several days. We have seen companies go from testing once a year for a week to testing monthly for an afternoon, which is a massive reduction in lost staff productivity.
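To make the cost argument concrete, here is a back-of-the-envelope comparison. Every number is made up purely for illustration; only the shape of the math matters.

```python
# Hypothetical numbers only: compare an always-on standby site with
# paying for a cloud DR environment just during monthly test windows.
standby_site_per_year = 250_000  # hardware, space, and maintenance (assumed)
cloud_hourly_rate = 50           # on-demand DR environment (assumed)
test_hours_per_year = 12 * 4     # monthly tests, one four-hour afternoon each

on_demand_test_cost = cloud_hourly_rate * test_hours_per_year  # 2,400
print(f"Always-on standby site: ${standby_site_per_year:,}/yr")
print(f"On-demand cloud DR testing: ${on_demand_test_cost:,}/yr")
```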
If you are just embarking on a hybrid DR strategy, I recommend starting with your most important application and the supporting systems behind it. Plan out the recovery process following the automated approach outlined above. Once that application can be recovered successfully and reliably, move on to the next few applications. The more applications you work through, the more seasoned you will become, but it is critical to start small and use an iterative approach to improve and expand the plan. We like to think of this as a pragmatic approach to disaster recovery, and it is one we use heavily when developing DR strategy for our clients.
About the Author:
Ned Bellavance is an IT professional with over 15 years of experience in the industry. Starting as a humble helpdesk operator, Ned has worked his way up through the ranks of systems administration and infrastructure architecture, developing along the way an expansive understanding of IT infrastructure and the applications it supports. Currently, Ned works as the Director of Cloud Solutions for Anexinet in the Philadelphia metropolitan area, specializing in Enterprise Architecture both on-premises and in the cloud. Ned holds a number of industry certifications from Microsoft, VMware, Citrix, and Cisco. He also has a B.S. in Computer Science and an MBA with an Information Technology concentration. In addition, Ned is a Microsoft Most Valuable Professional (MVP), one of just 67 in the US. He is passionate about new technology and is always trying to separate the marketing fluff from the reality of the technical fine print. You can find his thoughts on the technical landscape at NedintheCloud.com. Ned also produces two podcasts, Buffer Overflow and AnexiPod, for his current employer. On the AnexiPod, he interviews subject matter experts about new trends and developments in the fields of App/Dev, Big Data, Infrastructure, and anything else that is shiny. On Buffer Overflow, Ned is joined by cohost Chris Hayner and guests to discuss weekly tech news in an insightful and occasionally amusing manner. In addition to his day job, Ned also authors courses for Pluralsight.