07/08/2019
I know, I know. An article on disaster recovery and how important it is. *yawn* The necessity of DR is a foregone conclusion. I thought that, too. However, I am amazed at how often I still see companies without a defined disaster recovery plan, much less the technical readiness.
The things that keep companies from establishing disaster recovery capabilities run the standard gamut of excuses, including but not limited to:
- Cost constraints
- Resource constraints
- Low probability of risk
Often, the reason is that companies have grown rapidly, and that same entrepreneurial spirit that keeps the business moving forward does not want to take on a project that focuses on risk reduction. The drive is to focus instead on projects that improve operational excellence and business differentiation.
But what happens to that push to business excellence if disaster indeed strikes? What is the cost per hour if manufacturing stops, if customer orders cannot be received, if field activities halt, if customers cannot sign up for services? Disasters may have a low probability, but they can have catastrophic impacts.
Disasters
Now, what constitutes a “disaster”?
We usually think of the data center engulfed by fire or wiped out by a tornado. In reality, you may lose connectivity to your data and applications for a significant time because someone simply hit a nearby power pole. I’ve seen multiple companies lately with critical applications housed in office server rooms (aka closets) with supplemental cooling. No kidding, one had a window unit AC! If the cooling unit fails, the equipment in that room will overheat and shut down. Other causes include system patches that corrupt databases and ransomware that locks file systems. These scenarios are scary, but they can often be overcome with a valid DR plan in place.
Create your plan
So, where do you start?
- Tier your applications according to business criticality, and document who does what and in what sequence.
- If your plan is to purchase new hardware and restore from tape, decide what hardware you need and where you will put it (yes, purchase and replace is a valid plan if those applications can be offline for an extended time).
- Determine your Recovery Time Objectives (RTO), how long you can afford the application to be down, and your Recovery Point Objectives (RPO), how much data you can afford to lose.
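To make the tiering step a little more concrete, here is a minimal Python sketch that records each application's tier, RTO, and RPO and sorts them into a recovery sequence. Everything in it is an assumption for illustration: the application names, the tiers, the hour values, and the `AppPlan` and `recovery_order` names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppPlan:
    """One application's entry in a DR plan (all values are illustrative)."""
    name: str
    tier: int          # 1 = most business-critical
    rto_hours: float   # how long the application can afford to be down
    rpo_hours: float   # how much data, measured in time, can be lost


def recovery_order(plans):
    """Return plans sorted by tier, then by tightest RTO within a tier."""
    return sorted(plans, key=lambda p: (p.tier, p.rto_hours))


# Hypothetical inventory; replace with your own applications and objectives.
PLANS = [
    AppPlan("file-archive", tier=3, rto_hours=72, rpo_hours=24),
    AppPlan("order-entry", tier=1, rto_hours=4, rpo_hours=0.25),
    AppPlan("field-dispatch", tier=1, rto_hours=2, rpo_hours=1),
    AppPlan("intranet-wiki", tier=2, rto_hours=24, rpo_hours=12),
]

if __name__ == "__main__":
    for plan in recovery_order(PLANS):
        print(f"Tier {plan.tier}: {plan.name} "
              f"(RTO {plan.rto_hours}h, RPO {plan.rpo_hours}h)")
```

Sorting by tier first and RTO second is only one reasonable choice. Your documented sequence may also need to account for dependencies between applications, which this sketch ignores.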
With many companies taking advantage of Cloud technology, DR planning has become much more streamlined. However, you must be intentional about your disaster recovery planning – even if you utilize the Cloud. Just because you are in the Cloud does not make you bulletproof, which can be a hard lesson for folks to learn. Even if you migrate to the Cloud, you still need to replicate data and virtual machines, which is typically an additional cost. However, most Cloud providers can configure and manage that replication, and they will often offer SLAs that commit to a level of uptime.
See if it works
Once you have your plan in place, be sure to test it.
- Coordinate with your remote data centers, cloud providers, and most importantly, your business stakeholders.
- Validate that data replication is working and that recent production data is present in the DR environment in accordance with your RPO.
- Make sure that all applications and application interfaces are working properly. (Hint: This is where you find hidden hard-coded IP addresses for interfaces. Yikes!)
- Shut down connectivity to primary instances and fire up the DR environments.
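The replication check in the list above can be partly automated. The following Python sketch compares the timestamp of the newest data found in the DR environment against the RPO. It is a sketch under stated assumptions: the function names `replication_lag` and `meets_rpo` are hypothetical, the timestamps are made up, and how you obtain the "newest data" timestamp depends entirely on your database or replication tooling.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_replicated_at, now=None):
    """Return how far the DR copy is behind production as a timedelta."""
    now = now or datetime.now(timezone.utc)
    return now - last_replicated_at


def meets_rpo(last_replicated_at, rpo, now=None):
    """True if the newest data in DR is no older than the RPO allows."""
    return replication_lag(last_replicated_at, now) <= rpo


if __name__ == "__main__":
    # Hypothetical values: in a real test, read the timestamp of the newest
    # record or snapshot actually present in the DR environment.
    test_time = datetime(2019, 7, 8, 12, 0, tzinfo=timezone.utc)
    newest_in_dr = datetime(2019, 7, 8, 11, 50, tzinfo=timezone.utc)
    rpo = timedelta(minutes=15)

    if meets_rpo(newest_in_dr, rpo, now=test_time):
        print("RPO met")
    else:
        print("RPO missed")
```

With these example values the DR copy is 10 minutes behind against a 15-minute RPO, so the check passes. Running a check like this on a schedule, not only during a DR test, gives earlier warning when replication quietly falls behind.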
I get it – it’s a lot of work to prepare for disaster recovery, and it can carry a sizeable cost. Once you factor in the cost of downtime and lost data, however, the cost and time to set up DR are justifiable. Addressing that foregone conclusion will ultimately help you sleep better at night.