Hurricane Sandy, which hit New York and the East Coast of the USA hard at the end of October 2012, once again proved how important disaster recovery is. Sandy will certainly be a learning opportunity for many.
It was not the wind which caused damage to datacenters. It was the flooding. Datacenters are prepared for failure of the utility power. They all have generator power available which runs on fuel. Most providers have at least 48 hours of fuel onsite with fuel vendors on standby to deliver more as needed.
However, because of the flooding, in some cases generators were flooded as they were located in the basement. In other cases the water took out the diesel fuel pumps used to refuel generators. Water leaks into the datacenter were also reported.
Datagram, the ISP whose Manhattan servers host BuzzFeed, Huffington Post, Gawker, and other sites, lost power.
According to datacenterknowledge.com, both Internap and Peer 1 struggled to continue operations at 75 Broad Street after basement-level flooding disabled critical diesel fuel pumps, leaving the providers no way to refuel generators on mezzanine floors. As a result of the flooding, both of Internap's redundant fuel pumps and its generator fuel tank were compromised and shut down. The system continued to run until all fuel in the secondary feeder tanks was exhausted and Internap's facility lost power.
Someone working for Peer 1 at the 75 Broad Street facility writes at arstechnica.com that Peer 1 called for volunteers to come to the datacenter. He describes how many people used buckets to carry diesel fuel from the basement up 17 floors to the generator tank.
In any event, just after I got there the volunteers organized a bucket brigade and hauled 5 gallon buckets of diesel up 17 floors in 1-2 floor increments (I was 12-13/14) for about two and a half hours until the generator tank was full. Supposedly that will get them through to the morning. Not a lot to report about that other than generators are big and everybody came out of it smelling like diesel.
Here is the full story including photos
Customers of Datagram were knocked offline Monday evening as water flooded the basement of its building at 33 Whitehall, knocking out high-traffic sites including Gawker, Gizmodo, BuzzFeed and Mediaite.
“Basement flooded, fuel pump off line – we got people working on it now. 5 feet of water now,” an official wrote on Twitter.
TechCrunch reported in this blog that several datacenters had problems. Not only were primary datacenters shut down because of power failures; according to the blog, at least one failover was also unsuccessful.
For critical services a single datacenter may prove to be not enough. There is always a risk that some unexpected cause will bring down services. Think of a flooded fuel pump for the power generator, an electrician who mistakenly puts high voltage on your servers, a fire in a neighboring building that blows ash into the air conditioning, water damage, and so on.
For the same reason that data is copied to a backup or snapshot, your critical infrastructure and its data should be redundant. Some organizations will have joined the ‘AGREE WITH THAT’ list after October 30, 2012.
Several solutions are available to make sure servers can be moved to another datacenter, using either storage replication or hypervisor-based replication. The latter has many advantages over storage replication: it is storage agnostic, costs less, replicates per VM instead of per volume, and so on.
Zerto Virtual Replication is one of the few hypervisor-based replication solutions for VMware vSphere. Zerto has a blog post describing how some of its customers performed disaster avoidance before Hurricane Sandy hit the East Coast.
A while ago I wrote about the 1.0 version of Virtual Replication. The software is currently at version 2.0.
VirtualSharp ReliableDR, VMware Site Recovery Manager, vSphere Replication and Veeam Backup & Replication are also able to replicate data. For a comparison see this blog.
VirtualSharp ReliableDR prevents situations like the one below. It ensures that your DR works according to the RTO and RPO you set. This is checked automatically every hour, every day, or on whatever schedule you set.
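To make the idea of checking DR against RTO and RPO concrete, here is a minimal sketch in Python. It is not ReliableDR's API or method. The class and function names (DrObjective, DrTestResult, check_objectives) are hypothetical and only illustrate the comparison: the age of the replicated data is held against the RPO, and the measured recovery time of a test failover against the RTO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrObjective:
    """Targets agreed with the business (hypothetical structure)."""
    rpo: timedelta  # maximum tolerated age of the replicated data
    rto: timedelta  # maximum tolerated time to bring the service back


@dataclass
class DrTestResult:
    """Measurements from one scheduled test failover (hypothetical structure)."""
    last_replicated_at: datetime   # timestamp of the newest replicated data
    tested_at: datetime            # when the test failover was started
    recovery_duration: timedelta   # how long the test recovery took


def check_objectives(objective: DrObjective, result: DrTestResult) -> list[str]:
    """Return a list of violations; an empty list means the test met both targets."""
    violations = []

    # RPO: how much data would have been lost if the disaster struck at test time.
    data_age = result.tested_at - result.last_replicated_at
    if data_age > objective.rpo:
        violations.append(f"RPO missed: replica is {data_age} old, limit is {objective.rpo}")

    # RTO: how long the service would have been down.
    if result.recovery_duration > objective.rto:
        violations.append(
            f"RTO missed: recovery took {result.recovery_duration}, limit is {objective.rto}"
        )

    return violations
```

In practice a function like this would run after every scheduled test failover, and any non-empty result would raise an alert long before a real disaster shows that the DR plan does not work.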
A single datacenter may prove to be not enough for business-critical applications. Calculate the cost of protecting your applications against the cost of downtime, lost data, lost production, lost sales, damage to your image, and so on.
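The calculation can be as simple as the Python sketch below. All figures are made-up assumptions for illustration, not data from any provider, and the function names are hypothetical. The expected yearly loss from outages is estimated with and without a second site, and the loss avoided is compared with the yearly cost of the DR solution.

```python
def expected_annual_loss(outages_per_year: float,
                         hours_per_outage: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of downtime: frequency x duration x hourly cost."""
    return outages_per_year * hours_per_outage * cost_per_hour


def dr_net_benefit(annual_dr_cost: float,
                   loss_without_dr: float,
                   loss_with_dr: float) -> float:
    """Yearly loss avoided by DR minus what DR costs; positive means DR pays off."""
    return (loss_without_dr - loss_with_dr) - annual_dr_cost


if __name__ == "__main__":
    # Assumed figures: one major outage every four years, 20,000 per hour of downtime.
    without_dr = expected_annual_loss(0.25, hours_per_outage=48, cost_per_hour=20_000)
    # With a failover site, the same outage is assumed to cost only 4 hours of downtime.
    with_dr = expected_annual_loss(0.25, hours_per_outage=4, cost_per_hour=20_000)
    benefit = dr_net_benefit(annual_dr_cost=60_000,
                             loss_without_dr=without_dr,
                             loss_with_dr=with_dr)
    print(f"Net yearly benefit of DR: {benefit:,.0f}")
```

The hourly cost is the hardest number to get right. It should include lost data, lost sales and damage to your image, not only idle staff.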
Software like the products named above and services such as DRaaS are a relatively cheap investment that lets the IT manager sleep well while Hurricane Tony, Valerie, William, Andrea, Barry, Chantal, etc. hit the datacenter.