What MSPs can learn from the Delta outage

Equipment failure

Delta flights started back up around 9 a.m., and I ended up being delayed by only an hour. Other travelers weren’t so lucky. By the end of the day on Monday, about 1,000 flights were cancelled as a result of the Delta outage, and thousands more were delayed. Complications continued into Tuesday as Delta cancelled an additional 530 flights while it attempted to resume normal operations.

The cause of the massive outage was initially unclear. Delta pointed to a power outage, but according to reports, Georgia Power suggested that an equipment failure at Delta was responsible instead. That turned out to be the case.

“Monday morning a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power,” said Delta COO Gil West in a statement released Tuesday afternoon. “The universal power was stabilized, and power was restored quickly. But when this happened, critical systems and network equipment didn’t switch over to backups. Other systems did. And now we’re seeing instability in these systems.”

Delta isn’t the first airline to experience a major disruption like this due to a technology failure. For example, the Wall Street Journal pointed to Southwest Airlines cancelling 2,300 flights in four days back in July after a router malfunction at its data center in Texas forced the airline to reboot its entire system, something that takes 12 hours. United Airlines also grounded several flights last year after router issues caused network problems.

Placing blame

While the problems have caused some to question why Delta hasn’t moved to the cloud yet, others pointed to consolidation in the airline industry leading to companies that are too large and too dependent on dated legacy IT systems and equipment. And some suggested that a reliance on IT offshoring and a poorly tested disaster recovery plan had a role in how the crisis played out. As Robert Cringely of BetaNews put it: “Anything less than a 100-percent service backup isn’t disaster recovery, it is disaster coping.”

No matter what the underlying causes are, the system outage and prolonged recovery are going to be costly for Delta, both in lost revenue and in damage to its reputation.

MSPs should see this as a large-scale illustration of the importance of proper backup and disaster recovery and the dangers of relying on legacy systems. If one of your SMB customers’ systems goes down, it might not strand thousands of travelers or make national news, but the Delta outage is an example you can use to help SMBs understand how vital these types of precautions are.

Photo Credit: Bulent Kavakkoru via Flickr.com. Used under CC 2.0 License.

Posted by Anne Campbell
