Complacency Kills Uptime in Virtualized Environments
April 10, 2018

Chris Adams
Park Place Technologies


Risk is relative. Studies have suggested, for example, that seatbelt laws can encourage riskier driving and that heavier padding on hockey and American football players can increase injuries. It's called the Peltzman Effect, and it describes how people change their behavior when risk factors are reduced: they often act more recklessly and drive risk right back up.

The phenomenon is recognized by many economists, its effects have been studied in medicine, and I'd argue it is at the root of an interesting trend in IT: the rising cost of downtime despite our more reliable virtualized environments.

Downtime Costs Are Rising

A study by the Ponemon Institute, for example, found that the average cost of data center outages rose from $505,502 in 2010 to $740,357 in 2016. The maximum cost climbed 81% over the same period, to more than $2.4 million.

There are a lot of factors represented in these figures. For example, productivity losses are higher because labor costs are, and missed business opportunities are worth more today than they were several years ago. Yet advancements like virtual machines (VMs) with their continuous mirroring and seamless backups have not slashed downtime costs to the degree many IT pros had once predicted.

Have we as IT professionals dropped our defensive stance because we believe too strongly in the power of VMs and other technologies to save us? There are some signs that we have. For all the talk of cyberattacks—well deserved as it is—they cause only 10% of downtime. Hardware failures, on the other hand, account for 40%, according to Network Computing. And the Ponemon research referenced above found simple UPS problems to be at the root of one-quarter of outages.

Of course, VMs alone are not to blame, but it's worth looking at how downtime costs can increase when businesses rely on high-availability, virtually partitioned servers.

3 VM-Related Reasons for the Trend

The risk with VMs generally boils down to an "all eggs in one basket" problem. Separate workloads that would previously have run on multiple physical servers are consolidated onto one machine. Mirroring, automatic failover, and backups are intended to offset this single point of failure, but when those safeguards fall through or complicated issues cascade, the resulting downtime can be especially costly for several reasons.
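Before getting to those reasons, a minimal sketch of the consolidation math may help. The workload and host counts below are arbitrary assumptions, not figures from the studies cited here, but they show how the blast radius of a single hardware failure grows once separate physical servers are collapsed onto a few VM hosts.

```python
# Illustrative only: blast radius of a single host failure before and after consolidation.
# The workload and host counts are assumptions for this sketch, not figures from the article.

def workloads_lost(total_workloads: int, hosts: int) -> float:
    """Average number of workloads taken down when one host fails,
    assuming workloads are spread evenly across hosts."""
    return total_workloads / hosts

before = workloads_lost(total_workloads=20, hosts=20)  # one workload per physical server
after = workloads_lost(total_workloads=20, hosts=2)    # same workloads on two VM hosts

print(f"Pre-consolidation: one host failure hits {before:.0f} workload(s)")
print(f"Post-consolidation: one host failure hits {after:.0f} workload(s)")
```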

1. Utilization rates are higher

Analyses by McKinsey & Company and Gartner have both pegged utilization rates for non-virtualized servers in the 6% to 12% range. With VMs, however, utilization typically approaches 30% and often stretches far higher. These busier servers are processing more workloads, so the impact of downtime is multiplied.
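A back-of-the-envelope comparison makes the multiplier visible. Only the utilization rates below come from the studies cited above; the hourly capacity figure is an arbitrary assumption for illustration.

```python
# Back-of-the-envelope: work put at risk by one hour of downtime at different utilization rates.
# "capacity_per_hour" is an arbitrary assumption; the utilization rates mirror the studies above.

capacity_per_hour = 1000  # hypothetical units of work a fully loaded server processes per hour

for label, utilization in [("Non-virtualized (~8%)", 0.08), ("Virtualized (~30%)", 0.30)]:
    at_risk = capacity_per_hour * utilization
    print(f"{label}: roughly {at_risk:.0f} units of work lost per hour of downtime")
```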

2. More customers are affected

Internal and external customers are accustomed to sharing physical servers via VMs, so outages now affect a greater variety of workloads and carry broader business consequences. A co-location provider could easily face irate calls and emails from dozens of clients, and a corporate data center manager could see complaints rise from the help desk to the C-suite.

3. Complexity is prolonging downtime

Virtualization projects were supposed to simplify data centers, but many have not, according to a CIO Magazine survey. Respondents reported an average of 16 outages per year, 11 of which were caused by system failures stemming from complexity. More complex systems are also harder to troubleshoot and repair, which means longer downtime and higher overall costs.
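To see how those survey numbers might translate into annual downtime, here is a rough sketch. The outage counts come from the CIO Magazine survey cited above, but the per-outage repair times are assumptions made purely for illustration.

```python
# Rough annualized estimate built from the CIO Magazine outage counts cited above.
# The repair-time figures are assumptions for illustration, not numbers from the survey.

outages_per_year = 16    # average outages reported per respondent
complexity_driven = 11   # outages attributed to system failure resulting from complexity

simple_repair_hours = 2.0    # assumed restore time for a straightforward failure
complex_repair_hours = 6.0   # assumed restore time when layered virtualization must be untangled

annual_downtime_hours = (
    (outages_per_year - complexity_driven) * simple_repair_hours
    + complexity_driven * complex_repair_hours
)

print(f"Complexity-driven share of outages: {complexity_driven / outages_per_year:.0%}")
print(f"Estimated annual downtime: {annual_downtime_hours:.0f} hours")
```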

Read Part 2: Solutions for Minimizing Server Downtime

Chris Adams is President and COO of Park Place Technologies
