Complacency Kills Uptime in Virtualized Environments
April 10, 2018

Chris Adams
Park Place Technologies


Risk is relative. Studies have suggested, for example, that seatbelt laws can reduce overall highway safety, while more padding on hockey and American football players can increase injuries. It's called the Peltzman Effect, and it describes how humans change behavior when risk factors are reduced: they often act more recklessly and drive risk right back up.

The phenomenon is recognized by many economists, its effects have been studied in the field of medicine, and I'd argue it is at the root of an interesting trend in IT — namely the increasing cost of downtime despite our more reliable virtualized environments.

Downtime Costs Are Rising

A study by the Ponemon Institute, for example, found the average cost of data center outages rose from $505,502 in 2010 to $740,357 in 2016. And the maximum cost was up 81% over the same time period, reaching over $2.4 million.

Many factors are represented in these figures. For example, productivity losses are higher because labor costs have risen, and missed business opportunities are worth more today than they were several years ago. Yet advancements like virtual machines (VMs), with their continuous mirroring and seamless backups, have not slashed downtime costs to the degree many IT pros once predicted.

Have we as IT professionals dropped our defensive stance because we believe too strongly in the power of VMs and other technologies to save us? There are some signs that we have. For all the talk of cyberattacks—well deserved as it is—they cause only 10% of downtime. Hardware failures, on the other hand, account for 40%, according to Network Computing. And the Ponemon research referenced above found simple uninterruptible power supply (UPS) problems to be at the root of one-quarter of outages.

Of course, VMs alone are not to blame, but it's worth looking at how downtime costs can increase when businesses rely on high-availability, virtually partitioned servers.

3 VM-Related Reasons for the Trend

The risk with VMs generally boils down to an "all eggs in one basket" problem. Separate workloads that would previously have run on multiple physical servers are consolidated onto one machine. Mirroring, automatic failover, and backups are intended to reduce the risk associated with this single point of failure, but when these safeguards fall through or complicated issues cascade, the resulting downtime can be especially costly for several reasons.

1. Utilization rates are higher

Studies by McKinsey & Company and Gartner both pegged utilization rates for non-virtualized servers in the 6% to 12% range. With VMs, however, utilization typically approaches 30% and often stretches far higher. These busy servers are processing more workloads, so downtime impacts are multiplied: a host running at 30% utilization may be carrying the work of roughly three to five lightly loaded physical servers, and a single outage now idles all of that work at once.

2. More customers are affected

Because internal and external customers now share physical servers through VMs, outages affect a greater variety of workloads, which expands the business consequences. A co-location provider could easily face irate calls and emails from dozens of clients, and a corporate data center manager could see complaints rise from the help desk to the C-suite.

3. Complexity is prolonging downtime

Virtualization projects were supposed to simplify data centers, but many have not, according to CIO Magazine. In its survey, respondents reported an average of 16 outages per year, 11 of which were caused by system failures stemming from complexity. More complex systems are harder to troubleshoot and repair, making for longer downtime and higher overall costs.

Read Part 2: Solutions for Minimizing Server Downtime

Chris Adams is President and COO of Park Place Technologies
