Complacency Kills Uptime in Virtualized Environments
April 10, 2018

Chris Adams
Park Place Technologies


Risk is relative. For example, studies have suggested that mandatory seatbelt use can reduce overall highway safety, while extra padding on hockey and American football players can increase injuries. This is the Peltzman Effect: when risk factors are reduced, people tend to change their behavior, often acting more recklessly and driving risk right back up.

The phenomenon is recognized by many economists, its effects have been studied in the field of medicine, and I'd argue it is at the root of an interesting trend in IT — namely the increasing cost of downtime despite our more reliable virtualized environments.

Downtime Costs Are Rising

A study by the Ponemon Institute, for example, found the average cost of data center outages rose from $505,502 in 2010 to $740,357 in 2016. And the maximum cost was up 81% over the same period, reaching more than $2.4 million.
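
As a quick sanity check on those figures, the average works out to an increase of roughly 46% over the six years, or about 6.6% per year compounded. The short sketch below simply recomputes that growth; the dollar amounts are the Ponemon averages quoted above, and nothing else is assumed.

    # Back-of-the-envelope growth check using the Ponemon averages quoted above.
    avg_2010 = 505_502   # average cost of a data center outage in 2010 (USD)
    avg_2016 = 740_357   # average cost of a data center outage in 2016 (USD)

    total_increase = (avg_2016 - avg_2010) / avg_2010 * 100
    annualized = ((avg_2016 / avg_2010) ** (1 / 6) - 1) * 100  # 2010 to 2016 spans six years

    print(f"Total increase:      {total_increase:.0f}%")        # ~46%
    print(f"Annualized increase: {annualized:.1f}% per year")   # ~6.6% per year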

Many factors are represented in these figures. Productivity losses are higher because labor costs have risen, for example, and missed business opportunities are worth more today than they were several years ago. Yet advancements like virtual machines (VMs), with their continuous mirroring and seamless backups, have not slashed downtime costs to the degree many IT pros once predicted.

Have we as IT professionals dropped our defensive stance because we believe too strongly in the power of VMs and other technologies to save us? There are some signs that we have. For all the talk of cyberattacks—well deserved as it is—they cause only 10% of downtime. Hardware failures, on the other hand, account for 40%, according to Network Computing. And the Ponemon research referenced above found simple UPS problems to be at the root of one-quarter of outages.

Of course, VMs alone are not to blame, but it's worth looking at how downtime costs can increase when businesses rely on high-availability, virtually partitioned servers.

3 VM-Related Reasons for the Trend

The trouble with VMs generally boils down to an "all eggs in one basket" problem. Separate workloads that would previously have run on multiple physical servers are consolidated onto one. Mirroring, automatic failover, and backups are intended to reduce the risk of this single point of failure, but when those safeguards fall through or complicated issues cascade, the resulting downtime can be especially costly for several reasons.
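
To make the "one basket" point concrete, here is a minimal, purely illustrative sketch comparing the impact of a single hardware failure when ten workloads run on dedicated servers versus when they are consolidated onto one VM host and the failover does not take over as intended. The workload count, hourly loss, and outage duration are made-up numbers for illustration, not figures from the studies cited here.

    # Illustrative only: how consolidation widens the blast radius of one failure.
    workloads = 10                 # hypothetical number of workloads (apps or customer VMs)
    loss_per_workload_hour = 500   # hypothetical revenue/productivity loss per workload-hour (USD)
    outage_hours = 4               # hypothetical time to restore service

    # Dedicated servers: one failed machine takes down one workload.
    impact_dedicated = 1 * loss_per_workload_hour * outage_hours

    # Consolidated host: one failure (with failed failover) takes down all ten workloads.
    impact_consolidated = workloads * loss_per_workload_hour * outage_hours

    print(f"Dedicated servers: ${impact_dedicated:,} per incident")     # $2,000
    print(f"Consolidated host: ${impact_consolidated:,} per incident")  # $20,000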

1. Utilization rates are higher

Studies by McKinsey & Company and Gartner have both pegged utilization rates for non-virtualized servers in the 6% to 12% range. With VMs, however, utilization typically approaches 30% and often stretches far higher. These busier servers are processing more workloads, so the impact of any downtime is multiplied.

2. More customers are affected

Internal and external customers are accustomed to sharing physical servers through VMs, so outages now affect a greater variety of workloads. This expands the business consequences. A co-location provider could easily face irate calls and emails from dozens of clients, and a corporate data center manager could see complaints rise from the help desk to the C-suite.

3. Complexity is prolonging downtime

Virtualization projects were supposed to simplify data centers, but many have not, according to CIO Magazine. In its survey, respondents reported an average of 16 outages per year, 11 of which were caused by system failures stemming from complexity. More complex systems are also harder to troubleshoot and repair, which means longer downtime and higher overall costs.
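
Pulling these three factors together, the rough model below shows how they compound: more workloads share each host, and complexity stretches the average repair time. The 16 outages per year and the 11-of-16 complexity share come from the CIO Magazine survey cited above; every other input is an assumption chosen purely to illustrate the multiplication, not data from Ponemon, Gartner, or McKinsey.

    # Rough, assumption-driven model of annual downtime cost on consolidated hosts.
    outages_per_year = 16          # survey average cited above (CIO Magazine)
    complexity_share = 11 / 16     # share of those outages attributed to complexity
    workloads_per_host = 10        # assumed consolidation ratio
    loss_per_workload_hour = 500   # assumed loss per workload-hour (USD)

    simple_repair_hours = 2        # assumed repair time for a straightforward failure
    complex_repair_hours = 6       # assumed repair time when complexity slows troubleshooting

    avg_repair_hours = (complexity_share * complex_repair_hours
                        + (1 - complexity_share) * simple_repair_hours)

    annual_cost = outages_per_year * workloads_per_host * loss_per_workload_hour * avg_repair_hours

    print(f"Average repair time: {avg_repair_hours:.1f} hours")        # ~4.8 hours
    print(f"Illustrative annual downtime cost: ${annual_cost:,.0f}")   # ~$380,000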

Read Part 2: Solutions for Minimizing Server Downtime

Chris Adams is President and COO of Park Place Technologies