Complacency Kills Uptime in Virtualized Environments
April 10, 2018

Chris Adams
Park Place Technologies

Risk is relative. For example, studies have shown that wearing seatbelts can reduce highway safety, while more padding on hockey and American football players can increase injuries. It's called the Peltzman Effect and it describes how humans change behavior when risk factors are reduced. They often act more recklessly and drive risk right back up.

The phenomenon is recognized by many economists, its effects have been studied in the field of medicine, and I'd argue it is at the root of an interesting trend in IT — namely the increasing cost of downtime despite our more reliable virtualized environments.

Downtime Costs Are Rising

A study by the Ponemon Institute, for example, found the average cost of data center outages rose from $505,502 in 2010 to $740,357 in 2016. And the maximum cost was up 81% over the same time period, reaching over $2.4 million.
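To put the average figures on the same footing as the 81% rise in maximum cost, here is a minimal sketch of the arithmetic. The function name `percent_increase` and the variable names are illustrative assumptions; only the two dollar amounts come from the study quoted above.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Average outage costs quoted from the Ponemon study above.
avg_2010 = 505_502
avg_2016 = 740_357

average_rise = percent_increase(avg_2010, avg_2016)
print(f"Average outage cost rose {average_rise:.1f}% from 2010 to 2016")
```

By this calculation the average cost rose by roughly 46%, so the worst-case outages grew considerably faster than the typical one.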

There are a lot of factors represented in these figures. For example, productivity losses are higher because labor costs are, and missed business opportunities are worth more today than they were several years ago. Yet advancements like virtual machines (VMs) with their continuous mirroring and seamless backups have not slashed downtime costs to the degree many IT pros had once predicted.

Have we as IT professionals dropped our defensive stance because we believe too strongly in the power of VMs and other technologies to save us? There are some signs that we have. For all the talk of cyberattacks—well deserved as it is—they cause only 10% of downtime. Hardware failures, on the other hand, account for 40%, according to Network Computing. And the Ponemon research referenced above found simple UPS problems to be at the root of one-quarter of outages.

Of course, VMs alone are not to blame, but it's worth looking at how downtime costs can increase when businesses rely on high-availability, virtually partitioned servers.

3 VM-Related Reasons for the Trend

The trouble with VMs generally boils down to putting all the eggs in one basket. Separate workloads that would previously have run on multiple physical servers are consolidated onto one server. Mirroring, automatic failover, and backups are intended to reduce the risk associated with this single point of failure, but when these tactics fall through or complicated issues cascade, the resulting downtime can be especially costly for several reasons.

1. Utilization rates are higher

Work by both McKinsey & Company and Gartner pegged utilization rates for non-virtualized servers in the 6% to 12% range. With VMs, however, utilization typically approaches 30% and often stretches far higher. These busy servers are processing more workloads, so downtime impacts are multiplied.

2. More customers are affected

Internal and external customers are accustomed to using VMs to share physical servers, so outages now affect a greater variety of workloads. This expands business consequences. A co-location provider could easily face irate calls and emails from dozens of clients, and a corporate data center manager could see complaints rise from the help desk to the C suite.

3. Complexity is prolonging downtime

Virtualization projects were supposed to simplify data centers, but many have not, according to CIO Magazine. In its survey, respondents said they experience an average of 16 outages per year, 11 of which are caused by system failures resulting from complexity. And more complex systems are more difficult to troubleshoot and repair, making for longer downtime and higher overall costs.

Read Part 2: Solutions for Minimizing Server Downtime

Chris Adams is President and COO of Park Place Technologies