New Data Reveals Widespread Downtime and Security Risks in 99% of Enterprise Private Cloud Environments
February 08, 2017

Doron Pinhas
Continuity Software

Industrial and technological revolutions happen because new manufacturing systems or technologies make life easier, less expensive, more convenient, or more efficient. It's been that way in every epoch – but Continuity Software's new study indicates that in the cloud era, there's still work to be done.

With the rise of cloud technology in recent years, Continuity Software conducted an analysis of live enterprise private cloud environments, and the results are not at all reassuring. Based on configuration data gathered from more than 100 enterprise environments over the past year, the study found widespread performance issues in 97% of them, putting their IT systems at significant risk of downtime. And although the participating enterprises ranked downtime as their greatest concern, downtime risks were present in every environment tested.

A deep dive into the study findings revealed numerous reasons for the increased operational risk in private cloud environments, ranging from lack of awareness of critical vendor recommendations, to inconsistent configuration across virtual infrastructure components, to incorrect alignment between different technology layers (such as virtual networks and physical resources, or storage and compute layers).
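
To make inconsistent configuration across virtual infrastructure components concrete, consider a high-availability cluster in which one host cannot reach a datastore or virtual network that its peers can. A VM restarted on that host after a failover would fail to power on, yet nothing looks wrong until the failover actually happens. The sketch below flags such gaps. It is illustrative only: the inventory format and all host, datastore, and network names are hypothetical, and a real tool would collect this data from the hypervisor's management interfaces.

```python
from typing import Dict, List, Set

# Hypothetical inventory: for each host in one HA cluster, the datastores and
# virtual networks it can reach.
Inventory = Dict[str, Dict[str, Set[str]]]


def find_inconsistencies(cluster: Inventory) -> List[str]:
    """Report resources that some, but not all, hosts in the cluster can reach.

    A VM failed over to a host that lacks one of these resources may not restart.
    """
    findings = []
    for kind in ("datastores", "networks"):
        # Everything that at least one host in the cluster can reach.
        all_resources = set().union(*(host[kind] for host in cluster.values()))
        for name, host in sorted(cluster.items()):
            for missing in sorted(all_resources - host[kind]):
                # kind[:-1] turns "datastores" into "datastore", etc.
                findings.append(f"{name}: missing {kind[:-1]} '{missing}'")
    return findings


cluster = {
    "host-01": {"datastores": {"ds-prod-a", "ds-prod-b"}, "networks": {"vlan-100", "vlan-200"}},
    "host-02": {"datastores": {"ds-prod-a", "ds-prod-b"}, "networks": {"vlan-100"}},
    "host-03": {"datastores": {"ds-prod-a"}, "networks": {"vlan-100", "vlan-200"}},
}

for finding in find_inconsistencies(cluster):
    print(finding)
```

Here host-03 cannot see ds-prod-b and host-02 is missing vlan-200; any VM that depends on those resources is protected on paper only.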

The downtime risks were not specific to any particular hardware, software, or operating system configuration. Indeed, the studied enterprises use a diverse technology stack: 48% of the organizations are pure Windows shops, 7% run primarily Linux, and 46% use a mix of operating systems. Close to three quarters (73%) of the organizations use EMC data storage systems, 27% use replication for automated offsite data protection, and 12% use active-active failover for continuous availability.

The IT departments at these companies certainly include top engineers and administrators, yet nearly all of the companies in the study had experienced some issues, and a few had experienced many.

While the results are unsettling, they are certainly not surprising. The modern IT environment is extremely complex and volatile: multiple teams make changes daily in a rapidly evolving technology landscape. With daily patching, upgrades, capacity expansion and other changes, the slightest miscommunication between teams or a single knowledge gap can introduce hidden risks to the stability of the IT environment.

Unlike legacy systems, which are tested and audited using standard practices on a regular schedule (typically once or twice a year), private cloud infrastructure is not regularly tested. Interestingly, even seasoned IT experts do not always realize this. Virtual infrastructure is often designed to be "self-healing," using features such as virtual machine High Availability and workload mobility, and there is regular evidence that these features work; after all, IT executives may argue, "not a week goes by without some virtual machines failing over successfully."

This perception of safety can be misleading, since a chain is only as strong as its weakest link. Simply put, it's a numbers game: over the course of any given week, only a minute fraction of the virtual machines, usually less than 1%, will actually fail over. What about the other 99%? Is it realistic to expect that they are also fully protected?
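
The arithmetic behind this argument can be sketched in a few lines. Assume, purely for illustration, a private cloud of 1,000 VMs in which 1% fail over each week, and that the VMs exercised each week are picked independently at random. Real failovers are not random, so this is a simplifying assumption, not a model of any particular environment.

```python
# Illustrative numbers only: 1,000 VMs, 1% failed over per week, with each
# week's sample assumed to be drawn independently at random.
TOTAL_VMS = 1_000
WEEKLY_FAILOVER_RATE = 0.01


def untested_this_week(total_vms: int, rate: float) -> int:
    """Number of VMs whose failover path was NOT exercised in a given week."""
    return total_vms - round(total_vms * rate)


def chance_exercised(rate: float, weeks: int) -> float:
    """Probability that one specific VM fails over at least once in `weeks` weeks."""
    return 1 - (1 - rate) ** weeks


print(untested_this_week(TOTAL_VMS, WEEKLY_FAILOVER_RATE))  # 990
print(f"{chance_exercised(WEEKLY_FAILOVER_RATE, 52):.1%}")   # 40.7%
```

Under these assumptions, roughly 59% of the time a given VM's failover path goes unexercised for an entire year, so a latent misconfiguration on that VM stays hidden until a real outage.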

The only way to determine that a private cloud is truly resilient would be to prove that every possible failure scenario can be handled successfully. Of course, this cannot be accomplished with manual processes, which would be far too time-consuming and potentially disruptive. The only sustainable and scalable approach is to automate private cloud configuration validation and testing.
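
A rough count shows why manual testing cannot keep up. Assume, hypothetically, an environment with 200 components that can fail (hosts, storage paths, network links and so on). Testing each single failure is already substantial work; testing every combination of two or three simultaneous failures quickly becomes impractical:

```python
from math import comb

# Hypothetical environment size; real environments vary widely.
COMPONENTS = 200

for k in (1, 2, 3):
    scenarios = comb(COMPONENTS, k)  # ways to choose k failed components
    print(f"{k} simultaneous failure(s): {scenarios:,} scenarios")
```

This prints 200, 19,900 and 1,313,400 scenarios respectively. Even at an assumed rate of one manual test per minute, the three-failure cases alone would take about two and a half years of continuous testing.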

Individual vendors (for example, VMware, Microsoft, EMC and others) offer basic health measurements for their own solution stacks. While useful, this falls far short of a real solution since, as the study shows, the majority of the issues stem from incorrect alignment between the different layers. In recent years, more holistic solutions that offer vendor-agnostic, cross-domain validation have entered the market.
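
A cross-domain check differs from a per-vendor health check in that it joins data from more than one layer. The sketch below, built entirely on hypothetical data and names, compares the virtualization layer's view (which VMs are flagged as needing offsite protection, and which storage volume each VM lives on) with the storage layer's view (which volumes are actually replicated). Each layer can look healthy on its own; the gap only appears when the two are compared.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class VirtualMachine:
    name: str
    volume: str          # storage volume (LUN) backing the VM's disks
    needs_offsite: bool  # flagged in the virtualization layer as requiring DR


def unprotected_vms(vms: List[VirtualMachine], replicated_volumes: Set[str]) -> List[str]:
    """Return VMs that require offsite protection but sit on non-replicated storage."""
    return [vm.name for vm in vms
            if vm.needs_offsite and vm.volume not in replicated_volumes]


# Hypothetical inventory from the virtualization layer.
vms = [
    VirtualMachine("erp-db", "lun-07", needs_offsite=True),
    VirtualMachine("erp-app", "lun-12", needs_offsite=True),
    VirtualMachine("test-web", "lun-20", needs_offsite=False),
]
# Hypothetical replication configuration from the storage layer.
replicated = {"lun-07", "lun-08"}

print(unprotected_vms(vms, replicated))  # ['erp-app']
```

In this example the application server of a protected service sits on a volume that is never replicated, so a site failover would bring up the database without the application tier.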

While such approaches come at a cost, it is far less than the cost of a critical outage. According to multiple industry studies, a single hour of downtime can easily cost hundreds of thousands of dollars (and, in some verticals, millions).

Doron Pinhas is CTO of Continuity Software.
