Reducing Risk, Improving Resiliency: How Companies Use ITOA
April 06, 2016

Doron Pinhas
Continuity Software

Share this

Enterprise IT teams are continually challenged to manage larger and more complex systems with fewer resources. This requires a level of efficiency that can only come from complete visibility and intelligent control, based on the data coming out of IT systems. It is not surprising to see that a growing number of organizations are turning to IT Operations Analytics solutions to help them track performance, improve operational efficiency, and prevent service disruptions within their IT infrastructure.

IT operations analytics provide a robust set of tools that can generate the necessary insights to help IT operations teams proactively determine the risks, impacts, or outages that can occur due to various events that may take place in an environment. Gartner estimates that by 2017, approximately 15% of enterprises will actively use ITOA (IT Operations Analytics) technologies to provide insight into both business execution and IT operations, up from fewer than 5% today.

So how can we then use these analytics to effectively improve IT operational excellence? How can it help us make better decisions? And most importantly, how can it help prevent downtime and service disruptions?

Continuity Software recently conducted an infrastructure resiliency survey, with the goal of helping IT infrastructure and operations executives benchmark their organization’s performance and practices against their peers. The results presented here are based on responses from 230 IT professionals from a wide range of industries and geographies collected through an online survey.

Most survey respondents come from mid-size and large companies, with 40% of the survey respondents coming from organizations of over 10,000 employees. Over half of the respondents (54%) have more than 500 servers in their datacenter.

Some of the key findings of the survey include:

■ Avoiding productivity loss is the top driver for infrastructure resiliency initiatives, cited by 44% of the survey respondents. Additional drivers include ensuring customer satisfaction (22%), protecting company reputation (17%) and regulatory compliance (13%)

■ Service availability goals are becoming more ambitious. As many as 81% of the survey respondents have a service availability goal of less than 8 hours of unplanned downtime a year (compared to 73% in 2014), and 37% have a goal of less than one hour a year.


■ At the same time, as many as 39% of the respondents fell short of meeting their goal. 34% of the organizations surveyed had an unplanned outage in the past month, and 13% had one in the past week.

■ While cyber-attacks make headlines, they only cause a small fraction of system downtime. The most common causes are application error and system upgrades, each responsible for over four hours a year on average.

■ Although the majority of the survey respondents have moved some of their mission-critical systems to the cloud, those that have mission-critical systems in the cloud were less successful in meeting their service availability goals compared to organizations that have not made the move.


■ The top challenge in meeting infrastructure resiliency goals is the knowledge gap and inability to keep up with vendor recommendations and best practices. This challenge is significantly more prominent in the cloud environment and one of the primary factors why companies with a larger cloud footprint are struggling to meet their goals.


■ Large companies have a higher price tag on downtime. For 36% of the organizations with over 10,000 employees, the average hour of downtime costs over $100,000.

As IT environments become more complex and more systems are deployed in virtualized private cloud settings, having the right tools to manage IT operations becomes essential. IT Operations Analytics solutions that generate actionable insights across the entire IT landscape are helping IT teams be more proactive and efficient, allowing organizations to improve resiliency and prevent disruptions to critical business services.

Doron Pinhas is CTO of Continuity Software.

Share this

The Latest

May 26, 2022

Site reliability engineers are development-focused IT professionals who work on developing and implementing solutions that solve reliability, availability, and scale problems. On the other hand, DevOps engineers are ops-focused workers who solve development pipeline problems. While there is a divide between the two professions, both sets of engineers cross the gap regularly, delivering their expertise and opinions to the other side and vice versa ...

May 25, 2022

Site reliability engineering (SRE) is fast becoming an essential aspect of modern IT operations, particularly in highly scaled, big data environments. As businesses and industries shift to the digital and embrace new IT infrastructures and technologies to remain operational and competitive, the need for a new approach for IT teams to find and manage the balance between launching new systems and features and ensuring these are intuitive, reliable, and friendly for end users has intensified as well ...

May 24, 2022

The most sophisticated observability practitioners (leaders) are able to cut downtime costs by 90%, from an estimated $23.8 million annually to just $2.5 million, compared to observability beginners, according to the State of Observability 2022 from Splunk in collaboration with the Enterprise Strategy Group. What's more, leaders in observability are more innovative and more successful at achieving digital transformation outcomes and other initiatives ...

May 23, 2022

Programmatically tracked service level indicators (SLIs) are foundational to every site reliability engineering practice. When engineering teams have programmatic SLIs in place, they lessen the need to manually track performance and incident data. They're also able to reduce manual toil because our DevOps teams define the capabilities and metrics that define their SLI data, which they collect automatically — hence "programmatic" ...

May 19, 2022

Recently, a regional healthcare organization wanted to retire its legacy monitoring tools and adopt AIOps. The organization asked Windward Consulting to implement an AIOps strategy that would help streamline its outdated and unwieldy IT system management. Our team's AIOps implementation process helped this client and can help others in the industry too. Here's what my team did ...

May 18, 2022

You've likely heard it before: every business is a digital business. However, some businesses and sectors digitize more quickly than others. Healthcare has traditionally been on the slower side of digital transformation and technology adoption, but that's changing. As healthcare organizations roll out innovations at increasing velocity, they must build a long-term strategy for how they will maintain the uptime of their critical apps and services. And there's only one tool that can ensure this continuous availability in our modern IT ecosystems. AIOps can help IT Operations teams ensure the uptime of critical apps and services ...

May 17, 2022

Between 2012 to 2015 all of the hyperscalers attempted to use the legacy APM solutions to improve their own visibility. To no avail. The problem was that none of the previous generations of APM solutions could match the scaling demand, nor could they provide interoperability due to their proprietary and exclusive agentry ...

May 16, 2022

The DevOps journey begins by understanding a team's DevOps flow and identifying precisely what tasks deliver the best return on engineers' time when automated. The rest of this blog will help DevOps team managers by outlining what jobs can — and should be automated ...

May 12, 2022

A survey from Snow Software polled more than 500 IT leaders to determine the current state of cloud infrastructure. Nearly half of the IT leaders who responded agreed that cloud was critical to operations during the pandemic with the majority deploying a hybrid cloud strategy consisting of both public and private clouds. Unsurprisingly, over the last 12 months, the majority of respondents had increased overall cloud spend — a substantial increase over the 2020 findings ...

May 11, 2022

As we all know, the drastic changes in the world have caused the workforce to take a hybrid approach over the last two years. A lot of that time, being fully remote. With the back and forth between home and office, employees need ways to stay productive and access useful information necessary to complete their daily work. The ability to obtain a holistic view of data relevant to the user and get answers to topics, no matter the worker's location, is crucial for a successful and efficient hybrid working environment ...