Skip to main content

Reducing Risk, Improving Resiliency: How Companies Use ITOA

Doron Pinhas

Enterprise IT teams are continually challenged to manage larger and more complex systems with fewer resources. This requires a level of efficiency that can only come from complete visibility and intelligent control, based on the data coming out of IT systems. It is not surprising to see that a growing number of organizations are turning to IT Operations Analytics solutions to help them track performance, improve operational efficiency, and prevent service disruptions within their IT infrastructure.

IT operations analytics provide a robust set of tools that can generate the necessary insights to help IT operations teams proactively determine the risks, impacts, or outages that can occur due to various events that may take place in an environment. Gartner estimates that by 2017, approximately 15% of enterprises will actively use ITOA (IT Operations Analytics) technologies to provide insight into both business execution and IT operations, up from fewer than 5% today.

So how can we then use these analytics to effectively improve IT operational excellence? How can it help us make better decisions? And most importantly, how can it help prevent downtime and service disruptions?

Continuity Software recently conducted an infrastructure resiliency survey, with the goal of helping IT infrastructure and operations executives benchmark their organization’s performance and practices against their peers. The results presented here are based on responses from 230 IT professionals from a wide range of industries and geographies collected through an online survey.

Most survey respondents come from mid-size and large companies, with 40% of the survey respondents coming from organizations of over 10,000 employees. Over half of the respondents (54%) have more than 500 servers in their datacenter.

Some of the key findings of the survey include:

■ Avoiding productivity loss is the top driver for infrastructure resiliency initiatives, cited by 44% of the survey respondents. Additional drivers include ensuring customer satisfaction (22%), protecting company reputation (17%) and regulatory compliance (13%)

■ Service availability goals are becoming more ambitious. As many as 81% of the survey respondents have a service availability goal of less than 8 hours of unplanned downtime a year (compared to 73% in 2014), and 37% have a goal of less than one hour a year.


■ At the same time, as many as 39% of the respondents fell short of meeting their goal. 34% of the organizations surveyed had an unplanned outage in the past month, and 13% had one in the past week.

■ While cyber-attacks make headlines, they only cause a small fraction of system downtime. The most common causes are application error and system upgrades, each responsible for over four hours a year on average.

■ Although the majority of the survey respondents have moved some of their mission-critical systems to the cloud, those that have mission-critical systems in the cloud were less successful in meeting their service availability goals compared to organizations that have not made the move.


■ The top challenge in meeting infrastructure resiliency goals is the knowledge gap and inability to keep up with vendor recommendations and best practices. This challenge is significantly more prominent in the cloud environment and one of the primary factors why companies with a larger cloud footprint are struggling to meet their goals.


■ Large companies have a higher price tag on downtime. For 36% of the organizations with over 10,000 employees, the average hour of downtime costs over $100,000.

As IT environments become more complex and more systems are deployed in virtualized private cloud settings, having the right tools to manage IT operations becomes essential. IT Operations Analytics solutions that generate actionable insights across the entire IT landscape are helping IT teams be more proactive and efficient, allowing organizations to improve resiliency and prevent disruptions to critical business services.

Doron Pinhas is CTO of Continuity Software.

Hot Topics

The Latest

I've spent a lot of time in the channel, and one thing I keep coming back to is this: a partner program is only as good as what it looks like in the field. Many programs look great on paper, but when a partner is in front of a customer navigating a complex hybrid environment or trying to make the case for AI-powered observability, the gap between what a vendor promises and what it actually delivers becomes very clear, very fast ...

Enterprises today operate in a real-time environment where uninterrupted access to trusted data has become a baseline expectation for users, applications and automated systems. Traditional DataOps models, built on manual effort and human triage, cannot keep pace with this always active demand. AI agents are emerging as the operational backbone, ensuring consistent data availability, reinforcing trustworthiness and enabling a level of scale that manual processes cannot achieve ...

For decades, trust in the digital workplace rested on familiar signals. We trusted faces on video calls, voices on the phone, and emails that appeared to come from people we knew. These cues felt human and intuitive. They anchored how decisions were made, approvals were granted, and access was authorized. AI-powered deepfakes have quietly broken that model ...

Cloud migration was supposed to be a one-way door. For most enterprises, it turns out it isn't. Cloud data repatriation is a real and growing trend. A new survey ... finds that 89% of organizations plan to expand their on-premises infrastructure footprint over the next two years — and 75% have already moved at least some workloads back from public cloud in the past 24 months. The findings point to a broad rethinking of where data belongs ...

Over the past few years, large language models (LLMs) have revolutionized the software industry. Given their ability to excel at multi-step reasoning, LLMs have helped enterprises streamline workflows and adapt to the unknown. However, employing such models comes with sky-high costs, latency issues, and limited flexibility. In the realm of IT operations, it is generally wiser to employ smaller, domain-specific models instead ...

For years, DevOps teams operated under a simple assumption: collect enough telemetry, and you can find and fix any problem. That assumption is breaking down. Modern enterprises now operate across microservices, hybrid cloud environments, APIs, Kubernetes, and highly automated delivery pipelines. Releases happen continuously, dependencies shift constantly, and failures spread faster than teams can diagnose them ...

New Relic surveyed IT and engineering leaders from the media and entertainment (M&E) sector to understand what's working — and where challenges persist with their observability practices. The findings reveal how M&E organizations are navigating rising platform complexity, audience expectations, and AI-driven change. Below are five takeaways that stand out ...

Let me start with something I've seen play out more times than I can count. A team hits a wall with the cloud. Costs creep up, then spike. Performance starts to feel inconsistent. Someone in finance asks a simple question like "why did this double?" and nobody has a clean answer ... Maybe this isn't the right place for everything. That realization feels like a breakthrough, like you've identified the problem. In reality, you've just identified the starting line ...

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...

Reducing Risk, Improving Resiliency: How Companies Use ITOA

Doron Pinhas

Enterprise IT teams are continually challenged to manage larger and more complex systems with fewer resources. This requires a level of efficiency that can only come from complete visibility and intelligent control, based on the data coming out of IT systems. It is not surprising to see that a growing number of organizations are turning to IT Operations Analytics solutions to help them track performance, improve operational efficiency, and prevent service disruptions within their IT infrastructure.

IT operations analytics provide a robust set of tools that can generate the necessary insights to help IT operations teams proactively determine the risks, impacts, or outages that can occur due to various events that may take place in an environment. Gartner estimates that by 2017, approximately 15% of enterprises will actively use ITOA (IT Operations Analytics) technologies to provide insight into both business execution and IT operations, up from fewer than 5% today.

So how can we then use these analytics to effectively improve IT operational excellence? How can it help us make better decisions? And most importantly, how can it help prevent downtime and service disruptions?

Continuity Software recently conducted an infrastructure resiliency survey, with the goal of helping IT infrastructure and operations executives benchmark their organization’s performance and practices against their peers. The results presented here are based on responses from 230 IT professionals from a wide range of industries and geographies collected through an online survey.

Most survey respondents come from mid-size and large companies, with 40% of the survey respondents coming from organizations of over 10,000 employees. Over half of the respondents (54%) have more than 500 servers in their datacenter.

Some of the key findings of the survey include:

■ Avoiding productivity loss is the top driver for infrastructure resiliency initiatives, cited by 44% of the survey respondents. Additional drivers include ensuring customer satisfaction (22%), protecting company reputation (17%) and regulatory compliance (13%)

■ Service availability goals are becoming more ambitious. As many as 81% of the survey respondents have a service availability goal of less than 8 hours of unplanned downtime a year (compared to 73% in 2014), and 37% have a goal of less than one hour a year.


■ At the same time, as many as 39% of the respondents fell short of meeting their goal. 34% of the organizations surveyed had an unplanned outage in the past month, and 13% had one in the past week.

■ While cyber-attacks make headlines, they only cause a small fraction of system downtime. The most common causes are application error and system upgrades, each responsible for over four hours a year on average.

■ Although the majority of the survey respondents have moved some of their mission-critical systems to the cloud, those that have mission-critical systems in the cloud were less successful in meeting their service availability goals compared to organizations that have not made the move.


■ The top challenge in meeting infrastructure resiliency goals is the knowledge gap and inability to keep up with vendor recommendations and best practices. This challenge is significantly more prominent in the cloud environment and one of the primary factors why companies with a larger cloud footprint are struggling to meet their goals.


■ Large companies have a higher price tag on downtime. For 36% of the organizations with over 10,000 employees, the average hour of downtime costs over $100,000.

As IT environments become more complex and more systems are deployed in virtualized private cloud settings, having the right tools to manage IT operations becomes essential. IT Operations Analytics solutions that generate actionable insights across the entire IT landscape are helping IT teams be more proactive and efficient, allowing organizations to improve resiliency and prevent disruptions to critical business services.

Doron Pinhas is CTO of Continuity Software.

Hot Topics

The Latest

I've spent a lot of time in the channel, and one thing I keep coming back to is this: a partner program is only as good as what it looks like in the field. Many programs look great on paper, but when a partner is in front of a customer navigating a complex hybrid environment or trying to make the case for AI-powered observability, the gap between what a vendor promises and what it actually delivers becomes very clear, very fast ...

Enterprises today operate in a real-time environment where uninterrupted access to trusted data has become a baseline expectation for users, applications and automated systems. Traditional DataOps models, built on manual effort and human triage, cannot keep pace with this always active demand. AI agents are emerging as the operational backbone, ensuring consistent data availability, reinforcing trustworthiness and enabling a level of scale that manual processes cannot achieve ...

For decades, trust in the digital workplace rested on familiar signals. We trusted faces on video calls, voices on the phone, and emails that appeared to come from people we knew. These cues felt human and intuitive. They anchored how decisions were made, approvals were granted, and access was authorized. AI-powered deepfakes have quietly broken that model ...

Cloud migration was supposed to be a one-way door. For most enterprises, it turns out it isn't. Cloud data repatriation is a real and growing trend. A new survey ... finds that 89% of organizations plan to expand their on-premises infrastructure footprint over the next two years — and 75% have already moved at least some workloads back from public cloud in the past 24 months. The findings point to a broad rethinking of where data belongs ...

Over the past few years, large language models (LLMs) have revolutionized the software industry. Given their ability to excel at multi-step reasoning, LLMs have helped enterprises streamline workflows and adapt to the unknown. However, employing such models comes with sky-high costs, latency issues, and limited flexibility. In the realm of IT operations, it is generally wiser to employ smaller, domain-specific models instead ...

For years, DevOps teams operated under a simple assumption: collect enough telemetry, and you can find and fix any problem. That assumption is breaking down. Modern enterprises now operate across microservices, hybrid cloud environments, APIs, Kubernetes, and highly automated delivery pipelines. Releases happen continuously, dependencies shift constantly, and failures spread faster than teams can diagnose them ...

New Relic surveyed IT and engineering leaders from the media and entertainment (M&E) sector to understand what's working — and where challenges persist with their observability practices. The findings reveal how M&E organizations are navigating rising platform complexity, audience expectations, and AI-driven change. Below are five takeaways that stand out ...

Let me start with something I've seen play out more times than I can count. A team hits a wall with the cloud. Costs creep up, then spike. Performance starts to feel inconsistent. Someone in finance asks a simple question like "why did this double?" and nobody has a clean answer ... Maybe this isn't the right place for everything. That realization feels like a breakthrough, like you've identified the problem. In reality, you've just identified the starting line ...

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...