Enterprise IT teams are continually challenged to manage larger, more complex systems with fewer resources. This demands a level of efficiency that can only come from complete visibility and intelligent control based on the data coming out of IT systems. It is not surprising, then, that a growing number of organizations are turning to IT Operations Analytics solutions to help them track performance, improve operational efficiency, and prevent service disruptions within their IT infrastructure.
IT Operations Analytics (ITOA) solutions provide a robust set of tools that generate the insights IT operations teams need to proactively identify the risks, impacts, and outages that events in their environment can cause. Gartner estimates that by 2017, approximately 15% of enterprises will actively use ITOA technologies to gain insight into both business execution and IT operations, up from fewer than 5% today.
So how can we use these analytics to effectively improve IT operational excellence? How can they help us make better decisions? And most importantly, how can they help prevent downtime and service disruptions?
Continuity Software recently conducted an infrastructure resiliency survey, with the goal of helping IT infrastructure and operations executives benchmark their organization’s performance and practices against their peers. The results presented here are based on responses from 230 IT professionals from a wide range of industries and geographies collected through an online survey.
Most survey respondents come from mid-size and large companies, with 40% coming from organizations with over 10,000 employees. Over half of the respondents (54%) have more than 500 servers in their datacenter.
Some of the key findings of the survey include:
■ Avoiding productivity loss is the top driver for infrastructure resiliency initiatives, cited by 44% of the survey respondents. Additional drivers include ensuring customer satisfaction (22%), protecting company reputation (17%), and regulatory compliance (13%).
■ Service availability goals are becoming more ambitious. As many as 81% of the survey respondents have a service availability goal of less than 8 hours of unplanned downtime a year (compared to 73% in 2014), and 37% have a goal of less than one hour a year.
■ At the same time, as many as 39% of respondents fell short of meeting their goal: 34% of the organizations surveyed had an unplanned outage in the past month, and 13% had one in the past week.
■ While cyber-attacks make headlines, they cause only a small fraction of system downtime. The most common causes are application errors and system upgrades, each responsible for over four hours of downtime a year on average.
■ Although the majority of the survey respondents have moved some of their mission-critical systems to the cloud, those organizations were less successful in meeting their service availability goals than organizations that have not made the move.
■ The top challenge in meeting infrastructure resiliency goals is the knowledge gap and the inability to keep up with vendor recommendations and best practices. This challenge is significantly more pronounced in cloud environments and is one of the primary reasons companies with a larger cloud footprint struggle to meet their goals.
■ Downtime carries a higher price tag at large companies. For 36% of the organizations with over 10,000 employees, the average hour of downtime costs over $100,000.
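For reference, the downtime goals cited above translate directly into uptime percentages. A minimal sketch (assuming a 365-day, 8,760-hour year) shows that even the ambitious one-hour goal is still short of "four nines" availability:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year

def availability(downtime_hours_per_year):
    """Return uptime as a percentage of the year for a given downtime budget."""
    return 100 * (1 - downtime_hours_per_year / HOURS_PER_YEAR)

print(f"{availability(8):.4f}")  # 8 h/yr goal  -> 99.9087
print(f"{availability(1):.4f}")  # 1 h/yr goal  -> 99.9886
```

So the 8-hours-a-year goal held by 81% of respondents corresponds to roughly "three nines" (99.9%) availability, and the one-hour goal held by 37% corresponds to about 99.99%.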
As IT environments become more complex and more systems are deployed in virtualized private cloud settings, having the right tools to manage IT operations becomes essential. IT Operations Analytics solutions that generate actionable insights across the entire IT landscape are helping IT teams be more proactive and efficient, allowing organizations to improve resiliency and prevent disruptions to critical business services.
Doron Pinhas is CTO of Continuity Software.