Observability Leaders Report Fewer Outages

Observability has matured beyond its early adopter position and is now foundational for modern enterprises to achieve full visibility into today's complex technology environments, according to The State of Observability 2023, a report released by Splunk in collaboration with Enterprise Strategy Group.


The research shows that observability is instrumental in reducing outages, improving app reliability, growing revenue, strengthening customer experience (CX) and establishing digital resilience.

A key finding is that observability leaders are four times as likely to resolve instances of unplanned downtime in minutes, versus hours or days. This is notable because 76% of all respondents report that downtime can cost up to $500,000 per hour. Faster issue resolution can therefore drive significant cost savings.
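To make the cost argument concrete, here is a minimal sketch of the arithmetic. It assumes a flat rate at the $500,000-per-hour upper bound cited above; the two outage durations (15 minutes and 4 hours) are hypothetical values chosen for illustration and do not come from the report.

```python
def downtime_cost(minutes: float, cost_per_hour: float = 500_000) -> float:
    """Cost of an outage lasting `minutes`, assuming a flat hourly rate."""
    return cost_per_hour * minutes / 60


# Hypothetical durations, not figures from the report.
fast = downtime_cost(15)       # outage resolved in 15 minutes
slow = downtime_cost(4 * 60)   # outage resolved in 4 hours
savings = slow - fast          # difference between the two scenarios

print(f"15 minutes: ${fast:,.0f}")
print(f"4 hours:    ${slow:,.0f}")
print(f"Difference: ${savings:,.0f}")
```

Under these assumptions, the gap between resolving in minutes and resolving in hours is well over a million dollars for a single incident.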

Key findings from the research also include:

Fewer outages, disruptions to customers

Leaders experience about one-third as many outages per year as beginners. (On average, beginners report six outages, while leaders experience two.)

Greater visual clarity drives ROI

Due to observability, a little over 80% of organizations can find and fix problems faster. In addition, 81% can see into hybrid ecosystems.

Stronger assurance to meet reliability goals

89% of leaders are completely confident in their ability to meet availability and performance requirements for their applications, 3.9x the rate of beginners.

Hybrid will persist

Organizations report maintaining 165 business applications (on average), with about half in the public cloud and half on-premises. As the number of apps grows, observability will remain vital to unify visibility across environments.

AIOps instrumental to CX

AIOps capabilities included in an observability practice outperform legacy solutions at automatically determining the technical root cause of an issue (according to 34% of respondents), predicting problems before they turn into customer-impacting incidents (31%) and better assessing the severity of an incident (30%).

Resilience as North Star

95% say their observability leaders are collaborating more with line-of-business leaders on resilience strategies, which include investing in solutions that recover customer services faster and remediate incidents more efficiently.

Communications and media lead in maturity

Communications and media companies are leading the way on observability savviness, with 13% tallied as leaders. Manufacturing and financial services followed with 8% categorized as leaders.

Public sector makes gains with leaders

The public sector tallied 4% as observability leaders, increasing from 0% in 2022, showing an opportunity for growth.

Unifying security monitoring and observability

The report shows that more organizations than last year are unifying security monitoring and observability to obtain richer context on incidents and accelerate resolution. The reasons respondents give for unifying include:

More granular and precise threat detection. 59% of all respondents uncover security issues more effectively, thanks to intelligence and correlation capabilities native to observability solutions.

A comprehensive approach. 55% uncover and assess more security vulnerabilities, thanks to the visibility afforded by observability solutions.

Ability to act more quickly. 51% take action on security issues faster, thanks to the remediation capabilities of observability solutions.

"With the rising complexity of today's technology environments and the direct connection between reducing disruptions and optimal customer experiences, observability is fundamental to the successful operations of modern businesses," said Spiros Xanthos, SVP and GM for the Observability business at Splunk. "Observability enables businesses to keep their software and infrastructure reliable, systems secure and customers happy, making it a critical component to any organization's resilience strategy."

Methodology: The report defines observability leaders as organizations with at least 24 months of experience with observability. In addition, leaders achieved the highest rank in these five factors: the ability to correlate data across all observability tools, the adoption of AI/ML technology within their observability toolset, skills specialization in observability, the ability to cover both cloud-native and traditional application architectures and the adoption of AIOps.
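The leader definition above can be read as a simple two-part rule. The sketch below is one way to express it, not the report's actual scoring method: the report does not say how ranks were computed, so the factor identifiers, the `Organization` type and the boolean "achieved the highest rank" flags are all assumptions made for illustration.

```python
from dataclasses import dataclass

# The five factors named in the methodology; these identifiers are illustrative.
FACTORS = (
    "data_correlation",
    "ai_ml_adoption",
    "skills_specialization",
    "cloud_native_and_traditional_coverage",
    "aiops_adoption",
)

MIN_MONTHS = 24  # minimum experience stated in the methodology


@dataclass
class Organization:
    months_of_experience: int
    # Maps each factor to True when the organization achieved the highest rank in it.
    top_rank: dict[str, bool]


def is_leader(org: Organization) -> bool:
    """Return True only when both parts of the stated definition hold."""
    has_experience = org.months_of_experience >= MIN_MONTHS
    top_in_all = all(org.top_rank.get(factor, False) for factor in FACTORS)
    return has_experience and top_in_all
```

For example, an organization with 30 months of experience and the highest rank in all five factors would count as a leader, while the same organization with only 12 months of experience would not.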

The global survey was conducted from early December 2022 to mid-January 2023. The report surveyed 1,750 IT operations, application development and engineering leaders who work at organizations with 500 or more full-time employees and are knowledgeable about their organization's observability practice. The survey respondents were drawn from 10 countries: Australia, Canada, France, Germany, India, Japan, New Zealand, Singapore, UK and US.
