
State of Observability 2021: Early Investments in Observability Improve Performance, Customer Experience and Bottom Line

With every organization now being a digital organization, observability should be viewed as a core competency, not a cutting-edge differentiator, according to The State of Observability 2021, a report from Splunk in collaboration with Enterprise Strategy Group.

The research finds that observability delivers tangible, essential results and that high-maturity observability practices are correlated with:

■ Much greater visibility across hybrid, multi-cloud infrastructures, resources and performance areas. Mature observability users are 2.9 times as likely to report better visibility into application performance and enjoy almost 2 times better visibility into public cloud infrastructure.

■ Accelerated root cause identification, meaning complex, service-crashing crises are fixed much more quickly, or averted entirely. Leaders are 6.1 times likelier to have accelerated root cause identification (43% of leaders versus 7% of beginners).

■ Faster digital transformation, with more successful results. Organizations with the most advanced observability practices are 4.5 times more likely to report successful digital transformation initiatives.

■ Exploding innovation, with leaders reporting 60% more new services, products and revenue streams than organizations with beginner-level observability.

"The pandemic accelerated digital transformations this past year and observability simply is no longer optional in a real-time economy where multicloud complexity has become standard," said Sendur Sellakumar, SVP, Cloud and Chief Product Officer, Splunk. "Having a robust observability practice means fewer service disruptions, better customer experiences and more successful digital transformations. Observability means full fidelity data visibility not only at the infrastructure level, but also at the application and service level, with end-to-end transaction visibility no matter the technologies involved."

A significant percentage of respondents also say they have suffered material consequences for service failures that better observability practices could have prevented:

■ Lower customer satisfaction (45%)

■ Loss of revenue (37%)

■ Loss of reputation (36%)

■ Loss of customers (30%)

Additionally, gaps in observability hurt the bottom line and customer satisfaction:

■ 53% of leaders reported that app issues have resulted in customer or revenue loss.

■ 45% reported lower customer satisfaction as a result of service failures.

■ 30% reported losing customers as a consequence.

The report also highlights concrete recommendations for organizations as they look to improve their observability practices, including prioritizing data collection and correlation, as well as making use of AI, ML and automation.

Methodology: The global survey was conducted in partnership with Enterprise Strategy Group from mid-February through mid-March 2021. The 525 respondents, IT and ITOps leaders and practitioners, were drawn from nine global regions and from organizations with more than 500 employees and an existing observability practice.

