
Enterprise Challenge: Balancing Between Driving Innovation and Maintaining Uptime

Tobias Dunn-Krahn
xMatters

Customer-impacting service disruptions can cost enterprises revenue and reputation. As businesses progress toward digitalization, maintaining an excellent customer experience has become a critical measure of an organization’s digital transformation success. For digital service providers, this requires modern architectures and new expectations for how engineers, customer teams and business leaders work together. Responsibility for the customer experience now extends to multiple teams across technology organizations.

xMatters recently released the results of its Incident Management in the Age of Customer-Centricity research study, which set out to understand the range of incident management practices in use and how the growing focus on customer experience has changed roles across the organization. The study surveyed more than 300 DevOps and IT Ops practitioners and business leaders from organizations of varying sizes, including midsize and enterprise-level businesses, that deliver digital services.

The findings highlight the ongoing challenges organizations face as they introduce and rapidly evolve digital services. The research also underscored the importance of intelligent, automated approaches that both reduce incidents and limit their impact when they do occur.

Maintaining the Pace of Innovation and the Customer Experience

The research found a gap that organizations must close if they hope to innovate continuously while maintaining service performance and availability. More than half of respondents (54%) said their organization delivers at least one new software release per week, and a full 77% said the number of releases has increased by at least 25% over the past three years.

Unfortunately, legacy technology and overburdened talent are straining to keep up. For example, 57% of organizations report that their customers experience degraded digital experiences, ranging from minor performance issues to major outages, on a daily or weekly basis.

Nearly 75% of survey respondents say their ability to build out new services is sometimes or always affected by customer-impacting issues. This gap between the demand for new services and the need to provide an always-on, superior customer experience must be closed if businesses are to realize the promise of digitalization.

Inefficient Incident Management Slows the Pace of Innovation

The vast majority of survey respondents (91.7%), representing a wide range of roles, said that delivering a superior customer experience is a priority for them.

Previously, IT Ops alone was the group most commonly identified as responsible for enabling the customer experience. That responsibility has now spread, and the shift is important: nearly everyone across a technology enterprise shoulders part of the load, and much of their time is lost to triaging problems and working toward eventual resolution.

According to the survey, nearly half of development team leads said their developers spend more than 50% of their time manually addressing incidents. That lost time is already a major concern, and it will only become more painful as the pace of innovation continues to quicken.

Automation Helps Streamline Incident Management

There is reason for optimism in the survey results, too. The findings indicate that modernizing incident management practices, including adopting more advanced IT tools and services, will help teams resolve issues much faster, both through automation and by equipping employees with the information and resources they need to support digital transformation.

Most of the DevOps/SRE practitioners (84%), IT Operations practitioners (73%) and developers (65%) surveyed believe emerging technologies such as AI and ML will further improve their job performance.

These advancements in incident management will help companies continue to deliver quality services at a faster pace. The gap between organizations' ability to innovate and their ability to maintain uptime is revealing, but modernized incident management will equip employees with the critical insights they need to better serve customers and free them to focus on service innovation.

Tobias Dunn-Krahn is CTO of xMatters