
Organizations Can Lose $1M+ Per Hour During Unplanned Disruptions

The financial stakes of extended service disruptions have made operational resilience a top priority, according to the 2026 State of AI-First Operations Report from PagerDuty.

According to survey findings, 95% of respondents believe their leadership understands the competitive advantage that can be gained from reducing incidents and speeding recovery.

The report also shows that organizations are increasingly considering AI for digital operations, and 59% say they already actively incorporate the technology into operations. AI adopters appear to be more successful than organizations that have discussed AI but not yet incorporated it: 75% of adopters report improved operational resilience, compared with only 66% of organizations not yet using AI.

Additional key takeaways from the report include:

Disruptions have become a board-level financial risk

Some organizations (8%) lose more than $1 million per hour during IT incidents, 34% lose at least $500,000 per hour, and more than two thirds (68%) lose more than $300,000 per hour. The cost of disruptions has grown too high for leaders to ignore, and the impact extends beyond immediate revenue loss to damaged brand reputation (52%), recovery costs (50%), reduced productivity (48%) and developer burnout (42%).
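The three hourly-loss figures are thresholds, so they overlap rather than add up: anyone losing more than $1 million per hour also loses at least $500,000 and more than $300,000. The short Python sketch below converts them into non-overlapping bands. It assumes all three percentages share the same respondent base, which the report summary does not state explicitly, and the band labels are illustrative rather than taken from the report.

```python
# Assumption: the reported shares are cumulative and measured against the same
# respondent base, ordered from the highest hourly-loss threshold to the lowest.
CUMULATIVE = [8, 34, 68]  # >$1M, >=$500K, >$300K lost per hour (percent)

# Illustrative labels for the exclusive bands produced below.
BAND_LABELS = [
    "more than $1M",
    "$500K to $1M",
    "more than $300K but under $500K",
    "$300K or less",
]


def exclusive_bands(cumulative: list[int]) -> list[int]:
    """Turn nested 'at or above threshold' shares into non-overlapping bands."""
    bands = []
    previous = 0
    for share in cumulative:
        # Each band is the new share minus everyone already counted above it.
        bands.append(share - previous)
        previous = share
    # Whoever remains falls below the lowest reported threshold.
    bands.append(100 - previous)
    return bands


for label, share in zip(BAND_LABELS, exclusive_bands(CUMULATIVE)):
    print(f"{label}: {share}%")
```

Under that assumption, the figures break down as 8% above $1 million, 26% between $500,000 and $1 million, 34% between $300,000 and $500,000, and 32% at $300,000 or less per hour.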

Successful organizations prioritize investments in operational resilience

A majority of organizations have seen progress from their investments over the past year, with 71% reporting higher resilience and maturity than a year ago. However, progress appears to vary with two key factors: business performance and investment. While 77% of organizations plan to increase operational resilience budgets over the next 12 months, companies reporting revenue growth are significantly more likely to increase investment (82%) than underperformers (62%).

Post-incident learning capabilities gain recognition

Organizations that reported improved resilience most often attributed the progress to tools that combine integration with learning capabilities. Nearly half of organizations (48%) have increased resilience by turning incidents into structured learning opportunities that improve future performance. Companies with revenue growth are more likely to report a massive or moderate need for continuous learning (83%) than companies with flat or declining revenue (77%). This suggests that the most successful platforms will be those that can turn incidents into systematic improvement cycles.

"The 2026 PagerDuty State of AI-First Operations Report further demonstrates how the financial risk of major incidents makes operational resilience a board-level priority," said Katherine Calvert, chief marketing officer at PagerDuty. "AI-first operations enable organizations to accelerate their incident management workflows so they can restore service more quickly during disruption. With PagerDuty, organizations can not only minimize risk, but cut down on teams’ time spent firefighting so they can focus on driving innovation and revenue."

Methodology: The report draws insights based on survey responses from 1,000 business leaders, IT decision makers and senior developers across Australia and New Zealand, France, Germany, Japan, the Nordic countries, the UK and Ireland, and the US.

