
Millions Lost to Internet Outages: Could a C-Suite Role Help Stem the Tide?

Mehdi Daoudi
Catchpoint

The consequences of outages have become a pressing issue as the largest IT outage in history continues to rock the world. This latest outage is estimated to have cost Fortune 500 companies as much as $5.4 billion in revenues and gross profit, with Delta most recently confirming $380 million in revenue alone. According to the Catchpoint Internet Resilience Report, disruptions of this kind, internet outages in particular, can have severe financial and reputational impacts, and enterprises should give serious consideration to their resilience.

This issue is not limited to companies using CrowdStrike's software; outages are costing companies millions across the board. The Internet Resilience Report revealed that 43% of surveyed businesses in sectors including finance, e-commerce, cloud, and healthcare estimated losses of over $1 million due to internet outages or degradations in the month prior to the 2024 survey.

In today's interconnected world, a single point of failure in internet infrastructure can translate directly into substantial revenue losses. A top-down approach to internet resilience is therefore needed. Companies should consider establishing a Chief Resilience Officer (CRO) within the C-suite. This role is akin to that of a Chief Security Officer and emphasizes the importance of resilience alongside security. One of the primary causes of frequent outages is the lack of centralized and unified monitoring tools, resulting in a fragmented IT landscape reminiscent of the Balkans. The CRO should be responsible for driving the standardization of telemetry across the organization to enhance resilience. As the report highlights, the financial and reputational consequences of inadequate resilience are as severe as those of security breaches. It is therefore imperative that companies prioritize resilience at the highest levels of their organization.

In fact, Fortune 2000 companies are leading this trend and increasingly recognizing the value of the CRO role. These executives are tasked with driving resilience planning, identifying single points of failure, and devising strategies to mitigate potential disruptions. Last year's extensive Adobe Experience Cloud outage, which lasted 18 hours, and the recent CrowdStrike outage are stark examples of the type of service disruption that a CRO could help manage and prevent.

However, creating a CRO position is not the only path to achieving resilience. Organizations should also foster a culture of resilience: learning from their mistakes by documenting and studying failures within the product delivery chain, and encouraging a mindset of continuous improvement. Companies should conduct preemptive exercises to test their systems, identify weaknesses, and refine their responses to potential outages.

Moreover, it is crucial for businesses to work with reliable vendors who demonstrate a commitment to resilience. While everyone is allowed to make mistakes, repeated failures or a lack of accountability should prompt companies to reconsider their partnerships. Learning from each incident and ensuring that vendors do the same is key to maintaining a resilient internet infrastructure.

As we navigate our increasingly digital-first world, the importance of internet resilience cannot be overstated. It should be an integral part of any disaster recovery or business continuity program, discussed at the highest organizational levels and tested regularly. While we can't simulate every possible outage scenario, planning for the unexpected has become a crucial business practice.

Prioritizing internet resilience, and taking it into consideration from the C-suite down, is essential for any business aiming to thrive amidst the complexities of our connected landscape.

Mehdi Daoudi is CEO and Co-Founder of Catchpoint

