Preventing Outages in 2023: What We Can Learn from Recent Failures
February 09, 2023

"What the recent failures from Internet giants demonstrate is that the question of the next outage is not if, but when," says Dritan Suljoti, Chief Product and Technology Officer of Catchpoint, referencing the company's new white paper, Preventing Outages in 2023: What We Learned from Recent Failures. "Moreover, the downstream effect of major outages to essential Internet infrastructure, such as cloud platforms, CDNs or DNS providers, means that no company is immune, no matter how well prepared they think they are. The white paper demonstrates why it's so important for all of us to be proactive to reduce Mean Time to Repair (MTTR) when the next outage occurs."



Key lessons from the past

■ Develop an Internet Performance Monitoring strategy that allows you to monitor precisely what customers, workforce, and other users expect and build an Experience Score.

■ Monitor more than what is under your direct control: map your Internet Stack to ensure you are monitoring every component relied on to deliver your content (including DNS, CDN, ISP, BGP, TCP configuration, SSL, and other cloud services).

■ Automate intelligently – design and test automation to ensure there are no bugs hiding in the code.

■ Be prepared to take fast action to remediate outages as they occur, for example by switching to a backup solution or dropping the third party causing the issue. Develop runbooks and practice recovery.

■ Whenever change is scheduled, ensure your team is ready for any outages that may occur (intentionally or not) with a crisis call plan that includes a communication plan and templates, a plan to mitigate failures from third parties, and a best-practices monitoring and observability plan.
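As a rough illustration of the lessons above, the following Python sketch maps each layer of an Internet Stack to a health check and a pre-agreed remediation. It is only a sketch under stated assumptions: the `Dependency` class, the `plan_actions` function, the provider labels, and the stubbed checks are all hypothetical and do not come from the white paper. In practice each check would call a real probe (a DNS lookup, a TLS handshake, a CDN request, and so on).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Dependency:
    """One component of the Internet Stack (all field values are illustrative)."""
    layer: str                      # e.g. "DNS", "CDN", "ISP"
    provider: str                   # hypothetical provider label
    check: Callable[[], bool]       # returns True when the component is healthy
    backup: Optional[str] = None    # pre-arranged fallback, if one exists


def plan_actions(stack: List[Dependency]) -> List[Tuple[str, str, str]]:
    """Run every check and return (layer, provider, action) for each component."""
    actions = []
    for dep in stack:
        try:
            healthy = bool(dep.check())
        except Exception:
            # A probe that errors out is treated the same as a failed probe.
            healthy = False

        if healthy:
            action = "ok"
        elif dep.backup is not None:
            action = f"fail over to {dep.backup}"
        else:
            action = "escalate to crisis call"
        actions.append((dep.layer, dep.provider, action))
    return actions


if __name__ == "__main__":
    # Stubbed checks stand in for real probes; names are placeholders.
    stack = [
        Dependency("DNS", "primary-dns", lambda: True, backup="secondary-dns"),
        Dependency("CDN", "primary-cdn", lambda: False, backup="backup-cdn"),
        Dependency("ISP", "transit-a", lambda: False),
    ]
    for layer, provider, action in plan_actions(stack):
        print(f"{layer:4} {provider:12} {action}")
```

Keeping the fallback next to the check means the runbook decision (fail over or escalate) is made ahead of time rather than during the incident, which is one way to shorten MTTR.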

"Given the impact of serious outages to the bottom line, not to mention the long-tail impact to brand and reputation, amidst a landscape of increased Internet reliance alongside ever-growing Internet fragility and greater and greater complexity, the need for community learnings from past failures to be shared and practical advice disseminated around stemming future major incidents and ensuring Internet Resilience is imperative," says Gerardo Dada, CMO at Catchpoint. "We believe this white paper offers an invaluable deep dive into recent outages and key lessons that all of us can learn from to prevent (or mitigate the consequences of) the next major outage."
