Down Goes the Internet (Again) – Part Two: 4 Strategies to Ensure Website Performance

Start with Part One of this article: Down Goes the Internet (Again) – Are You Ready?

In this era of unprecedented complexity, it's virtually impossible for a modern website to eliminate all the risk associated with using third parties. However, there are proactive strategies an organization can implement to better manage and minimize its risk. These include:

1. Proactively monitor speed and availability

Proactively monitor the speed and availability of websites, web applications and mobile sites from the true end-user perspective.

Today, many elements stand between your data center and your users: not just third-party services, but also content delivery networks (CDNs), local and regional ISPs, mobile carrier networks and browsers. Measuring performance from your data center alone is insufficient – unless, of course, your users live in your data center, which is highly unlikely.

The true browser-based perspective is the only place where you can accurately gauge your users' experience at the end of an extremely long and complicated technology path known as the application delivery chain. Today's new-generation application performance management (APM) solutions are based on this true user perspective.
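To make the gap between the two vantage points concrete, here is a minimal Python sketch. It assumes you already collect page-load timings from real users' browsers; the record layout, the sample values and the data-center figure are all hypothetical and stand in for whatever your own monitoring produces.

```python
from collections import defaultdict
from statistics import median

# Hypothetical real-user measurements: (region, network type, page load in seconds).
BEACONS = [
    ("us-east", "cable", 1.8),
    ("us-east", "cable", 2.1),
    ("us-east", "mobile", 4.9),
    ("eu-west", "dsl", 3.2),
    ("eu-west", "mobile", 6.4),
    ("eu-west", "mobile", 5.8),
]

# Hypothetical load time for the same page measured inside the data center.
DATA_CENTER_LOAD_SECONDS = 0.9


def median_by_segment(beacons):
    """Group load times by (region, network) and return the median of each group."""
    groups = defaultdict(list)
    for region, network, seconds in beacons:
        groups[(region, network)].append(seconds)
    return {segment: median(values) for segment, values in groups.items()}


def gap_to_data_center(beacons, data_center_seconds):
    """Return how many seconds slower each user segment is than the data-center view."""
    return {
        segment: round(value - data_center_seconds, 2)
        for segment, value in median_by_segment(beacons).items()
    }


if __name__ == "__main__":
    for segment, gap in sorted(gap_to_data_center(BEACONS, DATA_CENTER_LOAD_SECONDS).items()):
        print(segment, f"+{gap}s versus data center")
```

With these made-up numbers, every user segment is slower than the data-center measurement, and mobile users are the furthest behind. That difference is exactly the part of the delivery chain a data-center-only view cannot see.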

2. Monitor all transactions

Monitor all transactions, 24x7, along the complete application delivery chain. Sampling is not a sufficient means of gauging performance, because a major performance issue may very well occur outside your testing interval – think of the Amazon EC2 outage that impacted Netflix on Christmas Day last year!

Because major service outages are unpredictable, you need to monitor all transactions around the clock so that you can identify performance aberrations and their root causes – both within and beyond the firewall – quickly and accurately, and get ahead of them.
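The sampling problem can be illustrated with a small Python sketch. It models scheduled tests as probes that run at a fixed interval and checks whether any of them fall inside an incident window. The incident times and intervals are hypothetical, and real monitoring of every transaction is richer than a one-minute probe, so treat this only as an illustration of the blind spot.

```python
def probe_times(interval_minutes, day_minutes=24 * 60):
    """Minutes since midnight at which a scheduled test would run."""
    return list(range(0, day_minutes, interval_minutes))


def probes_during_incident(interval_minutes, incident_start, incident_end):
    """Return the probe times that fall inside [incident_start, incident_end)."""
    return [
        t for t in probe_times(interval_minutes)
        if incident_start <= t < incident_end
    ]


# Hypothetical incident: degraded performance from 14:05 to 14:25.
INCIDENT_START = 14 * 60 + 5
INCIDENT_END = 14 * 60 + 25

# Hourly sampling versus a check every minute (a stand-in for continuous monitoring).
hourly_hits = probes_during_incident(60, INCIDENT_START, INCIDENT_END)
minute_hits = probes_during_incident(1, INCIDENT_START, INCIDENT_END)

if __name__ == "__main__":
    print("hourly probes that saw the incident:", len(hourly_hits))
    print("per-minute probes that saw the incident:", len(minute_hits))
```

In this example the hourly schedule runs at 14:00 and 15:00, so it never observes the 20-minute incident, while the per-minute schedule observes it 20 times.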

3. Baseline and uphold performance-focused SLAs

Service-level agreements (SLAs) promising a certain level of availability on the part of a third-party service provider mean very little when it comes to performance.

For example, just because your cloud service provider's servers are up and running does not mean your users are experiencing an acceptable level of speed and reliability. Remember, third-party services of all types serve thousands of customers like you around the globe, and a spike in another customer's traffic may impact you.

With little insight into third-party service providers' capacity planning decisions, you need to monitor performance levels yourself to ensure they don't drop off, and validate them against performance-focused SLAs. To get a sense of how a third-party service may be affecting your overall performance, it can be helpful to compare your site's speed and availability before the service is added with the same measurements afterwards.
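One way to sketch the before-and-after comparison in Python is shown below. The load-time samples and the SLA threshold are hypothetical. The choice of median and 95th percentile as summary statistics is an assumption made for this example, not something the strategy prescribes.

```python
from statistics import median, quantiles


def summarize(load_times):
    """Return the median and 95th percentile of a list of load times (seconds)."""
    # quantiles(..., n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(load_times, n=20, method="inclusive")[-1]
    return {"median": median(load_times), "p95": p95}


def compare_to_baseline(before, after, sla_p95_seconds):
    """Compare measurements taken after a change against the earlier baseline."""
    base, current = summarize(before), summarize(after)
    return {
        "median_change": current["median"] - base["median"],
        "p95_change": current["p95"] - base["p95"],
        "meets_sla": current["p95"] <= sla_p95_seconds,
    }


# Hypothetical page-load samples before and after adding a third-party service.
BEFORE = [1.0, 1.1, 1.2, 1.0, 1.3, 1.1, 1.2, 1.0, 1.1, 1.4]
AFTER = [1.2, 1.3, 1.5, 1.2, 2.9, 1.4, 1.3, 1.2, 1.6, 3.4]

# Hypothetical performance-focused SLA: 95% of page loads within 2 seconds.
SLA_P95_SECONDS = 2.0

if __name__ == "__main__":
    print(compare_to_baseline(BEFORE, AFTER, SLA_P95_SECONDS))
```

With these sample values, the median moves only slightly while the 95th percentile rises sharply and breaches the threshold. An availability-only SLA would not reveal that kind of degradation.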

4. Utilize industry resources

Utilize industry resources to better assess if the source of a performance problem lies with you or one of your third-party service providers, as well as the likely performance impact on your customers.

These resources may not prevent third-party service outages from happening, but they can help companies better understand the source of performance problems so they can get in front of them more confidently and efficiently.
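Alongside such external resources, your own measurements can give a first indication of whether a slowdown originates with you or with a provider. The Python sketch below groups per-resource load times by the host that served them. The domain names, URLs and timings are hypothetical. Summed durations are only a rough signal, because browsers fetch many resources in parallel.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical first-party domain for the site being measured.
FIRST_PARTY_DOMAIN = "example.com"

# Hypothetical resource timings for one page view: (URL, load time in seconds).
RESOURCES = [
    ("https://www.example.com/index.html", 0.40),
    ("https://static.example.com/app.js", 0.25),
    ("https://cdn.example.net/lib.js", 0.30),
    ("https://ads.example.org/tag.js", 2.10),
]


def is_first_party(url, site_domain):
    """True if the URL's host is the site domain or one of its subdomains."""
    host = urlsplit(url).hostname or ""
    return host == site_domain or host.endswith("." + site_domain)


def time_by_owner(resources, site_domain):
    """Sum load time under 'first-party' or under each third-party hostname."""
    totals = defaultdict(float)
    for url, seconds in resources:
        if is_first_party(url, site_domain):
            owner = "first-party"
        else:
            owner = urlsplit(url).hostname
        totals[owner] += seconds
    return dict(totals)


def slowest_owner(resources, site_domain):
    """Return the owner that accounts for the most summed load time."""
    totals = time_by_owner(resources, site_domain)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(time_by_owner(RESOURCES, FIRST_PARTY_DOMAIN))
    print("slowest:", slowest_owner(RESOURCES, FIRST_PARTY_DOMAIN))
```

In this made-up page view, a single third-party tag accounts for more time than all first-party resources combined, which would point the investigation at that provider first.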

Conclusion

The reality is that the delivery chain underlying the services we often take for granted is so tenuous that it's a marvel these services don't break down more often. While outages may be inevitable, this does not make them any less costly or damaging to a company's reputation and revenues.

For example, on August 19, Amazon's North American retail site went down for about 49 minutes, with visitors greeted with the word “oops.” No explanation was given, but one estimate by Forbes put the cost to Amazon at nearly $2 million in sales.

But it's not just the “big guys” like Amazon that you need to focus on. The fact is that little storms are happening on the internet all the time, and you need to be prepared for them. When it comes to surviving and thriving in the age of increasing web complexity, an ounce of prevention can be worth a pound of cure. By taking advantage of several relatively simple and inexpensive approaches, organizations can better exploit all that third-party services have to offer, while reducing the inherent risks.

Klaus Enzenhofer is Technology Strategist for Compuware APM’s Center of Excellence.
