Monitoring Alert: Don't Get Lost in the Clouds
November 28, 2018

Mehdi Daoudi
Catchpoint

The cloud is the technological megatrend of the new millennium, creating ease-of-use, efficiency and velocity for small businesses to large enterprises. But it was never meant to be the only answer for every situation. In the world of digital experience monitoring (DEM) — where the end user experience is paramount — cloud-based nodes, along with a variety of other node types, are used to build a view of the end user's digital experience. But major companies are now depending solely on cloud nodes for DEM. Research from Catchpoint, in addition to real-world customer data, shows this is a mistake.

Bottom line: if you want an accurate view of the end user experience, you can't monitor only from the cloud. And if you're using the cloud to monitor something also based in the cloud (like many customer-facing apps), you're compounding the problem. You can't expect an accurate last mile performance view by measuring a digital service from the same infrastructure in which it's located.

This is akin to the mistake many made in the early days of monitoring: tracking site performance by measuring only from the data center where the site was hosted. That's far too limited a perspective, given the multitude of performance-impacting elements beyond the firewall. Let's take a look at cloud-only monitoring limitations and how to effectively navigate them.

How Cloud-Only Monitoring Can Create Blind Spots

Consider one example: last year a company received alerts that its services were down. After a mad scramble to fix the problem, it discovered the services were fine, and the alerts were caused by an outage on its cloud-based monitoring nodes! The end user experience was untouched. Good news, but also proof of the noise and false positives that can occur when you monitor from only one place, and in particular, from a cloud-only view.
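One way to guard against this kind of false positive is to require that a failure be confirmed from more than one type of vantage point before anyone is paged. The sketch below is only an illustration of that idea, not any vendor's alerting logic: the function name `should_alert`, the node type labels, and the majority threshold are all assumptions made for the example.

```python
from collections import defaultdict


def should_alert(check_results, min_failing_types=2):
    """Return True only when failures are confirmed from several vantage types.

    check_results: iterable of (node_type, succeeded) pairs, e.g. ("cloud", False).
    A node type counts as failing when more than half of its nodes failed.
    (Both the data shape and the thresholds are hypothetical.)
    """
    counts = defaultdict(lambda: {"failed": 0, "total": 0})
    for node_type, succeeded in check_results:
        counts[node_type]["total"] += 1
        if not succeeded:
            counts[node_type]["failed"] += 1

    failing_types = [
        node_type
        for node_type, c in counts.items()
        if c["failed"] * 2 > c["total"]
    ]
    return len(failing_types) >= min_failing_types


# Hypothetical data: every cloud node reports a failure, backbone nodes do not.
cloud_outage = [
    ("cloud", False), ("cloud", False),
    ("backbone", True), ("backbone", True), ("backbone", True),
]
print(should_alert(cloud_outage))  # prints False: only one node type is failing
```

With a rule like this, an outage confined to the cloud monitoring nodes would be logged for investigation rather than treated as a service outage.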

That incident led to further research. One example was a series of synthetic monitoring tests on a single request to a website hosted in AWS's Washington DC data center. The test was run from cloud-only nodes on AWS, with parallel tests on synthetic monitoring nodes running in traditional internet data center backbone locations. The tests ran starting August 1, 2018 from seven different nodes: the Washington DC AWS data center, three backbone nodes in Washington DC, and three backbone nodes in New York, NY. In total, they produced over 1.7 million measurements. Here are the results.


As you can see, the response times of tests run only from the cloud are faster by a significant margin. The median response time from the AWS node (bottom line, in orange) was 31ms, while the median response time from Level3's Washington DC backbone node was 117ms, and from Verizon's New York backbone node, 167ms. The cloud node measurement alone does not provide a realistic view of how end users are experiencing this particular site, and would lull an operations team into a false sense of security. That is not the kind of performance gap a retail website wants, particularly during the critical holiday shopping season.

Why is this so? Tests run from the cloud on a cloud-located site enjoy some form of dedicated network connection as well as preferential data routing. Think of it like a VIP's cleared traffic route through a crowded city. This streamlined data path is a long way from that of an average end user, who receives their content after a long, circuitous route through ISPs, CDNs, wireless networks and various other pathways.
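To put the median figures quoted above in perspective, the short sketch below computes how far each backbone measurement sits from the cloud node's. The three medians come from the test described in this article; the dictionary layout and the helper name `gap_vs_cloud` are assumptions made for illustration.

```python
# Median response times (ms) quoted in the article for three of the seven nodes.
medians_ms = {
    "AWS Washington DC (cloud)": 31,
    "Level3 Washington DC (backbone)": 117,
    "Verizon New York (backbone)": 167,
}

CLOUD_NODE = "AWS Washington DC (cloud)"


def gap_vs_cloud(medians, cloud_key):
    """Compare every non-cloud median with the cloud node's median.

    Returns, per node, the absolute difference in ms and the ratio
    (rounded to one decimal place) relative to the cloud node.
    """
    base = medians[cloud_key]
    return {
        name: {"delta_ms": ms - base, "ratio": round(ms / base, 1)}
        for name, ms in medians.items()
        if name != cloud_key
    }


for node, gap in gap_vs_cloud(medians_ms, CLOUD_NODE).items():
    print(f"{node}: +{gap['delta_ms']} ms, {gap['ratio']}x the cloud median")
```

On these numbers, the Level3 backbone median is 86ms higher (roughly 3.8 times the cloud median) and the Verizon backbone median is 136ms higher (roughly 5.4 times).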

Applications Not Suitable for Cloud-Only Monitoring

Another way of explaining this: cloud-only monitoring does not track performance along the entire application delivery chain, nor does it provide the diagnostics required to manage that chain. Any single point along that path — ISPs for example — can create problems impacting the end user experience.

Important tracking processes not suitable for cloud-only monitoring may also include:

■ SLA measurements for third-parties along the delivery chain

■ Provider performance testing for services like CDNs, DNS, ad servers

■ Benchmarking for competitors in your industry

■ Network or ISP connectivity issues

■ DNS availability or validation of service

Where Cloud-Only Monitoring Is Beneficial

Of course, it's not all bad news. Cloud monitoring can provide valuable insights for certain applications such as:

■ Determining availability and performance of an application or service from within the cloud infrastructure environment

■ Performing first mile testing without deploying agents in physical locations

■ Testing some of the basic functionality and content of an application

■ Evaluating the latency of cloud providers back to your infrastructure

Conclusion and Best Practices

The key to avoiding the cloud-only DEM trap is to understand that the accuracy of your monitoring strategy depends on how your measurements are taken and from which locations. Cloud-based vantage points can be a valuable piece of the monitoring puzzle, but should not be relied upon as your sole monitoring infrastructure, as they won't be able to track the many network layers comprising the internet.

The answer will most likely be to add a blend of backbone, broadband, ISP, last mile and wireless monitoring. Start where your customers are located and work your way back along the delivery chain. By canvassing all the elements that can impact their experience, you'll have the most accurate view of that experience, as well as the best opportunity to preempt performance problems before end users are affected.

Mehdi Daoudi is CEO and Co-Founder of Catchpoint