APM vs Monitoring in Cloud-Native Environments: Reject the False Dichotomy
August 24, 2018

Apurva Davé
Sysdig

Share this

Ask anyone who's managed software in production: Management tools have many useful attributes, but no single tool gives you everything you need. Oh sure, a new interface comes along and handles an emerging use case beautifully – for a while. But requirements inevitably change and new variables get added to the equation. You add, upgrade or increase the complexity as needed.

This is a familiar arc for developers, IT pros and anyone who manages applications and their underlying infrastructure. And the story is no different when you look at observability tools like application performance management (APM).

For DevOps professionals, the advent of cloud-native systems and X-as-a-service has exposed the limitations of traditional APM tools. Most APM tools were designed to instrument and visualize simpler, static monoliths, and focused on the application layer to visualize traces of individual transactions. The fact is, APM is still sorely needed for developers, but it is not a panacea when it comes to understanding the overall performance of your application.

With cloud native computing, you may have dozens of microservices and hundreds or thousands of short-lived containers spread across multiple clouds. The efficiency of microservices is great for developer agility, but microservice architectures have also complicated the job of the operations team to ensure the performance, uptime and security of their systems.

In this new world, DevOps is finding it needs a broader range of functionality to truly understand system performance and potential issues. That functionality includes:

■ Collection of high frequency, high cardinality metrics across all containers, applications, and microservices. This data is typically stored over a long time to enable trending, yet is becoming more complex in today’s systems

■ Correlation of metrics with events (like a Kubernetes scaling event or a code push)

■ Capture of deep troubleshooting information like logs or system calls to derive a root cause issue in both the application and/or the infrastructure

■ Tracing key transactions through the call stack

A New Breed of Monitoring

With this broad range of requirements, it is easy to see that one system is unlikely to serve all of these needs well. And that has led to wider adoption of a new breed of cloud-native IT infrastructure monitoring (ITIM), a device- or capability-oriented approach that focuses on drawing a link between your applications, microservices, and the underlying infrastructure.

According to Gregg Siegfried from Gartner, "IT Infrastructure monitoring has always been difficult to do well. Cloud platforms, containers and changing software architecture have only increased the challenges." (Gartner, "Monitoring Modern Services and Infrastructure" by Gregg Siegfried on 15 March 2018)

Cloud-native systems have radically increased the need for dynamic metric systems. In addition, organizations that need high-volume, high cardinality metrics (think Facebook or Netflix) used to be the exception, but they are now becoming commonplace across enterprises of all sizes. APM by itself can't meet the needs of these new systems.

As a result, organizations are adopting APM and ITIM alongside each other. Critical management criteria align with different monitoring tools. Performance metrics are associated with ITIM; tracing is aligned with APM; logging is part of incident and event management. While there is some overlap, if we look at their core functionality there is far more differentiation than repetition.

APM typically works with heavyweight instrumentation inside your application code, giving you a detailed look at how the code written by your developers is performing. That’s extremely valuable, especially when developers are debugging their code in test before it goes into production. Unfortunately, APM also abstracts away the underlying containers, hosts, and network infrastructure. That's not an issue for developers since they only need to worry about the code they wrote, but operations professionals must consider the entire stack, and have something resource-efficient enough to actually deploy across everything in production.

In contrast, a modern, cloud-native ITIM monitoring system doesn’t instrument your code. But it will give you system visibility by instrumenting all the hosts in your environment and give you visibility into networks (physical and software-defined), as well as hosts, containers, processes, base application metrics, and developer-provided custom metrics like Prometheus, statsd and JMX.

Scale is also a very different challenge for any implementation using ITIM. APM was not designed for high frequency, high cardinality, multi-dimensional metrics, but modern ITIM was conceived with massive scale and a need to recompute metrics on the fly based on high cardinality metadata. Your ITIM tool should enable you to store all the metrics in a raw form, and recompute the answers to questions on the fly - an essential.

With this rich functionality, cloud-native ITIM monitoring systems give you a powerful view of overall system performance, especially where your applications are interacting with underlying systems.

But again, for most organizations this isn't an either-or situation. You might eliminate your APM tool if you have absolute faith nothing will ever go wrong with your application code. Or if you're extremely confident your infrastructure, container, and orchestration tooling will always perform as expected. But most DevOps professionals will see through this false dichotomy and use some combination of these tools to ensure performance, reliability and security. And if your organization is focused on the fastest mean time to resolution (MTTR) as a performance metric, it's best to have both systems in place.

Apurva Davé is VP of Marketing at Sysdig
Share this

The Latest

July 16, 2020
eCommerce companies have entered into the domain of mobile applications given the huge number of customers using such apps on their smartphones on the go. However, these apps are vulnerable to both performance and security issues. Performance-wise, the apps may slow down while loading or transacting, give erroneous counts, become non-responsive across devices, and many more. So, the need of the hour for enterprises developing such applications is to invest in eCommerce performance testing ...
July 15, 2020

Digital Experience Monitoring is a tool that should be integrated with an organization's change management strategy. A key benefit of SaaS/cloud is no longer being responsible for software and hardware upgrades, maintenance, and patch cycles. Migrating to Microsoft Office 365 means no longer spending precious time and resources on Windows, Exchange or SharePoint upgrades for example. But that doesn't mean that IT can ignore changes or doesn't need to monitor for their effects ...

July 14, 2020

As systems become more complex and IT loses direct control of infrastructure (hello cloud), it becomes both more difficult and more important to capture and observe, holistically, the user experience. SaaS or cloud apps like Salesforce, Microsoft Office 365, and Workday have become mission-critical to most businesses and therefore need to be examined when it comes to experience monitoring ...

July 13, 2020

Newly distributed operations teams are struggling to cope with the sudden change to the WFH (work from home) concept. IT operations teams were traditionally set up to work from centralized locations, unlike software and engineering teams. Some organizations have overcome that by implementing AIOps solutions; others are using a brute force method of employing more IT operations analysts to keep the distributed NOCs going ...

July 09, 2020

Enterprises that halted their cloud migration journey during the current global pandemic are two and a half times more likely than those that continued their move to the cloud to have experienced IT outages that negatively impacted their SLAs, according to Virtana's latest survey report The Current State of Hybrid Cloud and IT ...

July 08, 2020

Every business has the responsibility to do their part against climate change by reducing their carbon footprint while increasing sustainability and efficiency. Harnessing optimization of IT infrastructure is one method companies can use to reduce carbon footprint, improve sustainability and increase business efficiency, while also keeping costs down ...

July 07, 2020

While the adoption of continuous integration (CI) is on the rise, software engineering teams are unable to take a zero-tolerance approach to software failures, costing enterprise organizations billions annually, according to a quantitative study conducted by Undo and a Cambridge Judge Business School MBA project ...

June 25, 2020

I've had the opportunity to work with a number of organizations embarking on their AIOps journey. I always advise them to start by evaluating their needs and the possibilities AIOps can bring to them through five different levels of AIOps maturity. This is a strategic approach that allows enterprises to achieve complete automation for long-term success ...

June 24, 2020

Sumo Logic recently commissioned an independent market research study to understand the industry momentum behind continuous intelligence — and the necessity for digital organizations to embrace a cloud-native, real-time continuous intelligence platform to support the speed and agility of business for faster decision-making, optimizing security, driving new innovation and delivering world-class customer experiences. Some of the key findings include ...

June 23, 2020

When it comes to viruses, it's typically those of the computer/digital variety that IT is concerned about. But with the ongoing pandemic, IT operations teams are on the hook to maintain business functions in the midst of rapid and massive change. One of the biggest challenges for businesses is the shift to remote work at scale. Ensuring that they can continue to provide products and services — and satisfy their customers — against this backdrop is challenging for many ...