APM vs Monitoring in Cloud-Native Environments: Reject the False Dichotomy
August 24, 2018

Apurva Davé
Sysdig

Share this

Ask anyone who's managed software in production: Management tools have many useful attributes, but no single tool gives you everything you need. Oh sure, a new interface comes along and handles an emerging use case beautifully – for a while. But requirements inevitably change and new variables get added to the equation. You add, upgrade or increase the complexity as needed.

This is a familiar arc for developers, IT pros and anyone who manages applications and their underlying infrastructure. And the story is no different when you look at observability tools like application performance management (APM).

For DevOps professionals, the advent of cloud-native systems and X-as-a-service has exposed the limitations of traditional APM tools. Most APM tools were designed to instrument and visualize simpler, static monoliths, and focused on the application layer to visualize traces of individual transactions. The fact is, APM is still sorely needed for developers, but it is not a panacea when it comes to understanding the overall performance of your application.

With cloud native computing, you may have dozens of microservices and hundreds or thousands of short-lived containers spread across multiple clouds. The efficiency of microservices is great for developer agility, but microservice architectures have also complicated the job of the operations team to ensure the performance, uptime and security of their systems.

In this new world, DevOps is finding it needs a broader range of functionality to truly understand system performance and potential issues. That functionality includes:

■ Collection of high frequency, high cardinality metrics across all containers, applications, and microservices. This data is typically stored over a long time to enable trending, yet is becoming more complex in today’s systems

■ Correlation of metrics with events (like a Kubernetes scaling event or a code push)

■ Capture of deep troubleshooting information like logs or system calls to derive a root cause issue in both the application and/or the infrastructure

■ Tracing key transactions through the call stack

A New Breed of Monitoring

With this broad range of requirements, it is easy to see that one system is unlikely to serve all of these needs well. And that has led to wider adoption of a new breed of cloud-native IT infrastructure monitoring (ITIM), a device- or capability-oriented approach that focuses on drawing a link between your applications, microservices, and the underlying infrastructure.

According to Gregg Siegfried from Gartner, "IT Infrastructure monitoring has always been difficult to do well. Cloud platforms, containers and changing software architecture have only increased the challenges." (Gartner, "Monitoring Modern Services and Infrastructure" by Gregg Siegfried on 15 March 2018)

Cloud-native systems have radically increased the need for dynamic metric systems. In addition, organizations that need high-volume, high cardinality metrics (think Facebook or Netflix) used to be the exception, but they are now becoming commonplace across enterprises of all sizes. APM by itself can't meet the needs of these new systems.

As a result, organizations are adopting APM and ITIM alongside each other. Critical management criteria align with different monitoring tools. Performance metrics are associated with ITIM; tracing is aligned with APM; logging is part of incident and event management. While there is some overlap, if we look at their core functionality there is far more differentiation than repetition.

APM typically works with heavyweight instrumentation inside your application code, giving you a detailed look at how the code written by your developers is performing. That’s extremely valuable, especially when developers are debugging their code in test before it goes into production. Unfortunately, APM also abstracts away the underlying containers, hosts, and network infrastructure. That's not an issue for developers since they only need to worry about the code they wrote, but operations professionals must consider the entire stack, and have something resource-efficient enough to actually deploy across everything in production.

In contrast, a modern, cloud-native ITIM monitoring system doesn’t instrument your code. But it will give you system visibility by instrumenting all the hosts in your environment and give you visibility into networks (physical and software-defined), as well as hosts, containers, processes, base application metrics, and developer-provided custom metrics like Prometheus, statsd and JMX.

Scale is also a very different challenge for any implementation using ITIM. APM was not designed for high frequency, high cardinality, multi-dimensional metrics, but modern ITIM was conceived with massive scale and a need to recompute metrics on the fly based on high cardinality metadata. Your ITIM tool should enable you to store all the metrics in a raw form, and recompute the answers to questions on the fly - an essential.

With this rich functionality, cloud-native ITIM monitoring systems give you a powerful view of overall system performance, especially where your applications are interacting with underlying systems.

But again, for most organizations this isn't an either-or situation. You might eliminate your APM tool if you have absolute faith nothing will ever go wrong with your application code. Or if you're extremely confident your infrastructure, container, and orchestration tooling will always perform as expected. But most DevOps professionals will see through this false dichotomy and use some combination of these tools to ensure performance, reliability and security. And if your organization is focused on the fastest mean time to resolution (MTTR) as a performance metric, it's best to have both systems in place.

Apurva Davé is VP of Marketing at Sysdig
Share this

The Latest

May 21, 2019

Findings of the Digital Employee Experience survey from VMware show correlation between enabling employees with a positive digital experience (i.e., device choice/flexibility, seamless access to apps, remote work capabilities) and an organization's competitive position, revenue growth and employee sentiment ...

May 20, 2019

In today's competitive landscape, businesses must have the ability and process in place to face new challenges and find ways to successfully tackle them in a proactive manner. For years, this has been placed on the shoulders of DevOps teams within IT departments. But, as automation takes over manual intervention to increase speed and efficiency, these teams are facing what we know as IT digitization. How has this changed the way companies function over the years, and what do we have to look forward to in the coming years? ...

May 16, 2019

Although the vast majority of IT organizations have implemented a broad variety of systems and tools to modernize, simplify and streamline data center operations, many are still burdened by inefficiencies, security risks and performance gaps in their IT infrastructure as well as the excessive time it takes to manage legacy infrastructure, according to the State of IT Transformation, a report from Datrium ...

May 15, 2019

When it comes to network visibility, there are a lot of discussions about packet broker technology and the various features these solutions provide to network architects and IT managers. Packet brokers allow organizations to aggregate the data required for a variety of monitoring solutions including network performance monitoring and diagnostic (NPMD) platforms and unified threat management (UTM) appliances. But, when it comes to ensuring these solutions provide the insights required by NetOps and security teams, IT can spend an exorbitant amount of time dealing with issues around adds, moves and changes. This can have a dramatic impact on budgets and tool availability. Why does this happen? ...

May 14, 2019

Data may be pouring into enterprises but IT professionals still find most of it stuck in siloed departments and weeks away from being able to drive any valued action. Coupled with the ongoing concerns over security responsiveness, IT teams have to push aside other important performance-oriented data in order to ensure security data, at least, gets prominent attention. A new survey by Ivanti shows the disconnect between enterprise departments struggling to improve operations like automation while being challenged with a siloed structure and a data onslaught ...

May 13, 2019

A subtle, deliberate shift has occurred within the software industry which, at present, only the most innovative organizations have seized upon for competitive advantage. Although primarily driven by Artificial Intelligence (AI), this transformation strikes at the core of the most pervasive IT resources including cloud computing and predictive analytics ...

May 09, 2019

When asked who is mandated with developing and delivering their organization's digital competencies, 51% of respondents say their IT departments have a leadership role. The critical question is whether IT departments are prepared to take on a leadership role in which collaborating with other functions and disseminating knowledge and digital performance data are requirements ...

May 08, 2019

The Economist Intelligence Unit just released a new study commissioned by Riverbed that explores nine digital competencies that help organizations improve their digital performance and, ultimately, achieve their objectives. Here's a brief summary of 7 key research findings you'll find covered in detail in the report ...

May 07, 2019

Today, the overall customer scenario has digitally transformed and practically there is no limitation to the ways in which the target customers can be reached. These opportunities are throwing multiple challenges for brands and enterprises, and one of the prominent ones is to ensure Omni Channel experience for customers ...

May 06, 2019

Most businesses (92 percent of respondents) see the potential value of data and 36 percent are already monetizing their data, according to the Global Data Protection Index from Dell EMC. While this acknowledgement is positive, however, most respondents are struggling to properly protect their data ...