APM vs Monitoring in Cloud-Native Environments: Reject the False Dichotomy
August 24, 2018

Apurva Davé
Sysdig

Ask anyone who's managed software in production: Management tools have many useful attributes, but no single tool gives you everything you need. Oh sure, a new interface comes along and handles an emerging use case beautifully – for a while. But requirements inevitably change and new variables get added to the equation. You add, upgrade or increase the complexity as needed.

This is a familiar arc for developers, IT pros and anyone who manages applications and their underlying infrastructure. And the story is no different when you look at observability tools like application performance management (APM).

For DevOps professionals, the advent of cloud-native systems and X-as-a-service has exposed the limitations of traditional APM tools. Most APM tools were designed to instrument and visualize simpler, static monoliths, and focused on the application layer to visualize traces of individual transactions. The fact is, APM is still sorely needed for developers, but it is not a panacea when it comes to understanding the overall performance of your application.

With cloud-native computing, you may have dozens of microservices and hundreds or thousands of short-lived containers spread across multiple clouds. Microservices are great for developer agility, but microservice architectures have also made it harder for operations teams to ensure the performance, uptime and security of their systems.

In this new world, DevOps teams are finding they need a broader range of functionality to truly understand system performance and potential issues. That functionality includes:

■ Collection of high-frequency, high-cardinality metrics across all containers, applications, and microservices. This data is typically retained for long periods to enable trend analysis, and it is growing more complex in today’s systems

■ Correlation of metrics with events (like a Kubernetes scaling event or a code push)

■ Capture of deep troubleshooting information, such as logs or system calls, to determine the root cause of an issue in the application, the infrastructure, or both

■ Tracing key transactions through the call stack
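To make the second item concrete, here is a minimal Python sketch of correlating a metric with events. It is an illustration only, not how any particular monitoring product works: the `Sample` and `Event` types, the `events_near_spikes` helper, the threshold, the 60-second window, and all sample data are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    ts: float     # seconds since an arbitrary start (hypothetical data)
    value: float  # e.g. request latency in milliseconds


@dataclass(frozen=True)
class Event:
    ts: float
    description: str


def events_near_spikes(samples, events, threshold, window_s=60.0):
    """Pair each sample above `threshold` with the events that happened
    in the `window_s` seconds before it."""
    results = []
    for s in samples:
        if s.value <= threshold:
            continue
        nearby = [e for e in events if 0 <= s.ts - e.ts <= window_s]
        results.append((s, nearby))
    return results


# Hypothetical latency samples and deployment/orchestration events.
samples = [Sample(0, 120.0), Sample(30, 125.0), Sample(60, 480.0), Sample(90, 130.0)]
events = [
    Event(45, "code push: checkout-service v2"),
    Event(500, "Kubernetes scale-up"),
]

spikes = events_near_spikes(samples, events, threshold=300.0)
for sample, nearby in spikes:
    print(sample.ts, sample.value, [e.description for e in nearby])
```

In this made-up data, the only latency spike lines up with the code push 15 seconds earlier, which is the kind of link an operator would want surfaced automatically.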

A New Breed of Monitoring

With this broad range of requirements, it is easy to see that one system is unlikely to serve all of these needs well. And that has led to wider adoption of a new breed of cloud-native IT infrastructure monitoring (ITIM), a device- or capability-oriented approach that focuses on drawing a link between your applications, microservices, and the underlying infrastructure.

According to Gregg Siegfried from Gartner, "IT Infrastructure monitoring has always been difficult to do well. Cloud platforms, containers and changing software architecture have only increased the challenges." (Gartner, "Monitoring Modern Services and Infrastructure" by Gregg Siegfried on 15 March 2018)

Cloud-native systems have radically increased the need for dynamic metric systems. In addition, organizations that need high-volume, high-cardinality metrics (think Facebook or Netflix) used to be the exception, but they are now becoming commonplace across enterprises of all sizes. APM by itself can't meet the needs of these new systems.

As a result, organizations are adopting APM and ITIM alongside each other. Critical management criteria align with different monitoring tools. Performance metrics are associated with ITIM; tracing is aligned with APM; logging is part of incident and event management. While there is some overlap, if we look at their core functionality there is far more differentiation than repetition.

APM typically works with heavyweight instrumentation inside your application code, giving you a detailed look at how the code written by your developers is performing. That’s extremely valuable, especially when developers are debugging their code in test before it goes into production. Unfortunately, APM also abstracts away the underlying containers, hosts, and network infrastructure. That's not an issue for developers, since they only need to worry about the code they wrote, but operations professionals must consider the entire stack, and they need something resource-efficient enough to actually deploy across everything in production.

In contrast, a modern, cloud-native ITIM system doesn’t instrument your code. Instead, it instruments all the hosts in your environment, giving you visibility into networks (physical and software-defined), hosts, containers, processes, base application metrics, and developer-provided custom metrics exposed through Prometheus, StatsD and JMX.
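As a rough illustration of what developer-provided custom metrics look like on the wire, the Python sketch below formats one metric as a StatsD line (`name:value|type`) and as a line in the Prometheus text exposition format (`name{label="value"} value`). The helper functions and metric names are hypothetical, and the sketch deliberately skips details a real client library handles, such as escaping label values.

```python
def statsd_line(name, value, metric_type="c"):
    """Format a StatsD line; "c" is a counter, "g" a gauge, "ms" a timer."""
    return f"{name}:{value}|{metric_type}"


def prometheus_line(name, value, labels=None):
    """Format one sample in Prometheus text exposition style.
    Simplification: label values are not escaped here."""
    if labels:
        rendered = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{rendered}}} {value}"
    return f"{name} {value}"


# Hypothetical metric names, for illustration only.
print(statsd_line("checkout.orders", 1))
print(prometheus_line("checkout_orders_total", 42, {"service": "cart"}))
```

An infrastructure monitoring agent can collect lines like these alongside host and container metrics without any change to how the application code itself is instrumented.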

Scale is also a very different challenge with ITIM. APM was not designed for high-frequency, high-cardinality, multi-dimensional metrics, whereas modern ITIM was conceived for massive scale and for recomputing metrics on the fly based on high-cardinality metadata. Your ITIM tool should let you store all metrics in raw form and recompute the answers to new questions on the fly; that capability is essential.
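The idea of keeping raw metrics and recomputing answers on demand can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the label names, the values, and the `mean_by` helper are all invented for the example, and a real system would use a purpose-built time-series store.

```python
from collections import defaultdict

# Raw samples are kept with their full label set (hypothetical data).
RAW = [
    ({"service": "cart", "pod": "cart-1", "region": "us-east"}, 100.0),
    ({"service": "cart", "pod": "cart-2", "region": "eu-west"}, 300.0),
    ({"service": "auth", "pod": "auth-1", "region": "us-east"}, 50.0),
]


def mean_by(raw, label):
    """Recompute the mean value grouped by any label, at query time."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for labels, value in raw:
        key = labels.get(label, "<unlabeled>")
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# The same raw data answers questions nobody planned for in advance.
by_service = mean_by(RAW, "service")
by_region = mean_by(RAW, "region")
print(by_service)
print(by_region)
```

Because nothing is pre-aggregated, grouping by a different piece of metadata is just another query over the same samples. The trade-off is storage and query cost, which is why scale is a design concern from the start.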

With this rich functionality, cloud-native ITIM systems give you a powerful view of overall system performance, especially where your applications are interacting with underlying systems.

But again, for most organizations this isn't an either-or situation. You might eliminate your APM tool if you have absolute faith that nothing will ever go wrong with your application code. Or you might eliminate your ITIM tool if you're extremely confident that your infrastructure, container, and orchestration tooling will always perform as expected. But most DevOps professionals will see through this false dichotomy and use some combination of these tools to ensure performance, reliability and security. And if your organization is focused on the fastest mean time to resolution (MTTR) as a performance metric, it's best to have both systems in place.

Apurva Davé is VP of Marketing at Sysdig