APM vs Monitoring in Cloud-Native Environments: Reject the False Dichotomy
August 24, 2018

Apurva Davé
Sysdig

Share this

Ask anyone who's managed software in production: Management tools have many useful attributes, but no single tool gives you everything you need. Oh sure, a new interface comes along and handles an emerging use case beautifully – for a while. But requirements inevitably change and new variables get added to the equation. You add, upgrade or increase the complexity as needed.

This is a familiar arc for developers, IT pros and anyone who manages applications and their underlying infrastructure. And the story is no different when you look at observability tools like application performance management (APM).

For DevOps professionals, the advent of cloud-native systems and X-as-a-service has exposed the limitations of traditional APM tools. Most APM tools were designed to instrument and visualize simpler, static monoliths, and focused on the application layer to visualize traces of individual transactions. The fact is, APM is still sorely needed for developers, but it is not a panacea when it comes to understanding the overall performance of your application.

With cloud native computing, you may have dozens of microservices and hundreds or thousands of short-lived containers spread across multiple clouds. The efficiency of microservices is great for developer agility, but microservice architectures have also complicated the job of the operations team to ensure the performance, uptime and security of their systems.

In this new world, DevOps is finding it needs a broader range of functionality to truly understand system performance and potential issues. That functionality includes:

■ Collection of high frequency, high cardinality metrics across all containers, applications, and microservices. This data is typically stored over a long time to enable trending, yet is becoming more complex in today’s systems

■ Correlation of metrics with events (like a Kubernetes scaling event or a code push)

■ Capture of deep troubleshooting information like logs or system calls to derive a root cause issue in both the application and/or the infrastructure

■ Tracing key transactions through the call stack

A New Breed of Monitoring

With this broad range of requirements, it is easy to see that one system is unlikely to serve all of these needs well. And that has led to wider adoption of a new breed of cloud-native IT infrastructure monitoring (ITIM), a device- or capability-oriented approach that focuses on drawing a link between your applications, microservices, and the underlying infrastructure.

According to Gregg Siegfried from Gartner, "IT Infrastructure monitoring has always been difficult to do well. Cloud platforms, containers and changing software architecture have only increased the challenges." (Gartner, "Monitoring Modern Services and Infrastructure" by Gregg Siegfried on 15 March 2018)

Cloud-native systems have radically increased the need for dynamic metric systems. In addition, organizations that need high-volume, high cardinality metrics (think Facebook or Netflix) used to be the exception, but they are now becoming commonplace across enterprises of all sizes. APM by itself can't meet the needs of these new systems.

As a result, organizations are adopting APM and ITIM alongside each other. Critical management criteria align with different monitoring tools. Performance metrics are associated with ITIM; tracing is aligned with APM; logging is part of incident and event management. While there is some overlap, if we look at their core functionality there is far more differentiation than repetition.

APM typically works with heavyweight instrumentation inside your application code, giving you a detailed look at how the code written by your developers is performing. That’s extremely valuable, especially when developers are debugging their code in test before it goes into production. Unfortunately, APM also abstracts away the underlying containers, hosts, and network infrastructure. That's not an issue for developers since they only need to worry about the code they wrote, but operations professionals must consider the entire stack, and have something resource-efficient enough to actually deploy across everything in production.

In contrast, a modern, cloud-native ITIM monitoring system doesn’t instrument your code. But it will give you system visibility by instrumenting all the hosts in your environment and give you visibility into networks (physical and software-defined), as well as hosts, containers, processes, base application metrics, and developer-provided custom metrics like Prometheus, statsd and JMX.

Scale is also a very different challenge for any implementation using ITIM. APM was not designed for high frequency, high cardinality, multi-dimensional metrics, but modern ITIM was conceived with massive scale and a need to recompute metrics on the fly based on high cardinality metadata. Your ITIM tool should enable you to store all the metrics in a raw form, and recompute the answers to questions on the fly - an essential.

With this rich functionality, cloud-native ITIM monitoring systems give you a powerful view of overall system performance, especially where your applications are interacting with underlying systems.

But again, for most organizations this isn't an either-or situation. You might eliminate your APM tool if you have absolute faith nothing will ever go wrong with your application code. Or if you're extremely confident your infrastructure, container, and orchestration tooling will always perform as expected. But most DevOps professionals will see through this false dichotomy and use some combination of these tools to ensure performance, reliability and security. And if your organization is focused on the fastest mean time to resolution (MTTR) as a performance metric, it's best to have both systems in place.

Apurva Davé is VP of Marketing at Sysdig
Share this

The Latest

December 07, 2018

As more organizations embrace digital business, infrastructure and operations (I&O) leaders will need to evolve their strategies and skills to provide an agile infrastructure for their business. In fact, Gartner, Inc. said that 75 percent of I&O leaders are not prepared with the skills, behaviors or cultural presence needed over the next two to three years ...

December 06, 2018

Today there is an urgent need for Agents of Transformation, a new breed of technologist, primed to drive innovation and enable companies to thrive in the face of rapid technological advancement, according to The Agents of Transformation Report from AppDynamics, a Cisco company ...

December 05, 2018

One in four Global Fortune 2000 enterprises rank Internet of Things (IoT) deployment as the most important initiative in their organization, yet 90% experience barriers to effective implementation and expansion due to lack of IoT expertise and skills in-house, according to a new independent survey from VansonBourne ...

December 04, 2018

Nearly three-quarters (74%) of IT leaders are concerned that Internet of Things (IoT) performance problems could directly impact business operations and significantly damage revenues, according to a new report, entitled Overcoming the Complexity of Web-Scale IoT Applications: The Top 5 Challenges ...

December 03, 2018

Gartner highlighted the top strategic Internet of Things (IoT) technology trends that will drive digital business innovation from 2018 through 2023 ...

November 30, 2018

While 95 percent of organizations have a disaster recovery plan in place, 23 percent never test their plan, according to a new survey from Spiceworks ...

November 29, 2018

Cloud computing has clear and definite benefits. At the same time there is an excessive amount of vendor hype that it will fix a lot of problems which it will not. It can also create new problems with a lack of visibility and many IT professionals are disappointed with their leap to a pure cloud environment ...

November 28, 2018

In the world of digital experience monitoring (DEM) — where the end user experience is paramount — cloud-based nodes, along with a variety of other node types, are used to build a view of the end user's digital experience. But major companies are now depending solely on cloud nodes for DEM. Research from Catchpoint, in addition to real-world customer data, shows this is a mistake ...

November 27, 2018

Achieving success with cloud adoption remains an elusive challenge for many organizations. But why is that the case? After all, there are countless tools designed to facilitate the process of taking on-premises operations to the cloud ...

November 26, 2018

The results from the latest State of Testing Survey by SmartBear show that 80 percent of respondents are testing some kind of API or web service ...