Skipping Application Monitoring is the Biggest Anti-Pattern in Application Observability
July 19, 2021

Chris Farrell

An anti-pattern arises when a problem is recognized and a suboptimal solution becomes broadly embraced as the go-to method for solving it. The solution sounds good in theory, but for one reason or another it is not the best means of solving the problem.

A common example of this involves gasoline and rising prices. As prices go up, consumers tend to put off buying gas as long as possible, until they are running on fumes. In reality, while prices are still rising, the best way to save money is to fill up your tank every chance you get, because every gallon bought today costs less than the same gallon bought later.

Anti-patterns are common across IT as well, especially around application monitoring and observability. One that is particularly prevalent is a response to the increasing complexity of cloud-native infrastructure and applications. The suboptimal idea is that the best way to monitor modern applications is not to install a monitoring tool at all, but rather to have developers manually code their own monitoring capabilities, put all the data into logs, and solve problems by analyzing custom dashboards and the resulting log files.

The reality is that this approach tends to lead to a multitude of visibility gaps, and can even send SWAT teams down the wrong path, depending on what's instrumented, collected, and shown. The worst case would be application slow-downs, or even outages, occurring while the dashboards show "all systems green."
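To make the "all systems green" scenario concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Step` type, the step names, the durations, and the 200 ms threshold are illustrative assumptions, not taken from any real tool. It only shows how a dashboard built from hand-written log lines can report a healthy status while the user waits on a step nobody instrumented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    name: str
    duration_ms: int
    instrumented: bool  # hypothetical flag: did a developer hand-code logging for this step?


def dashboard_status(steps: List[Step], threshold_ms: int = 200) -> str:
    """Status as a dashboard built only from hand-written log lines would report it."""
    logged = [s for s in steps if s.instrumented]
    return "green" if all(s.duration_ms <= threshold_ms for s in logged) else "red"


def user_latency_ms(steps: List[Step]) -> int:
    """Latency the user actually experiences: every step counts, logged or not."""
    return sum(s.duration_ms for s in steps)


# One simulated request (made-up numbers).
request = [
    Step("auth", 20, True),
    Step("cart lookup", 35, True),
    Step("payment gateway call", 4200, False),  # nobody instrumented this one
]

print(dashboard_status(request))  # green
print(user_latency_ms(request))   # 4255
```

The dashboard is not wrong about the data it has. It simply never receives the data that matters, which is the kind of visibility gap described above.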

The problem with anti-patterns is that a popular idea can gain ground, even if the solution is suboptimal. For the aforementioned gasoline issue, it might take some math on a napkin to show how a different process can save money. For IT monitoring strategies, it might take a little bit more. To understand when a specific solution or process is an anti-pattern, and how to solve the problem in a more optimal way, it's important to recognize what led to the situation, identify the ultimate goal, and then open up to different solutions.
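The napkin math for the gasoline example might look like the following sketch. All of the numbers are assumptions chosen for illustration: a 12-gallon tank, one gallon burned per day, and a price that starts at $3.00 and rises five cents a day. Both drivers start with a full tank and end day 24 with a full tank, so each buys exactly 24 gallons and the comparison is fair.

```python
TANK_GALLONS = 12   # assumed tank size
DAYS = 24           # assumed comparison window
# Assumed usage: exactly one gallon burned per day.


def price_cents(day: int) -> int:
    """Assumed price per gallon in cents: $3.00 rising five cents each day."""
    return 300 + 5 * day


def wait_until_empty_cost() -> int:
    """Refill the whole tank only when it runs dry (every 12th day)."""
    total = 0
    for day in range(1, DAYS + 1):
        if day % TANK_GALLONS == 0:
            total += TANK_GALLONS * price_cents(day)
    return total


def top_off_daily_cost() -> int:
    """Buy back the one gallon burned each day, at that day's price."""
    return sum(price_cents(day) for day in range(1, DAYS + 1))


wait = wait_until_empty_cost()
top_off = top_off_daily_cost()
print(f"wait until empty: ${wait / 100:.2f}")             # $93.60
print(f"top off daily:    ${top_off / 100:.2f}")          # $87.00
print(f"saved by topping off: ${(wait - top_off) / 100:.2f}")  # $6.60
```

Under these assumptions, the driver who waits pays more for the same 24 gallons, because every gallon is bought at the highest price of its cycle. The result only holds while prices keep rising.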

What Caused the Application Monitoring Anti-Pattern?

In the case of cloud-native application performance, the problem starts with legacy application monitoring tools. They require continuous configuration, and even some manual coding, to reach their full value, and they can slow down the DevOps and continuous integration / continuous deployment (CI/CD) process by requiring reconfiguration every time an update is released. If that reconfiguration isn't done (and done right), there's always a chance that the tool will not have the right data to either recognize a problem or solve it.

This is what has led many to eschew the idea of a monitoring tool and, instead, have their developers instrument monitoring into the code and simply analyze everything in logs themselves. They recognize how time-consuming and menial log analysis is, but it's seen as the lesser of two evils when compared to constant reconfiguration of monitoring.

But this approach isn't optimal either. If the developers don't capture the right information at the right time, then the log analysis strategy is just as iffy as an unconfigured APM tool. Meanwhile, the only way to understand how any two pieces fit together is to bring the entire team into the analysis phase, which probably means even bigger bridge calls than with the APM SWAT team approach.

Finding a Better Solution

As with any anti-pattern, including our real-world example above, the way to find an optimal solution is to start with the goal and make sure you're working towards it. In the gasoline example, people generally equate less frequent purchases with spending less, but if they instead focus on the actual cost itself, they can recognize an alternative that better achieves their goal of minimizing costs.

The same is true in application monitoring. The goal is to get the most immediate feedback on any software update, to proactively understand when a problem is occurring, and to solve that problem quickly and easily.

IT teams know that they want:

■ Monitoring up and down the cloud-native stack

■ Monitoring that understands when changes occur

■ Access to data (and understanding) for a broader set of stakeholders

Certainly, the idea of developers coding their own monitoring and tracing, coupled with direct log analysis by every stakeholder, meets the above. But does it truly achieve the ultimate goals of Dev+Ops when it comes to operating their applications?

Let's tackle the problems and misconceptions of this observability anti-pattern:

Configuring monitoring is hard — no one wants to spend the time or investment needed to even get going with a monitoring tool.

We agree, it can be hard. But there are monitoring and observability solutions that automate the hard part (we promise, they exist). You shouldn't avoid the idea of monitoring because of the traditional hurdles involved in setting it up.

We can provide data for everyone to use! No observability tool needed.

What does providing a firehose of all data to all users create? A lot of wasted time, inefficiency, and unfocused analysis.

The problem here is this: if you provide all the data to a user, it will take forever to sort through what is relevant to them. Or, if you provide only the specific data related to, for example, the application they care about, they won't have the context needed to fully understand the situation.

What if an issue isn't the application itself, but a specific user?

What if there were previous outages for this application?

Once implemented, monitoring solutions can automatically provide data with accurate context, so you can view your applications in the scope of everything else going on.

How can a monitoring / observability solution enable intelligent decision-making? How do we make it so the right people get the right data and make the best decisions they can?

These are the questions to be asking and the real challenges to solve. A modern monitoring solution can help answer these questions when it offers:
- Real-time automation
- Automation of configuration
- Data within context
- A machine learning engine that improves over time and delivers data to other AIOps platforms as well

Legacy monitoring solutions have led organizations astray, into thinking they can save time, effort, and cost by not implementing APM in cloud-native architectures. But modern monitoring solutions were designed for these modern environments and are actually the best way for organizations to save time, effort, and money, while empowering the entire IT team.

Chris Farrell is Observability and APM Strategist at Instana