Skipping Application Monitoring is the Biggest Anti-Pattern in Application Observability
July 19, 2021

Chris Farrell
Instana

Share this

Anti-patterns involve realizing a problem and implementing a non-optimal solution that is broadly embraced as the go-to method for solving that problem. This solution sounds good in theory, but for one reason or another it is not the best means of solving the problem.

A common example of this involves gasoline and rising prices. As prices go up, consumers tend to avoid getting gas as long as possible, until they are running on fumes. In reality, the best way to save money during this time would be to fill up your tank every chance you get.

Anti-patterns are common across IT as well, especially around application monitoring and observability. One that is particularly prevalent is in response to the increasing complexity of cloud-native infrastructure and applications. The [suboptimal] idea is that the best way to monitor modern applications is to not install monitoring, but rather have developers manually code in their own monitoring capabilities, put all the data into logs, and solve problems by analyzing custom dashboards and the resulting log files.

The reality is that this concept tends to lead to a multitude of visibility gaps, and can even send SWAT teams down the wrong path, depending on what's instrumented, collected and shown. The worst case would be application slow-downs, or even outages, occurring — all while the dashboards show "all systems green."

The problem with anti-patterns is that a popular idea can gain ground, even if the solution is suboptimal. For the afore-mentioned gasoline issue, it might take some math on a napkin to show how a different process can save money. For IT monitoring strategies, it might take a little bit more. To understand when a specific solution or process is an anti-pattern — and how to solve the problem in a more optimal way, it's important to recognize what led to the situation, the ultimate goal, and then open up to different solutions.

What Caused the Application Monitoring Anti-Pattern?

In the case of cloud-native application performance, the problem is that legacy application monitoring tools, which require continuous configuration and even some manual coding to reach their full value proposition, can lead to slow-downs in the DevOps and continuous integration / continuous deployment (CI/CD) process by requiring reconfiguration every time an update is released. There's always a chance that if the new reconfiguration isn't done (and done right), that the tool will not have the right data to either recognize a problem or solve it.

This is what has led many to eschew the idea of a monitoring tool and, instead, have their developers instrument monitoring into the code and simply analyze everything in logs themselves. Ultimately, they recognize the time consuming and menial work log analysis is, but it's seen as the lesser of two evils when compared to constant reconfiguration of monitoring.

But this isn't exactly optimal, itself. If the developers don't capture the right information at the right time, then the log analysis strategy is just as iffy as an unconfigured APM tool. Meanwhile, the only way to understand how any two pieces fit together is to bring the entire team into the analysis phase, which probably means even bigger bridge calls than with just the APM swat team approach.

Finding A Better Solution

As with any anti-pattern, including our real-world example above, the way to find an optimal solution is to start with the goal and make sure you're working towards that goal. In the gasoline example, people generally equate less frequent purchases as spending less, but if they instead focus on the actual cost itself, they can recognize an alternative that better achieves their goal of minimizing costs.

The same is true in application monitoring. The goal is to get the most immediate feedback on any software update, to proactively understand when a problem is occurring and easily, and quickly, solve the problem.

IT teams know that they want:

■ Monitoring up and down the cloud-native stack

■ Understanding within monitoring when changes occur

■ Access to data (and understanding) from a broader set of stakeholders

Certainly, the idea of developers coding, monitoring, and tracing, coupled with direct log analysis by every stakeholder, meets the above — but does it truly achieve the ultimate goals of Dev+Ops when it comes to operating their applications?

Let's tackle the problems and misconceptions of this observability anti-pattern:

Configuring monitoring is hard — no one wants to spend the time or investment needed to even get going with a monitoring tool.

We agree, it can be hard. But there are monitoring and observability solutions that automate the hard part (we promise, they exist). You shouldn't avoid the idea of monitoring because of the traditional hurdles involved in setting this up.

We can provide data for everyone to use! No observability tool needed. What does providing a firehose of all data to all users create? A lot of time wasting, inefficiency, and non-focused analysis.

The problem here is: If you provide all the data to a user, it will take forever to sort through what is relevant to them. Or, if you provide only the specific data related to the application they care about for example, they won't have the context needed to fully understand the situation.

What if an issue isn't the application itself, but a specific user?

What if there were previous outages for this application?

Monitoring solutions, after being implemented, can provide data with accurate context, automatically, so you can view your applications in the scope of everything else going on.

How can a monitoring / observability solution enable intelligent decision-making? How do we make it so the right people get the right data and make the best decisions they can?

These are the questions to be asking and the real challenges to solve for. A modern monitoring solution can help answer these questions when they offer:
- Real-time automation
- Automation of configuration
- Data within context
- A machine learning engine that improves and delivers data to all other AIOps platforms too

Legacy monitoring solutions have led organizations astray, thinking they can save time, effort, and cost by not implementing APM into cloud-native architectures. But modern monitoring solutions were designed for these modern environments and are the actual best way in which organizations can save time, effort and money, while empowering the entire IT team.

Chris Farrell is Observability and APM Strategist at Instana
Share this

The Latest

June 29, 2022

When it comes to AIOps predictions, there's no question of AI's value in predictive intelligence and faster problem resolution for IT teams. In fact, Gartner has reported that there is no future for IT Operations without AIOps. So, where is AIOps headed in five years? Here's what the vendors and thought leaders in the AIOps space had to share ...

June 27, 2022

A new study by OpsRamp on the state of the Managed Service Providers (MSP) market concludes that MSPs face a market of bountiful opportunities but must prepare for this growth by embracing complex technologies like hybrid cloud management, root cause analysis and automation ...

June 27, 2022

Hybrid work adoption and the accelerated pace of digital transformation are driving an increasing need for automation and site reliability engineering (SRE) practices, according to new research. In a new survey almost half of respondents (48.2%) said automation is a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management ...

June 23, 2022

Digital businesses don't invest in monitoring for monitoring's sake. They do it to make the business run better. Every dollar spent on observability — every hour your team spends using monitoring tools or responding to what they reveal — should tie back directly to business outcomes: conversions, revenues, brand equity. If they don't? You might be missing the forest for the trees ...

June 22, 2022

Every day, companies are missing customer experience (CX) "red flags" because they don't have the tools to observe CX processes or metrics. Even basic errors or defects in automated customer interactions are left undetected for days, weeks or months, leading to widespread customer dissatisfaction. In fact, poor CX and digital technology investments are costing enterprises billions of dollars in lost potential revenue ...

June 21, 2022

Organizations are moving to microservices and cloud native architectures at an increasing pace. The primary incentive for these transformation projects is typically to increase the agility and velocity of software release and product innovation. These dynamic systems, however, are far more complex to manage and monitor, and they generate far higher data volumes ...

June 16, 2022

Global IT teams adapted to remote work in 2021, resolving employee tickets 23% faster than the year before as overall resolution time for IT tickets went down by 7 hours, according to the Freshservice Service Management Benchmark Report from Freshworks ...

June 15, 2022

Once upon a time data lived in the data center. Now data lives everywhere. All this signals the need for a new approach to data management, a next-gen solution ...

June 14, 2022

Findings from the 2022 State of Edge Messaging Report from Ably and Coleman Parkes Research show that most organizations (65%) that have built edge messaging capabilities in house have experienced an outage or significant downtime in the last 12-18 months. Most of the current in-house real-time messaging services aren't cutting it ...

June 13, 2022
Today's users want a complete digital experience when dealing with a software product or system. They are not content with the page load speeds or features alone but want the software to perform optimally in an omnichannel environment comprising multiple platforms, browsers, devices, and networks. This calls into question the role of load testing services to check whether the given software under testing can perform optimally when subjected to peak load ...