Skipping Application Monitoring is the Biggest Anti-Pattern in Application Observability
July 19, 2021

Chris Farrell
Instana


An anti-pattern arises when people recognize a problem and broadly embrace a non-optimal solution as the go-to way of solving it. The solution sounds good in theory, but for one reason or another it is not the best means of solving the problem.

A common example involves gasoline and rising prices. As prices go up, consumers tend to put off buying gas as long as possible, until they are running on fumes. In reality, if prices keep climbing, every delayed purchase happens at a higher price per gallon, so the best way to save money during this time is to fill up your tank every chance you get.

Anti-patterns are common across IT as well, especially around application monitoring and observability. One that is particularly prevalent has emerged in response to the increasing complexity of cloud-native infrastructure and applications. The (suboptimal) idea is that the best way to monitor modern applications is not to install monitoring at all, but to have developers hand-code their own monitoring, write all the data to logs, and solve problems by analyzing custom dashboards and the resulting log files.
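
To make the anti-pattern concrete, below is a minimal, hypothetical sketch (in Python) of what hand-rolled monitoring often looks like: each operation is wrapped in timing code, the results are written to a log in an ad-hoc format, and someone later pieces the lines back together across dashboards and log files. The service name, function names, and log format are illustrative assumptions, not taken from any particular team.

    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("checkout-service")  # hypothetical service name

    def process_order(order_id):
        time.sleep(0.05)  # placeholder for the real business logic

    def handle_checkout(order_id):
        # Hand-rolled "monitoring": time the operation and log the result.
        start = time.monotonic()
        status = "error"
        try:
            process_order(order_id)
            status = "ok"
        finally:
            elapsed_ms = (time.monotonic() - start) * 1000
            # Every service invents its own log format; correlating these
            # lines across services later is the manual part of the work.
            log.info("checkout order_id=%s status=%s duration_ms=%.1f",
                     order_id, status, elapsed_ms)

    handle_checkout("A-1001")

Each individual snippet like this is easy to write; the visibility gaps appear when dozens of services each log slightly differently and nothing ties the lines together.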

The reality is that this approach tends to leave a multitude of visibility gaps, and can even send incident SWAT teams down the wrong path, depending on what is instrumented, collected, and shown. The worst case is application slowdowns, or even outages, occurring while the dashboards show "all systems green."

The problem with anti-patterns is that a popular idea can gain ground even if the solution is suboptimal. For the aforementioned gasoline issue, some napkin math is enough to show how a different approach saves money. For IT monitoring strategies, it might take a little more. To understand when a specific solution or process is an anti-pattern, and how to solve the problem in a more optimal way, it's important to recognize what led to the situation, define the ultimate goal, and then stay open to different solutions.

What Caused the Application Monitoring Anti-Pattern?

In the case of cloud-native application performance, the problem is that legacy application monitoring tools require continuous configuration, and even some manual coding, to reach their full value. That slows down DevOps and continuous integration / continuous deployment (CI/CD) processes, because the tooling has to be reconfigured every time an update is released. And there's always a chance that if the reconfiguration isn't done, and done right, the tool won't have the right data to either recognize a problem or solve it.

This is what has led many teams to eschew the idea of a monitoring tool and, instead, have their developers instrument monitoring into the code and analyze everything in logs themselves. They recognize how time-consuming and menial log analysis is, but it's seen as the lesser of two evils when compared with the constant reconfiguration of a monitoring tool.

But this isn't exactly optimal either. If developers don't capture the right information at the right time, the log analysis strategy is just as iffy as an unconfigured APM tool. Meanwhile, the only way to understand how any two pieces fit together is to bring the entire team into the analysis phase, which probably means even bigger bridge calls than the APM SWAT-team approach required.

Finding A Better Solution

As with any anti-pattern, including the real-world example above, the way to find an optimal solution is to start with the goal and make sure you're working towards it. In the gasoline example, people generally equate less frequent purchases with spending less, but if they instead focus on the actual cost itself, they can recognize an alternative that better achieves their goal of minimizing costs.

The same is true in application monitoring. The goal is to get the most immediate feedback on any software update, to understand proactively when a problem is occurring, and to solve that problem quickly and easily.

IT teams know that they want:

■ Monitoring up and down the cloud-native stack

■ Monitoring that understands when changes occur

■ Access to data (and understanding) from a broader set of stakeholders

Certainly, the idea of developers hand-coding monitoring and tracing, coupled with direct log analysis by every stakeholder, meets the above. But does it truly achieve the ultimate goals of Dev+Ops when it comes to operating their applications?

Let's tackle the problems and misconceptions of this observability anti-pattern:

Configuring monitoring is hard — no one wants to spend the time or investment needed to even get going with a monitoring tool.

We agree, it can be hard. But there are monitoring and observability solutions that automate the hard part (we promise, they exist). You shouldn't avoid monitoring altogether because of the traditional hurdles involved in setting it up.

We can provide data for everyone to use! No observability tool needed. But what does providing a firehose of all data to all users actually create? A lot of wasted time, inefficiency, and unfocused analysis.

The problem here is this: if you provide all the data to a user, it will take forever for them to sort through what is relevant. If, instead, you provide only the data for the specific application they care about, they won't have the context needed to fully understand the situation.

What if an issue isn't the application itself, but a specific user?

What if there were previous outages for this application?

A monitoring solution, once implemented, can provide data with accurate context automatically, so you can view your applications in the scope of everything else going on.
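
As one concrete illustration of data that carries its own context, here is a minimal sketch using the open-source OpenTelemetry Python API and SDK as a neutral stand-in; the article does not prescribe a specific tool, and the service name, attribute keys, and version string below are assumptions made for the example (requires the opentelemetry-api and opentelemetry-sdk packages).

    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

    # Export spans to the console so the example is self-contained.
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("checkout-service")

    def handle_checkout(order_id, user_id):
        # The span carries the context a bare log line loses: which user and
        # order were involved, which deployment served them, and how this call
        # relates to the rest of the distributed trace.
        with tracer.start_as_current_span("checkout") as span:
            span.set_attribute("order.id", order_id)
            span.set_attribute("user.id", user_id)
            span.set_attribute("deployment.version", "2021-07-19.1")
            # ... business logic would go here ...

    handle_checkout("A-1001", "user-42")

Because attributes like the user ID and deployment version ride along with the trace itself, questions such as "is it just one user?" or "did the last release cause this?" can be answered without stitching log files together by hand.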

How can a monitoring / observability solution enable intelligent decision-making? How do we make it so the right people get the right data and make the best decisions they can?

These are the questions to be asking and the real challenges to solve for. A modern monitoring solution can help answer these questions when it offers:
- Real-time automation
- Automation of configuration (see the sketch after this list)
- Data within context
- A machine learning engine that improves over time and delivers its data to other AIOps platforms as well
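
As a brief sketch of what "automation of configuration" can mean in practice, the example below uses OpenTelemetry's Flask instrumentation purely as an illustration of the idea; the article does not name a specific product, and the app and route are invented (requires the flask and opentelemetry-instrumentation-flask packages). Instead of hand-writing timing and logging code for every endpoint, a single instrumentation call records a span for each request automatically.

    from flask import Flask
    from opentelemetry.instrumentation.flask import FlaskInstrumentor

    app = Flask(__name__)

    # One call instruments every route; no per-endpoint monitoring code needed.
    # (Spans go wherever the configured tracer provider sends them, for example
    # the console exporter from the previous sketch.)
    FlaskInstrumentor().instrument_app(app)

    @app.route("/checkout/<order_id>")
    def checkout(order_id):
        # No explicit monitoring here; route, status code, and duration are
        # captured by the instrumentation.
        return {"order": order_id, "status": "ok"}

    if __name__ == "__main__":
        app.run(port=8080)

The point is not this particular library, but the contrast with the anti-pattern: the configuration work moves out of every developer's code and into the tooling.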

Legacy monitoring solutions have led organizations astray, convincing them that they can save time, effort, and cost by not implementing APM in cloud-native architectures. But modern monitoring solutions were designed for these environments and are, in fact, the best way for organizations to save time, effort, and money while empowering the entire IT team.

Chris Farrell is Observability and APM Strategist at Instana
