As businesses have become increasingly reliant on technology, monitoring applications and infrastructure is a necessity. Monitoring is a key component of IT management, helping detect anomalies, triage issues, and ensure that the entire infrastructure is healthy.
However, despite their importance, monitoring tools are often an afterthought, deployed after an IT infrastructure is in place and functioning. Without a planned and well-defined monitoring strategy in place, most IT organizations – large and small – find themselves caught in the trap of "too many monitoring tools" – custom in-house tools, open source tools, packaged tools, and more, that add up over time for a variety of reasons.
A recent survey by EMA indicated that 65% of enterprise organizations have more than 10 monitoring tools. These monitoring tools are, of course, not all unnecessary, but the real question is: Does your team need to manage so many monitoring tools? Does every nail require a different hammer? What are the potential consequences?
There are many reasons why enterprises end up with too many monitoring tools. This blog will examine why this occurs, how the situation gets out of hand, and some best practices for consolidating monitoring in a way that benefits every function and improves efficiency across an IT organization.
Monitoring Sprawl: How Did We Get Here?
A single IT service often relies on many technologies and tiers. For example, a web service requires a web server, multiple middleware tiers, plus message queues and databases. It is hosted on a virtualized server and relies on data access from a storage tier. Because each of these technology tiers is very different from the others, each requires specialized management skills. IT organizations tend to be structured along the lines of these tiers, so there are many administrators, each using a different set of tools for their own domain of expertise.
Even within a specific tier, multiple monitoring tools may be in use: One for monitoring performance, another for analyzing log files, yet another to report on traffic to that tier, and so on.
Further, when an organization frequently relies on short-term solutions to diagnose problems, ad hoc tool choices can lead to further sprawl. Faced with a problem, an IT administrator may implement a new tool simply to solve the specific issue at hand. The tool is then never used again, and it joins a growing collection of monitoring shelfware that consumes budget and personnel time.
Another reason for monitoring tool sprawl is simply personal experience with a particular software solution. IT administrators and managers may have used a monitoring tool in past roles that they view as required for the job. Even though one or more monitoring tools are already in place, the new tool gets implemented, rendering the existing solutions partially or completely redundant.
Inheritance and Bundles
Mergers and acquisitions can add to the software sprawl. Every time two organizations merge, the combined organization inherits monitoring tools from both organizations.
Many hardware purchases include proprietary monitoring software. With almost every storage vendor bundling its own monitoring tool, an organization leveraging storage arrays from multiple vendors can easily end up with a diverse group of storage monitoring tools.
Software vendors also sometimes bundle monitoring tools into their enterprise agreements, so organizations that enter into these agreements can find themselves with yet another tool.
SaaS-Based Monitoring Options & Freeware
With the advent of quick-to-deploy SaaS-based monitoring tools, it has become very easy for organizations to keep adding them. SaaS-based helpdesks, monitoring tools, security tools, and more can be easily purchased from operating budgets, and IT staff members can simply deploy their own open source and free tools as needed. All of these add to the overall number of monitoring tools the organization must maintain.
The Problem of Too Many Tools
Needle in the Haystack
Although each monitoring tool offers its own focus and strengths, overlap in functionality is extremely common. And because these tools are not integrated with one another, problem diagnosis – perhaps the most critical factor in fast remediation – becomes tedious and time-consuming in today's environment of many tiers and many monitoring tools. Administrators must first sift through alerts from disparate sources, eliminate duplicates, and then manually correlate reported performance issues to get actionable insights. Further complicating this process, analyzing alerts across tiers often requires a great deal of expertise, which can mean involving more people and taking more time.
For fast remediation in multi-tier service delivery, problem diagnosis must be centralized and automated, but this cannot be achieved easily with multiple tools. Finding the needle in the haystack is difficult, but with what appear to be duplicate needles across many haystacks, it is easy to be led astray and waste valuable resources and time.
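To make the manual steps above concrete (sifting alerts from several tools, eliminating duplicates, then correlating what remains), here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Alert` fields, the tool and host names, and the time windows are hypothetical and do not reflect any particular product. A real correlation engine would also use topology and dependency information rather than timestamps alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str       # hypothetical name of the monitoring tool that raised it
    tier: str       # e.g. "web", "database", "storage"
    host: str
    symptom: str
    timestamp: int  # seconds since epoch


def deduplicate(alerts, window=60):
    """Drop an alert if the same (host, symptom) pair was already
    reported, by any tool, within `window` seconds before it."""
    kept = []
    last_seen = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.symptom)
        previous = last_seen.get(key)
        if previous is None or alert.timestamp - previous > window:
            kept.append(alert)
        last_seen[key] = alert.timestamp
    return kept


def correlate(alerts, window=120):
    """Group alerts that occur within `window` seconds of the first
    alert in a group, treating each group as one candidate incident."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][0].timestamp <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


# Hypothetical alerts raised by three separate tools.
alerts = [
    Alert("tool_a", "database", "db01", "high_latency", 1000),
    Alert("tool_b", "database", "db01", "high_latency", 1020),  # duplicate
    Alert("tool_c", "web", "web01", "slow_response", 1050),
    Alert("tool_a", "storage", "san01", "io_wait", 5000),
]

unique = deduplicate(alerts)
incidents = correlate(unique)

for number, group in enumerate(incidents, start=1):
    summary = ", ".join(f"{a.tier}/{a.host}: {a.symptom}" for a in group)
    print(f"Incident {number}: {summary}")
```

Even in this toy form, the logic has to know about every tool's alert format and agree on what counts as "the same" problem, which is part of why doing it by hand across many tools is so slow.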
Of War Rooms and Blame Games
Most monitoring tools are designed for specific subject-matter experts (application, database, network, VDI, etc.). Without unified visibility into the IT environment, war room discussions can easily turn into finger-pointing: An application owner blames the network tier for slowness, a database administrator blames developers who have not written optimal queries, virtualization administrators point to the storage team, and so on.
Everyone believes it is "not my problem." But there is a problem somewhere, and without a single source of truth – a holistic view of service performance – no one can see what went wrong and where the fix is needed. So, additional time and effort are needed to manually correlate events and solve the problem, while the business and users suffer.
Time and Money
Maintaining a sprawl of monitoring tools adds cost on many levels. There are hard costs for license renewals and maintenance, plus the time spent on support requests, working with the various vendors, deploying upgrades, and training personnel to handle multiple tools. All of these affect the total cost of ownership, and the cost of maintaining shelfware and redundant tools is the most wasteful of them all.