As businesses have become increasingly reliant on technology, monitoring applications and infrastructure is a necessity. Monitoring is a key component of IT management, helping detect anomalies, triage issues, and ensure that the entire infrastructure is healthy.
However, despite their importance, monitoring tools are often an afterthought, deployed after an IT infrastructure is in place and functioning. Without a planned and well-defined monitoring strategy in place, most IT organizations – large and small – find themselves caught in the trap of "too many monitoring tools" – custom in-house tools, open source tools, packaged tools, and more, that add up over time for a variety of reasons.
A recent survey by EMA indicated that 65% of enterprise organizations have more than 10 monitoring tools. These monitoring tools are, of course, not all unnecessary, but the real question is: Does your team need to manage so many monitoring tools? Does every nail require a different hammer? What are the potential consequences?
There are many reasons why enterprises end up with too many monitoring tools. This blog will examine why this occurs, how the situation gets out of hand, and some best practices for consolidating monitoring in a way that benefits the entire IT organization.
Monitoring Sprawl: How Did We Get Here?
A single IT service often relies on many technologies and tiers. For example, a web service requires a web server, multiple middleware tiers, plus message queues and databases. It is hosted on a virtualized server and relies on data access from a storage tier. Because each of these technology tiers is very different from the others, each requires specialized management skills. IT organizations tend to be structured along the lines of these tiers, so there are many administrators, each using a different set of tools for their own domain of expertise.
Even within a specific tier, multiple monitoring tools may be in use: One for monitoring performance, another for analyzing log files, yet another to report on traffic to that tier, and so on.
Further, when an organization frequently relies on short-term solutions to diagnose problems, ad hoc tool choices can lead to further sprawl. That is, when faced with a problem, an IT administrator may implement a new tool simply to solve the specific issue at hand, never to be used again, thus contributing to a growing collection of monitoring tool shelfware that consumes costs and personnel resources.
Another reason for monitoring tool sprawl is simply personal experience with a particular software solution. IT administrators and managers may have used a monitoring tool in past roles that they view as required for the job. Despite having one or more existing monitoring tools in place, the new tool gets implemented, rendering the existing solutions partially or completely redundant.
Inheritance and Bundles
Mergers and acquisitions can add to the software sprawl. Every time two organizations merge, the combined organization inherits monitoring tools from both organizations.
Many hardware purchases include proprietary monitoring software. With almost every storage vendor bundling its own monitoring tool, an organization leveraging storage arrays from multiple vendors can easily end up with a diverse group of storage monitoring tools.
Software vendors sometimes package monitoring tools with their enterprise environments as well, so organizations that enter into these agreements can find themselves with yet another tool.
SaaS-Based Monitoring Options & Freeware
With the advent of quick-to-deploy SaaS-based monitoring tools, it has become very easy for organizations to keep adding them. SaaS-based helpdesks, monitoring tools, security tools, and more can be easily purchased from operating budgets, and IT staff members can simply deploy their own open source and free tools as needed. All of these add to the overall number of monitoring tools the organization must maintain.
The Problem of Too Many Tools
Needle in the Haystack
Although each monitoring tool offers its own unique focus and strengths, overlap in functionality is extremely common. Because there is no integration between these tools, in today's environment of many tiers and many monitoring tools, problem diagnosis, perhaps the most critical factor in fast remediation, is tedious and time-consuming. Administrators must first sift through alerts from disparate sources, eliminate duplicates, and then manually correlate reported performance issues to get actionable insights. Further complicating this process, analyzing alerts across tiers often requires a great deal of expertise, which can mean involving more people and spending more time.
For fast remediation in a multi-tier service delivery, problem diagnosis must be centralized and automated, but this cannot be achieved easily with multiple tools. Finding the needle in the haystack is difficult, but with what appear to be duplicate needles across many haystacks, it is easy to be led astray and waste valuable resources and time.
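To make the manual steps described above concrete (sift through alerts, eliminate duplicates, correlate), here is a minimal Python sketch. It is an illustration only, not how any particular monitoring product works: the Alert fields, the tool and host names, and the five-minute correlation window are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str       # hypothetical name of the tool that raised the alert
    tier: str       # e.g. "database", "network", "storage"
    host: str
    symptom: str    # normalized description, e.g. "high latency"
    timestamp: int  # seconds since an arbitrary reference point


def deduplicate(alerts):
    """Keep only the earliest alert for each (tier, host, symptom)."""
    earliest = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.tier, alert.host, alert.symptom)
        earliest.setdefault(key, alert)
    return list(earliest.values())


def correlate(alerts, window_seconds=300):
    """Group alerts into incidents: an alert joins the current incident
    if it arrives within window_seconds of the previous alert."""
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= window_seconds:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


# Made-up alerts from three hypothetical tools.
alerts = [
    Alert("tool_a", "database", "db01", "slow queries", 100),
    Alert("tool_b", "database", "db01", "slow queries", 130),  # same issue, second tool
    Alert("tool_c", "storage", "san01", "high latency", 90),
    Alert("tool_a", "network", "sw01", "packet loss", 5000),
]

unique = deduplicate(alerts)
incidents = correlate(unique)

for number, incident in enumerate(incidents, start=1):
    tiers = ", ".join(a.tier for a in incident)
    print(f"Incident {number}: {tiers}")
```

Even this toy version assumes that every tool reports the tier, host, and symptom in the same normalized form. With unintegrated tools that is rarely the case, which is why the correlation usually ends up being done by hand.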
Of War Rooms and Blame Games
Most monitoring tools are designed for specific subject-matter experts (application, database, network, VDI, etc.). Without unified visibility into the IT environment, war room discussions can easily turn into finger-pointing: An application owner blames the network tier for slowness, a database administrator blames developers who have not written optimal queries, virtualization administrators point to the storage team, and so on.
Everyone believes it is "not my problem." But there is a problem somewhere, and without a single source of truth (a holistic view of service performance), no one can see what went wrong and where the fix is needed. So, additional time and effort are needed to manually correlate events and solve the problem, while the business and users suffer.
Time and Money
Maintaining a sprawl of monitoring tools adds cost on many levels. There are hard costs for license renewals and maintenance, plus the time spent on support requests, working with the various vendors, deploying upgrades, and training personnel to handle multiple tools. All of these affect the total cost of ownership, and the cost of maintaining shelfware and redundant tools is the most wasteful of them all.