Bringing Alert Management into the Present with Advanced Analytics
March 25, 2015

Kevin Conklin
Ipswitch


Self-navigating smart cars are on the horizon. Mobile apps have made communication, navigation and entertainment an integral part of our daily lives. Your insurance pricing may soon be affected by whether or not you wear a personal health monitoring device. Everywhere you turn, the latest technologies are being leveraged to provide advanced services that were unimaginable even ten years ago. So why are the IT environments that provide these services still managed with analytics technology designed for the 1970s?

The IT landscape has evolved significantly over the past few decades. IT management simply has not kept pace. IT operations teams are anxious that too many problems are reported first by end users. Support teams worry that too many people spend too much time troubleshooting. Over 70 percent of troubleshooting time is actually wasted following false hunches because alerts provide no value to the diagnostic process. Enterprises that are still reliant on yesterday’s management strategies will find it increasingly difficult to solve today’s operations and performance management challenges.

This is not just an issue of falling behind a technology curve. There is a real business impact: incident rates rise, potentially disastrous outages go undetected and staff waste valuable time. An increasing number of IT shops are anxiously searching for alternatives.

This is where advanced machine learning analytics can help.

Too often, operations teams are engulfed by alerts. They receive tens of thousands a day without knowing which to deal with and when, so something important can easily be ignored while time is wasted on something trivial. Through a combination of machine learning and anomaly detection, advanced analytics can reduce the alarms to a prioritized set of those that have the largest impact on the environment. By learning which alerts are “normal”, these systems define an operable status quo. In essence, machine learning filters out the “background noise” of alerts that, based on their persistence, have no effect on normal operations. From there, statistical algorithms identify and rank “abnormal” outliers on scales measuring severity (the size of a spike or drop), rarity (the number of previous instances) or impact (the quantity of related anomalies). The result is a reduction from hundreds of thousands of noisy alerts a week to a few dozen notifications of real problems.
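The filter-then-rank idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any vendor's product: the baseline is a simple mean and standard deviation of past per-interval alert counts, and the function names, the `threshold` parameter and the sample alert types are all hypothetical.

```python
from statistics import mean, pstdev


def z_score(value, past):
    """Deviation of value from the learned baseline, in standard deviations."""
    mu = mean(past)
    sigma = pstdev(past) or 1.0  # avoid division by zero for a constant series
    return abs(value - mu) / sigma


def rank_anomalies(history, current, threshold=3.0):
    """Return anomalous alert types, most significant first.

    history: alert type -> list of past per-interval counts (the baseline)
    current: alert type -> count in the interval being evaluated
    """
    anomalies = []
    for alert_type, count in current.items():
        past = history.get(alert_type, [])
        if len(past) < 2:
            continue  # assumption: skip types with too little history to learn from
        severity = z_score(count, past)
        if severity < threshold:
            continue  # persistent "background noise": consistent with the baseline
        # Rarity: how many past intervals deviated this strongly from the baseline.
        previous = sum(1 for v in past if z_score(v, past) >= threshold)
        anomalies.append(
            {"type": alert_type, "severity": severity, "previous": previous}
        )
    # Impact: how many other alert types are anomalous in the same interval.
    for a in anomalies:
        a["impact"] = len(anomalies) - 1
    # Most severe first; among equals, the rarer anomaly ranks higher.
    return sorted(anomalies, key=lambda a: (-a["severity"], a["previous"]))


# Hypothetical sample data: counts per interval for three alert types.
history = {
    "disk_warning": [980, 1010, 995, 1005, 990, 1000],  # noisy but steady
    "login_failure": [5, 4, 6, 5, 5, 4],
    "db_timeout": [0, 1, 0, 0, 1, 0],
}
current = {"disk_warning": 1002, "login_failure": 60, "db_timeout": 12}

for anomaly in rank_anomalies(history, current):
    print(anomaly["type"], round(anomaly["severity"], 1), anomaly["impact"])
# login_failure ranks above db_timeout; disk_warning is filtered out as noise
```

In this toy data the high-volume `disk_warning` alert never reaches the operator because it matches its own history, while the two low-volume alert types that suddenly spike are surfaced and ordered by how far they depart from normal.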

Despite producing huge volumes of alerts, rule-and-threshold implementations often miss problems or report them long after the customer has experienced the impact. The fear of generating even more alerts forces monitoring teams to select fewer KPIs, which decreases the likelihood of detection. Problems that slowly approach thresholds go unnoticed until user experience is already impacted. Adopting an advanced analytics approach lets enterprises not only identify problems that rules and thresholds miss or catch too late, but also provide their troubleshooting teams with pre-correlated causal data.
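To make the slow-drift problem concrete, the following Python sketch compares a static threshold with a check against a learned baseline. Everything in it is an assumption made for illustration: the response-time figures are invented, the 500 ms limit is arbitrary, and a fixed three-standard-deviation band stands in for whatever model a real analytics product would learn.

```python
from statistics import mean, pstdev


def first_threshold_breach(series, limit):
    """Index of the first sample at or above a static limit, or None."""
    return next((i for i, v in enumerate(series) if v >= limit), None)


def first_baseline_deviation(series, baseline, k=3.0):
    """Index of the first sample more than k standard deviations away
    from the mean of the learned baseline, or None."""
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # guard against a constant baseline
    return next((i for i, v in enumerate(series) if abs(v - mu) > k * sigma), None)


# Hypothetical response times in milliseconds observed during normal operation.
baseline = [198, 202, 200, 199, 201, 200]

# Hypothetical degradation: response time creeps up 10 ms per interval.
drifting = [200 + 10 * i for i in range(40)]

print(first_baseline_deviation(drifting, baseline))  # interval 1
print(first_threshold_breach(drifting, limit=500))   # interval 30
```

With these made-up numbers the static limit stays silent for 30 intervals of steadily worsening performance, while the baseline check flags the drift almost immediately.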

By replacing legacy rules and thresholds with machine learning anomaly detection, IT teams can monitor larger sets of performance data in real time. Monitoring more KPIs enables a higher percentage of issues to be detected before users report them. Through real-time cross-correlation, related anomalies are detected together and alerts become more actionable. Early adopters report that they are able to reduce troubleshooting time by 75 percent, with reductions of as much as 85 percent in the number of people involved.
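One simple way to picture how related anomalies turn into a single actionable alert is to group anomalies on different KPIs that occur close together in time. The Python sketch below does only that. Time proximity is a deliberately crude stand-in for real cross-correlation, and the function name, the 60-second window and the KPI names are hypothetical.

```python
def group_related(anomalies, window=60):
    """Group anomalies into incidents by time proximity.

    anomalies: iterable of (timestamp_seconds, kpi_name) pairs
    window: maximum gap in seconds between consecutive anomalies
            for them to be treated as part of the same incident
    """
    incidents = []
    for ts, kpi in sorted(anomalies):
        if incidents and ts - incidents[-1]["end"] <= window:
            # Close enough to the previous anomaly: same incident.
            incidents[-1]["kpis"].append(kpi)
            incidents[-1]["end"] = ts
        else:
            incidents.append({"start": ts, "end": ts, "kpis": [kpi]})
    return incidents


# Hypothetical anomalies detected on four different KPIs.
events = [
    (100, "db_latency"),
    (130, "checkout_errors"),
    (150, "queue_depth"),
    (900, "cpu_load"),
]

for incident in group_related(events):
    print(incident["start"], incident["kpis"])
```

Here four separate anomalies collapse into two notifications, and the first one already tells the troubleshooting team which KPIs moved together.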

Advanced machine learning systems will fundamentally change the way data is converted into information over the next few years. If your business is leveraging information to provide competitive services, you can’t afford to be the laggard.

Kevin Conklin is VP of Product Marketing at Ipswitch
