How to Detect (and Resolve) IT Ops/APM Issues Before Your Users Do
September 19, 2014

Kevin Conklin
Ipswitch

Among the most embarrassing situations for application support teams is first hearing about a critical performance issue from their users. With technology getting increasingly complex and IT environments changing almost overnight, the reality is that even the most experienced support teams are bound to miss a major problem with a critical application or service. One of the contributing factors is their continued reliance on traditional monitoring approaches.

Traditional tools limit us to monitoring for a combination of key performance indicator thresholds and failure modes that have already been experienced. So when it comes to finding new problems, the best case is an alert that describes the symptom (slow response time, failed transactions, etc.). A very experienced IT professional will have seen many behaviors, and consequently can set up monitoring based on best practices and past experience. But even the most experienced IT professional will have a hard time designing rules and thresholds that can catch new, unknown problems without generating a stream of noisy false alerts. Anomaly detection goes beyond the limits of traditional approaches because it learns from all of the data provided, whether a given behavior has happened before or not.

Anomaly detection works by identifying unusual behaviors in data generated by an application or service delivery environment. The technology uses machine learning and predictive analytics to establish baselines in the data and automatically learn what normal behavior is. It then identifies deviations in behavior that are unusually severe or may be causal to other anomalies, a clear indication that something is wrong. And the best part? This technology works in real time as well as in troubleshooting mode, so it's proactively monitoring your IT environment. With this approach, real problems can be identified and acted upon faster than before.
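To make the idea of a learned baseline concrete, here is a minimal sketch in Python. It is not how any particular product works: it assumes a simple rolling mean and standard deviation as the "baseline" and flags points that fall far outside it. The function name `find_anomalies`, the `window` and `threshold` parameters, and the sample response times are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=20, threshold=3.0):
    """Flag points that deviate strongly from a rolling baseline.

    The baseline for each point is the mean and standard deviation of
    the previous `window` points (an assumption made for illustration).
    Returns a list of (index, value, z_score) tuples.
    """
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # a perfectly flat baseline gives no scale to compare against
        z = (values[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, values[i], z))
    return anomalies

# Hypothetical response times in milliseconds with one sudden spike.
response_ms = [100 + (i % 5) for i in range(40)]
response_ms[30] = 400

for index, value, z in find_anomalies(response_ms):
    print(f"sample {index}: {value} ms is {z:.1f} standard deviations from baseline")
```

Note that no threshold on the response time itself was configured. The spike stands out only because it is unlike what the series had been doing, which is the core difference from a fixed-threshold rule.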

More advanced anomaly detection technologies can run multiple analyses in parallel and analyze multiple data sources simultaneously, identifying related anomalies across the system. Thus, when a chain of events is causal to a performance issue, the alerts contain all the related anomalies. This helps support teams zero in on the cause of the problem immediately.
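One simple way to picture how related anomalies end up in a single alert is to group them by time. The Python sketch below assumes that anomalies occurring close together are related, which is a much weaker test than the causal analysis described above; real products are likely to use richer correlation. The function `group_related`, the `max_gap_seconds` parameter, and the sample records are hypothetical.

```python
def group_related(anomalies, max_gap_seconds=60):
    """Group anomalies whose timestamps are close together.

    An anomaly joins the current group when it occurs within
    `max_gap_seconds` of the previous anomaly; otherwise it starts
    a new group. Each group can then be raised as one alert.
    """
    groups = []
    for anomaly in sorted(anomalies, key=lambda a: a["time"]):
        if groups and anomaly["time"] - groups[-1][-1]["time"] <= max_gap_seconds:
            groups[-1].append(anomaly)
        else:
            groups.append([anomaly])
    return groups

# Hypothetical anomalies from different data sources (time in seconds).
detected = [
    {"time": 1030, "source": "app_log", "what": "error rate up"},
    {"time": 1000, "source": "database", "what": "query latency up"},
    {"time": 5000, "source": "storage", "what": "disk usage jump"},
    {"time": 1075, "source": "web_tier", "what": "response time up"},
]

for number, group in enumerate(group_related(detected), start=1):
    sources = ", ".join(a["source"] for a in group)
    print(f"alert {number}: {sources}")
```

Sorting by time also puts the earliest anomaly first in each alert, which is often a reasonable place to start looking for the cause.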

Traditional approaches are also known to generate huge volumes of false alerts. Anomaly detection, on the other hand, uses advanced statistical analyses to minimize false alerts. Those few alerts that are generated provide more data, which results in faster troubleshooting.

Anomaly detection looks for significant variations from the norm and ranks severity by probability. Machine learning technology helps the system learn the difference between routine noise, such as commonly occurring errors and ordinary spikes and drops in metrics, and true anomalies that are more accurate indicators of a problem. This can mean the difference between tens of thousands of alerts each day, most of which are false, and a dozen or so a week that should be pursued.
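Ranking severity by probability can be illustrated with a short Python sketch. It assumes each metric's history is roughly normally distributed, which real metrics often are not, and scores each new observation by how unlikely it would be under that history. The names `tail_probability` and `rank_by_severity`, the `max_probability` cutoff, and the sample metrics are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def tail_probability(value, history):
    """Two-sided probability of a value at least this far from the mean.

    Assumes the history is approximately normal and has some variation
    (a constant history would give a zero standard deviation).
    """
    z = abs(value - mean(history)) / stdev(history)
    return 2 * (1 - NormalDist().cdf(z))

def rank_by_severity(observations, histories, max_probability=0.001):
    """Keep only improbable observations, least probable (most severe) first."""
    ranked = []
    for name, value in observations.items():
        p = tail_probability(value, histories[name])
        if p <= max_probability:
            ranked.append((name, value, p))
    return sorted(ranked, key=lambda item: item[2])

# Hypothetical metric histories and their latest readings.
histories = {
    "login_ms": [100, 102, 98, 101, 99, 100, 103, 97],
    "error_rate": [1, 2, 1, 2, 1, 2, 1, 2],
    "queue_depth": [10, 12, 11, 9, 10, 11, 12, 10],
}
latest = {"login_ms": 101, "error_rate": 9, "queue_depth": 15}

for name, value, p in rank_by_severity(latest, histories):
    print(f"{name}={value} (probability {p:.2e})")
```

The `login_ms` reading is well within its usual range and produces no alert at all, while the other two are reported in order of how improbable they are. Filtering on probability rather than on raw values is what keeps ordinary fluctuations from becoming alerts.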

Anomaly detection can identify the early signs of developing problems in massive volumes of data before they turn into major problems. By slashing troubleshooting time and reducing the noise from false alarms, it empowers IT teams to attack and resolve issues before they reach critical proportions.

If users do become aware of a problem, the IT team can respond "we're on it" instead of saying "thanks for letting us know."

Kevin Conklin is VP of Product Marketing at Ipswitch.