How to Detect (and Resolve) IT Ops/APM Issues Before Your Users Do
September 19, 2014

Kevin Conklin
Ipswitch

Among the most embarrassing situations for application support teams is first hearing about a critical performance issue from their users. With technology getting increasingly complex and IT environments changing almost overnight, the reality is that even the most experienced support teams are bound to miss a major problem with a critical application or service. One of the contributing factors is their continued reliance on traditional monitoring approaches.

Traditional tools limit us to monitoring for a combination of key performance indicator thresholds and failure modes that have already been experienced. So when it comes to finding new problems, the best case is an alert that describes the symptom (slow response times, failed transactions, etc.). A very experienced IT professional will have seen many behaviors, and consequently can employ monitoring based on best practices and past experience. But even the most experienced IT professional will have a hard time designing rules and thresholds that can catch new, unknown problems without generating a stream of noisy false alerts. Anomaly detection goes beyond the limits of traditional approaches because it learns from all of the data provided, whether a given behavior has happened before or not.

Anomaly detection works by identifying unusual behaviors in data generated by an application or service delivery environment. The technology uses machine learning-based predictive analytics to establish baselines in the data and automatically learn what normal behavior is. It then identifies deviations in behavior that are unusually severe or may be causal to other anomalies, a clear indication that something is wrong. And the best part? This technology works in real time as well as in troubleshooting mode, so it's proactively monitoring your IT environment. With this approach, real problems can be identified and acted upon faster than before.
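To make the idea of a learned baseline concrete, here is a minimal sketch in Python. It is not how any particular product works: it assumes a single metric, a simple rolling mean and standard deviation as the "baseline," and a hypothetical function name (`detect_anomalies`) and threshold chosen purely for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=20, threshold=3.0):
    """Return indices of points that deviate strongly from a rolling baseline.

    Illustrative assumption: "normal" is the mean of the last `window`
    non-anomalous points, and a point is anomalous when it sits more than
    `threshold` standard deviations away from that mean.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        if len(history) >= 5:  # wait for a few points before trusting the baseline
            baseline = mean(history)
            spread = stdev(history)
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append(i)
                continue  # keep the outlier out of the learned baseline
        history.append(value)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
response_times_ms = [100, 102, 98, 101, 99, 103, 97, 100, 450, 101, 99]
print(detect_anomalies(response_times_ms))  # [8]
```

Note that nobody had to decide in advance that 450 ms was "too slow." The limit is derived from the data itself, which is the basic difference from a hand-set threshold.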

More advanced anomaly detection technologies can run multiple analyses in parallel, and are capable of analyzing multiple data sources simultaneously, identifying related, anomalous relationships within the system. Thus, when a chain of events is causal to a performance issue, the alerts contain all the related anomalies. This helps support teams zero in on the cause of the problem immediately.
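As a rough illustration of how related anomalies from different data sources might end up in a single alert, the sketch below groups anomalies that occur close together in time. Time proximity is only a stand-in here for real causal analysis, and the function name (`group_related`), the source names, and the 60-second gap are all assumptions made for the example.

```python
def group_related(anomalies, max_gap=60):
    """Group (timestamp, source) anomalies into incidents.

    Illustrative assumption: an anomaly belongs to the current incident
    when it occurs within `max_gap` seconds of the previous anomaly.
    """
    groups = []
    for ts, source in sorted(anomalies):
        if groups and ts - groups[-1][-1][0] <= max_gap:
            groups[-1].append((ts, source))
        else:
            groups.append([(ts, source)])
    return groups


# Hypothetical anomalies reported by separate detectors (timestamps in seconds).
anomalies = [
    (1020, "app_response_time"),
    (1000, "db_latency"),
    (5000, "cpu_usage"),
    (1045, "error_log_rate"),
]
incidents = group_related(anomalies)
for incident in incidents:
    print([source for _, source in incident])
```

With this input, the database latency, response time, and error rate anomalies land in one incident, ordered by time, while the later CPU anomaly is reported separately. A support engineer reading the first incident sees the earliest anomaly first, which is often a reasonable place to start looking for the cause.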

Traditional approaches are also known to generate huge volumes of false alerts. Anomaly detection, on the other hand, uses advanced statistical analyses to minimize false alerts. Those few alerts that are generated provide more data, which results in faster troubleshooting.

Anomaly detection looks for significant variations from the norm and ranks severity by probability. Machine learning helps the system tell the difference between commonly occurring errors and routine spikes and drops in metrics on the one hand, and true anomalies that are more accurate indicators of a problem on the other. This can mean the difference between tens of thousands of alerts each day, most of which are false, and a dozen or so a week that should be pursued.
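Ranking severity by probability can be sketched as follows. This example assumes each metric is roughly normally distributed around its baseline, which real systems generally do not rely on; the function names and sample numbers are hypothetical. The less probable an observation is under the baseline, the higher it ranks.

```python
import math
from statistics import mean, stdev


def tail_probability(value, baseline):
    """Two-sided probability of a deviation at least this large.

    Illustrative assumption: the metric follows a normal distribution
    whose mean and standard deviation are estimated from `baseline`.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return 1.0 if value == mu else 0.0
    z = abs(value - mu) / sigma
    return math.erfc(z / math.sqrt(2))


def rank_by_severity(observations, baselines):
    """Return metric names ordered from least probable (most severe) to most probable."""
    scored = [
        (tail_probability(value, baselines[name]), name)
        for name, value in observations.items()
    ]
    return [name for _, name in sorted(scored)]


# Hypothetical baselines and current readings.
baselines = {
    "response_ms": [100, 102, 98, 101, 99],
    "errors_per_min": [2, 3, 1, 2, 2],
    "cpu_pct": [40, 42, 38, 41, 39],
}
observations = {"response_ms": 104, "errors_per_min": 9, "cpu_pct": 41}
print(rank_by_severity(observations, baselines))
```

Here the error rate ranks first because a reading of 9 is far less likely under its baseline than the modest changes in response time and CPU. Alerting only on the top of such a ranking, rather than on every threshold crossing, is one way the alert volume can shrink.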

Anomaly detection can identify the early signs of developing problems in massive volumes of data before they grow into serious ones. Slashing troubleshooting time and reducing the noise from false alarms lets IT teams attack and resolve issues before they reach critical proportions.

If users do become aware of a problem, the IT team can respond "we're on it" instead of saying "thanks for letting us know."

Kevin Conklin is VP of Product Marketing at Ipswitch