The Power of AIOps and Smart Alarms in Alert Management
January 23, 2019

Brena Monteiro
Fixate IO

One key benefit of AIOps is improving alert management with the help of "smart alarms." This blog explains what a smart alarm is and how AIOps enables it.

What is a Smart Alarm?

Traditionally, IT teams define alert thresholds and categories manually, then wait for their configurations to trigger an alarm.

In contrast, a smart alarm is configured automatically with the help of machine learning. Instead of relying on human engineers to decide when an alarm should sound in response to a given event, smart alarms use data and analytics to set their own thresholds and conditions.

Not only does this approach save human admins time, but it also allows alert thresholds and conditions to be adjusted dynamically, in real time. That’s a critical advantage in today’s fast-changing environments, where a level of, say, network traffic or CPU load that is normal one minute might signal a problem the next.
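To make the idea of a dynamically adjusted threshold concrete, here is a minimal sketch in Python. It is not how any particular AIOps product works. It assumes a simple statistical rule (recent mean plus a multiple of the standard deviation) in place of a trained model, and the class name `DynamicThreshold` and its parameters are invented for illustration.

```python
from collections import deque
from statistics import mean, stdev


class DynamicThreshold:
    """Flags a value as abnormal when it is far above the recent baseline."""

    def __init__(self, window=60, k=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # keep only the most recent observations
        self.k = k                           # standard deviations that count as abnormal
        self.min_samples = min_samples       # stay silent until a baseline exists

    def threshold(self):
        """Return the current alert threshold, or None while still warming up."""
        if len(self.samples) < self.min_samples:
            return None
        return mean(self.samples) + self.k * stdev(self.samples)

    def observe(self, value):
        """Check value against the current threshold, then add it to the baseline."""
        limit = self.threshold()
        is_alert = limit is not None and value > limit
        self.samples.append(value)
        return is_alert


# Hypothetical CPU load readings: a stable baseline followed by a spike.
detector = DynamicThreshold(window=30, k=3.0, min_samples=10)
cpu_load = [40, 42, 41, 39, 43, 40, 41, 42, 40, 41, 95]
alerts = [detector.observe(v) for v in cpu_load]
```

Because the threshold is recomputed from the sliding window on every reading, a load level that would trigger an alert during a quiet period may pass unnoticed once the baseline has shifted upward, which is the behavior a fixed manual threshold cannot provide.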

How AIOps Enables Smart Alarms

AIOps, which uses artificial intelligence and machine learning to improve IT operations, is the foundation of smart alarms. To set up a smart alarm, you start with a tool that enables AIOps.

One such AIOps tool is CA Digital Experience Insights, which combines data analytics with machine learning to monitor infrastructure. With it, monitoring an application is not simply about staring at a screen and waiting for alerts to appear. Instead, you can predict errors from the volume of access and the expected traffic flow, and the smart alarm lets you see issues before they occur.
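As a rough illustration of predicting a problem from access volume, the sketch below fits a straight line to recent request counts and estimates when the trend would reach a capacity limit. This is an assumption-laden simplification, not the method used by CA Digital Experience Insights: the function name `forecast_breach`, the linear trend, and the sample numbers are all hypothetical.

```python
def forecast_breach(samples, capacity, horizon):
    """Fit a least-squares line to samples and report when it reaches capacity.

    Returns the number of steps ahead (1..horizon) at which the fitted trend
    first reaches capacity, or None if it does not within the horizon.
    """
    n = len(samples)
    if n < 2:
        return None  # a trend needs at least two points

    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples)) / sum(
        (x - x_mean) ** 2 for x in range(n)
    )
    intercept = y_mean - slope * x_mean

    for step in range(1, horizon + 1):
        if intercept + slope * (n - 1 + step) >= capacity:
            return step
    return None


# Hypothetical requests per minute, growing by 20 each interval.
requests_per_minute = [100, 120, 140, 160, 180]
steps_until_breach = forecast_breach(requests_per_minute, capacity=250, horizon=10)
```

With these sample values, the fitted trend reaches the assumed capacity of 250 four intervals ahead, so an alert could be raised before any error actually occurs.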

How to Avoid Redundant Alerts

Grouping alerts of the same error type helps reduce what is often a flood of unnecessary notifications. For example, an error that occurs in the database will propagate to the backend application and will probably surface as an HTTP error in the frontend application. Unless alerts are configured correctly, at least three will be generated when just one would suffice. The trackback offered in some AIOps tools can help map and group an error regardless of which component raises the alert.
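The database, backend, and frontend scenario above can be sketched in a few lines of Python. This is only an illustration of the grouping idea: it assumes every alert carries a shared correlation ID (called `trace_id` here) and that the dependency order of components is known in advance. Neither assumption comes from a specific tool, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical dependency order: a lower number is closer to the root of the stack.
COMPONENT_DEPTH = {"database": 0, "backend": 1, "frontend": 2}


@dataclass(frozen=True)
class Alert:
    trace_id: str   # assumed correlation ID shared by alerts from one failure
    component: str
    message: str


def group_alerts(alerts):
    """Collapse alerts that share a trace_id into one notification per failure."""
    by_trace = defaultdict(list)
    for alert in alerts:
        by_trace[alert.trace_id].append(alert)

    grouped = []
    for trace_id, related in by_trace.items():
        # Treat the alert from the deepest component as the probable root cause.
        root = min(
            related,
            key=lambda a: COMPONENT_DEPTH.get(a.component, len(COMPONENT_DEPTH)),
        )
        grouped.append(
            {"trace_id": trace_id, "root_cause": root, "suppressed": len(related) - 1}
        )
    return grouped


alerts = [
    Alert("req-1", "frontend", "HTTP 500"),
    Alert("req-1", "backend", "query failed"),
    Alert("req-1", "database", "connection refused"),
    Alert("req-2", "frontend", "HTTP 404"),
]
notifications = group_alerts(alerts)
```

Here the three alerts for `req-1` collapse into a single notification that points at the database, while the unrelated `req-2` alert is reported on its own.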

As an example, with CA AIOps, we can configure alerts within App Experience Analytics and group them by HTTP status, crashes, etc. to reduce the number of alarms.



Creating Smart Alerts

A good smart alert should have its limits defined by AIOps tools, because manually set limits tend to be imprecise, arbitrary, and poorly suited to specific scenarios. Before setting up a smart alarm, we need to collect data for the AIOps tool to analyze.

Creating a crash alert in App Experience Analytics includes a field for setting the crash type or providing a description of the crash. This text is used to search Stack Overflow for related results (a great help when starting the hunt for a rapid fix). An alert will be created only if you include an email address to notify when a crash happens. (This is important for limiting the number of irrelevant alarms.)
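The rule that an alert exists only when someone is there to receive it can be expressed as a small validation step. The sketch below is not the App Experience Analytics API. The `CrashAlert` type, the `create_crash_alert` function, and the deliberately simple email check are hypothetical stand-ins that mirror the form fields described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrashAlert:
    crash_type: str    # crash type or free-text description of the crash
    notify_email: str  # address that receives the notification


def create_crash_alert(crash_type: str, notify_email: Optional[str]) -> Optional[CrashAlert]:
    """Return a CrashAlert only when a notification address is provided."""
    email = (notify_email or "").strip()
    if "@" not in email:
        return None  # no recipient, so no alert is created
    return CrashAlert(crash_type=crash_type.strip(), notify_email=email)


# Hypothetical inputs: one alert with a recipient, one without.
with_recipient = create_crash_alert("NullPointerException on checkout", "ops@example.com")
without_recipient = create_crash_alert("NullPointerException on checkout", None)
```

Refusing to create an alert without a recipient keeps the alert list limited to crashes that someone has committed to act on.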


Conclusion

With AIOps and smart alarms, it’s no longer necessary to pour huge amounts of time into configuring alerts, only to have the configurations become outdated as the environment changes. Although it may take some time to perform the basic setup required to prepare an AIOps tool to manage smart alarms, once you invest that effort, you will spend less time on monitoring and your alerts will be significantly more effective.

You can learn more about AIOps and smart alarms in The Definitive Guide to AIOps provided by CA Technologies and Sweetcode.io, which is available for free here.

Brena Monteiro is a Fixate IO Contributor and a Software Engineer