Discovering AIOps - Part 4: Advantages

Pete Goldin
Editor and Publisher
APMdigest

AIOps has become well known in recent years because it offers a variety of compelling advantages. Part 4 of this blog series covers the expert picks for the greatest advantages that can be gained from AIOps.

Start with: Discovering AIOps - Part 1

Start with: Discovering AIOps - Part 2: Must-Have Capabilities

Start with: Discovering AIOps - Part 3: The Users

Better IT Decision Making

One overarching advantage of AIOps is support for better IT decision-making, with regard to system, network and application performance — an advantage that impacts many of the other advantages on this list.

"A shared goal across IT teams is to gain insights leveraging all available monitoring data," explains Andreas Reiss, Head of Product Management, AIOps and Observability, at Broadcom. "AIOps helps correlate the data and enhances it with meaningful context that informs the enterprise what it doesn't know yet about its environment and possible problem scenarios that might affect the business and end users. These kinds of insights are the goal of AIOps."

AIOps equips organizations with agile insights into their IT ecosystem, according to Payal Kindiger, Senior Director of Product Marketing at Riverbed. By correlating data from various sources, it provides a comprehensive overview of system health, performance trends, and anomalies. This enables quicker and more informed decision-making, supporting agile responses to changing business needs and dynamic IT environments.

Gagan Singh, VP of Product Marketing, Observability, at Elastic, says, "AIOps provides actionable insights by using machine learning algorithms and natural language processing to analyze large volumes of telemetry data (logs, metrics, and traces) from several systems to identify patterns, correlate events, generate predictions about future events, and discover the root cause."

Handling Big Data

Companies today average more than 200 applications, many of which are outside of IT control. As a result, companies are drowning in data, much of which is siloed between departments. AIOps offers the ability to consume large, disparate data sets such as these and help bring order from chaos, says Thomas LaRock, Principal Developer Evangelist at Selector.

"AIOps will monitor one or more big data sources — let's say databases containing software and infrastructure telemetry like metrics, traces, logs, and events — and use machine learning algorithms to find deviations from 'normal' nearly in real time, essentially highlighting these deviations as they are happening or very shortly after," adds Camden Swita, Senior Product Manager at New Relic.

Identifying Anomalies

"AIOps platforms leverage advanced machine learning algorithms to analyze vast volumes of data from various sources. This enables AIOps platforms to identify patterns, anomalies, and correlations within this data. Such patterns and anomalies are usually undetectable by IT admins manually combing through data," says Bharani Kumar Kulasekaran, Product Manager at ManageEngine.

Pinpointing the Root Cause

"Recent data shows that organizations are especially seeing the positive influence of AIOps, with AIOps tools outperforming legacy solutions in a number of ways, such as automatically determining the technical root cause of an issue and better assessing the severity of an issue," says Spiros Xanthos, SVP and General Manager of Observability at Splunk.

Problem isolation is a great example where AIOps can make a big impact by acting to discover a root cause, such as by generating metrics along the graph of relationships between components on a transaction trace, adds Asaf Yigal, CTO of Logz.io.

Predicting Future Issues

AIOps platforms offer forecasting capabilities, which enable them to predict potential future issues based on existing data and trends. This helps organizations take a proactive approach to preventing downtime, explains Bharani Kumar Kulasekaran, Product Manager at ManageEngine.
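
As a rough illustration of forecasting from existing data and trends, the sketch below fits a straight line to recent samples and estimates how many more samples remain before a limit is reached. The function names, the disk-usage figures, and the 90% limit are all hypothetical, and a plain least-squares line is only one simple stand-in for whatever models a real AIOps platform uses.

```python
def fit_line(ys):
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    return slope, y_mean - slope * x_mean


def samples_until_limit(ys, limit):
    """Estimate how many future samples remain before the trend reaches `limit`.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None
    crossing_x = (limit - intercept) / slope
    return max(0.0, crossing_x - (len(ys) - 1))


# Hypothetical daily disk usage, in percent.
disk_used_pct = [60, 62, 64, 66, 68]
print(samples_until_limit(disk_used_pct, 90))  # 11.0
```

With usage growing two points a day, the trend line reaches 90% about 11 days after the last sample, which is the kind of lead time that makes a proactive fix possible.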

Accelerating MTTR

Yigal from Logz.io explains, "If you can tell the analyst, this is mega important based on previous investigations in your environment or people have seen this thing a million times and marked it as low-priority, you are having a massive impact. It's really hard to drive down MTTR if you don't have the right contextual information to help with the investigation. However, it's even harder to drive down MTTR if your people are occupying their time focused on the wrong activities."

In the event of an issue, AIOps helps reduce both Mean Time to Resolution (MTTR) and Mean Time to Detection (MTTD), according to Ali Siddiqui, Chief Product Officer at BMC.
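
For readers who want to track these two metrics themselves, here is a minimal sketch of how they might be computed from incident timestamps. The `Incident` record and the sample times are made up, and the sketch assumes both metrics are measured from the moment the issue began; some teams measure MTTR from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime   # when the issue actually began
    detected: datetime  # when monitoring or a person noticed it
    resolved: datetime  # when service was restored


def mean_delta(deltas):
    """Average a collection of timedeltas."""
    deltas = list(deltas)
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean Time to Detection: average of (detected - started)."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean Time to Resolution: average of (resolved - started)."""
    return mean_delta(i.resolved - i.started for i in incidents)


# Hypothetical incidents.
incidents = [
    Incident(datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 10, 10), datetime(2024, 1, 8, 11, 0)),
    Incident(datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 14, 20), datetime(2024, 1, 9, 14, 40)),
]
print(mttd(incidents))  # 0:15:00
print(mttr(incidents))  # 0:50:00
```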

Faster Ticket Remediation

The use of AIOps capabilities in ITSM can speed up or automate ticket remediation, Yigal from Logz.io advises.

Kindiger from Riverbed adds that intelligent ticketing, where unified observability solutions leverage AIOps capabilities, can offer user-first contextualized insights directly within ITSM tickets. This eliminates the need for escalations to specialized resources while empowering L1 service desk agents to swiftly resolve issues with the required information at their fingertips. Additionally, combined with automation, tickets can be created, prioritized, routed and resolved before issues are raised as incidents.

Minimizing Alert Noise

Traditional monitoring tools rely on static thresholds to generate alerts. This, coupled with the quest to monitor everything, results in a flood of alerts: false positives, FYI notifications, benign status changes, and duplicates, according to Monika Bhave, Product Manager at Digitate. Sifting through this noise, which can amount to thousands of alerts each day across several different platforms, is a massive undertaking. This can cause enterprises to miss major events, leading to potentially catastrophic outages.

Prior to AIOps, the only tools for dealing with the ineffective and noisy alerting that was clouding IT operations processes were blunt ones: remove non-essential alerting, raise thresholds to prevent false positives, and encode complex logic to handle the alerts, explains Charles Burnham, Director, AIOps Engineering at LogicMonitor. The effect of this sledgehammer approach, however, was desensitized and incomplete monitoring that missed outages and hid preventative opportunities, with logic that was often too complex to maintain.

"One of the transformative aspects of AIOps is that, for the first time, an enterprise can cut through the noise that can overwhelm IT operations to identify meaningful signals embedded in monitoring data," says Reiss from Broadcom.

"AIOps can automatically categorize alerts based on their severity and importance, ensuring that only critical alerts are prioritized and addressed in a timely manner," adds Singh from Elastic.

Carlos Casanova, Principal Analyst at Forrester Research, explains further, "The AIOps tools can ingest vast amounts of disparate data and perform multivariate analysis at incredible speeds. AIOps quickly identifies volumes of alerts and if it recognizes that they're all traced back to a common point, it can collapse all of them into a drastically reduced number of items for the operator to deal with. This could easily be hundreds of items reduced to one or two. So, the work queues are significantly less and, more important, less time is wasted sorting through the haystack for the needle."
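
The collapsing that Casanova describes can be pictured with a small sketch. It assumes a hypothetical dependency map in which each component depends on at most one upstream component, and it groups every alerting component under the furthest upstream dependency that is also alerting. The component names are invented, and real tools work with far richer topology and timing data.

```python
from collections import defaultdict


def collapse_alerts(alerting, depends_on):
    """Group alerting components under their furthest upstream alerting dependency.

    `depends_on` maps a component to the single upstream component it relies on
    (a simplifying assumption for this sketch).
    """
    alerting_set = set(alerting)
    groups = defaultdict(list)
    for component in alerting:
        root = component
        current = component
        seen = {component}
        while current in depends_on:
            current = depends_on[current]
            if current in seen:  # guard against cycles in the map
                break
            seen.add(current)
            if current in alerting_set:
                root = current
        groups[root].append(component)
    return dict(groups)


# Hypothetical topology and alerts.
depends_on = {
    "checkout-api": "orders-db",
    "cart-api": "orders-db",
    "orders-db": "core-switch",
    "search-api": "search-index",
}
alerting = ["checkout-api", "cart-api", "orders-db", "core-switch", "search-api"]

print(collapse_alerts(alerting, depends_on))
# {'core-switch': ['checkout-api', 'cart-api', 'orders-db', 'core-switch'],
#  'search-api': ['search-api']}
```

Five alerts become two work items: one for the switch that everything else traces back to, and one for the unrelated search alert.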

Phillip Carter, Principal Product Manager at Honeycomb, elaborates, "AIOps reduces alert noise in several ways. More broadly speaking, ML models that can tell you what's worth paying attention to in your dumping ground of data will reduce the number of alerts if you were alerting on all of that data. They can also look at historical data on alert resolution and deprioritize alerts that are traditionally false positives."
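
One simple way to picture the second technique Carter mentions is to rank incoming alerts by how often the same alert turned out to be a false positive in the past. The sketch below is a plain frequency count, not an ML model, and the alert names, history, and `min_samples` cutoff are hypothetical. Alerts with little history are treated as real rather than pushed down the queue.

```python
from collections import Counter


def prioritize(incoming, history, min_samples=3):
    """Order incoming alert names so historically noisy ones come last.

    `history` is a list of (alert_name, was_false_positive) pairs.
    """
    totals = Counter()
    false_positives = Counter()
    for name, was_false_positive in history:
        totals[name] += 1
        if was_false_positive:
            false_positives[name] += 1

    def noise_score(name):
        if totals[name] < min_samples:
            return 0.0  # too little history to judge: treat as real
        return false_positives[name] / totals[name]

    # sorted() is stable, so alerts with equal scores keep their arrival order.
    return sorted(incoming, key=noise_score)


# Hypothetical resolution history.
history = [
    ("disk-latency", True), ("disk-latency", True),
    ("disk-latency", False), ("disk-latency", True),
    ("cpu-high", False), ("cpu-high", False), ("cpu-high", False),
    ("new-alert", True),
]
print(prioritize(["disk-latency", "new-alert", "cpu-high"], history))
# ['new-alert', 'cpu-high', 'disk-latency']
```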

Swita from New Relic adds, "Alerts age, and some age poorly. If that alert condition you wrote last year or even last month was based on some snapshotted 'norm,' who's to say it's still relevant or worthy of generating an alert today? However, a machine learning algorithm can adjust to the changing norm of a system and adjust alert conditions accordingly (assuming that's what you want), which can often reduce alert frequency and false positives."
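
To make the contrast concrete, the sketch below compares a static threshold with a baseline that follows the recent norm. It uses a trailing-window z-score, which is only one simple way to approximate an adaptive alert condition. The request-rate numbers, the window size, and both thresholds are made up for illustration.

```python
from statistics import mean, stdev


def static_alerts(values, limit):
    """Indices where the value exceeds a fixed threshold."""
    return [i for i, v in enumerate(values) if v > limit]


def adaptive_alerts(values, window=5, z_limit=3.0):
    """Indices where the value sits more than `z_limit` standard deviations
    away from the mean of the previous `window` samples."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change counts as a deviation.
            if values[i] != mu:
                flagged.append(i)
        elif abs(values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged


# Hypothetical requests per second: the norm shifts from ~100 to ~160,
# then a genuine spike to 400 occurs.
rps = [100, 102, 98, 101, 99, 160, 162, 158, 161, 159, 400, 160]

print(static_alerts(rps, limit=150))  # [5, 6, 7, 8, 9, 10, 11]
print(adaptive_alerts(rps))           # [5, 10]
```

The static rule written for the old norm fires on every sample after the shift, while the adaptive rule flags only the shift itself and the later spike. Whether the new level should be accepted as normal is still a judgment call, as Swita's aside suggests.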

"However, it's worth noting that just because an alert was a false positive before doesn't mean it will continue to be a false positive in the future, especially for services with irregular traffic patterns," cautions Carter from Honeycomb.

Go to: Discovering AIOps - Part 5, covering more advantages of AIOps.

Pete Goldin is Editor and Publisher of APMdigest
