
Discovering AIOps - Part 4: Advantages

Pete Goldin
APMdigest

AIOps has become well known in recent years because it offers a variety of compelling advantages. Part 4 of this blog series covers the experts' picks for the greatest advantages that can be gained from AIOps.

Start with: Discovering AIOps - Part 1

Start with: Discovering AIOps - Part 2: Must-Have Capabilities

Start with: Discovering AIOps - Part 3: The Users

Better IT Decision Making

One overarching advantage of AIOps is support for better IT decision-making about system, network and application performance. This advantage underpins many of the others on this list.

"A shared goal across IT teams is to gain insights leveraging all available monitoring data," explains Andreas Reiss, Head of Product Management, AIOps and Observability, at Broadcom. "AIOps helps correlate the data and enhances it with meaningful context that informs the enterprise what it doesn't know yet about its environment and possible problem scenarios that might affect the business and end users. These kinds of insights are the goal of AIOps."

AIOps equips organizations with agile insights into their IT ecosystem, according to Payal Kindiger, Senior Director of Product Marketing at Riverbed. By correlating data from various sources, it provides a comprehensive overview of system health, performance trends, and anomalies. This enables quicker and more informed decision-making, supporting agile responses to changing business needs and dynamic IT environments.

Gagan Singh, VP of Product Marketing, Observability, at Elastic, says, "AIOps provides actionable insights by using machine learning algorithms and natural language processing to analyze large volumes of telemetry data (logs, metrics, and traces) from several systems to identify patterns, correlate events, generate predictions about future events, and discover the root cause."

Handling Big Data

Companies today average more than 200 applications, many of which are outside of IT control. As a result, companies are drowning in data, much of which is siloed between departments. AIOps offers the ability to consume large, disparate data sets such as these and help bring order from chaos, says Thomas LaRock, Principal Developer Evangelist at Selector.

"AIOps will monitor one or more big data sources — let's say databases containing software and infrastructure telemetry like metrics, traces, logs, and events — and use machine learning algorithms to find deviations from 'normal' nearly in real time, essentially highlighting these deviations as they are happening or very shortly after," adds Camden Swita, Senior Product Manager at New Relic.
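As a rough illustration of finding deviations from "normal", the sketch below flags points that sit far outside a rolling baseline. It is a minimal example rather than any vendor's method: the function name `find_deviations`, the window size, the z-score threshold and the latency samples are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, stdev


def find_deviations(values, window=5, z_threshold=3.0):
    """Return (index, value) pairs that deviate sharply from the recent baseline.

    The baseline is the mean and standard deviation of the previous
    `window` accepted points, so "normal" moves as the metric moves.
    """
    recent = deque(maxlen=window)
    deviations = []
    for i, value in enumerate(values):
        if len(recent) == window:
            baseline = mean(recent)
            spread = stdev(recent)
            # A perfectly flat window (spread == 0) is skipped to avoid
            # dividing by zero; a real system would need a fallback rule.
            if spread > 0 and abs(value - baseline) / spread > z_threshold:
                deviations.append((i, value))
                continue  # keep the outlier out of the baseline
        recent.append(value)
    return deviations


# Hypothetical latency samples in milliseconds.
latency_ms = [100, 102, 98, 101, 99, 100, 250, 101, 99, 100]
print(find_deviations(latency_ms))  # [(6, 250)]
```

Because the baseline is recomputed from recent points, the same code adapts when the metric drifts, which a fixed threshold cannot do.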

Identifying Anomalies

"AIOps platforms leverage advanced machine learning algorithms to analyze vast volumes of data from various sources. This enables AIOps platforms to identify patterns, anomalies, and correlations within this data. Such patterns and anomalies are usually undetectable by IT admins manually combing through data," says Bharani Kumar Kulasekaran, Product Manager at ManageEngine.

Pinpointing the Root Cause

"Recent data shows that organizations are especially seeing the positive influence of AIOps, with AIOps tools outperforming legacy solutions in a number of ways, such as automatically determining the technical root cause of an issue and better assessing the severity of an issue," says Spiros Xanthos, SVP and General Manager of Observability at Splunk.

Problem isolation is a good example of where AIOps can make a big impact by discovering a root cause, for instance by generating metrics along the graph of relationships between the components on a transaction trace, adds Asaf Yigal, CTO of Logz.io.

Predicting Future Issues

AIOps platforms offer forecasting capabilities, which enable them to predict potential future issues based on existing data and trends. This helps organizations take a proactive approach to preventing downtime, explains Kulasekaran from ManageEngine.
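One simple way to picture forecasting from existing data and trends is a straight-line fit. The sketch below assumes hourly disk-usage samples and a hypothetical helper, `hours_until_threshold`; production platforms typically use richer models, so treat it as an illustration only.

```python
def hours_until_threshold(samples, threshold):
    """Fit a least-squares line to hourly samples and estimate how many
    hours remain until the fitted trend reaches `threshold`.

    Returns None when there are too few samples or the trend is flat
    or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(samples) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = y_mean - slope * x_mean
    # Hour index at which the fitted line crosses the threshold.
    crossing = (threshold - intercept) / slope
    # Subtract the index of the latest sample to get hours from now.
    return max(0.0, crossing - (n - 1))


# Hypothetical disk usage (%) sampled once per hour.
disk_used_pct = [70, 72, 74, 76, 78]
print(hours_until_threshold(disk_used_pct, 90))  # 6.0
```

An estimate like this gives operators lead time to act before the threshold is actually breached.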

Accelerating MTTR

Yigal from Logz.io explains, "If you can tell the analyst, this is mega important based on previous investigations in your environment or people have seen this thing a million times and marked it as low-priority, you are having a massive impact. It's really hard to drive down MTTR if you don't have the right contextual information to help with the investigation. However, it's even harder to drive down MTTR if your people are occupying their time focused on the wrong activities."

In the event of an issue, AIOps helps reduce both Mean Time to Resolution (MTTR) and Mean Time to Detection (MTTD), according to Ali Siddiqui, Chief Product Officer at BMC.

Faster Ticket Remediation

The use of AIOps capabilities in ITSM can speed up or automate ticket remediation, Yigal from Logz.io advises.

Kindiger from Riverbed adds that intelligent ticketing, where unified observability solutions leverage AIOps capabilities, can offer user-first contextualized insights directly within ITSM tickets. This eliminates the need for escalations to specialized resources while empowering L1 service desk agents to swiftly resolve issues with the required information at their fingertips. Additionally, combined with automation, tickets can be created, prioritized, routed and resolved before issues are raised as incidents.

Minimizing Alert Noise

Traditional monitoring tools rely on static thresholds to generate alerts. This, coupled with the quest to monitor everything, results in a flood of alerts: false positives, FYI notifications, benign status changes, and duplicates, according to Monika Bhave, Product Manager at Digitate. Sifting through this noise, which can amount to thousands of alerts each day across several different platforms, is a massive undertaking. It can cause enterprises to miss major events, leading to potentially catastrophic outages.

Prior to AIOps, the only tools for dealing with the ineffective, noisy alerting that clouded IT operations processes were blunt ones: removing non-essential alerts, raising thresholds to prevent false positives, and encoding complex logic to handle the alerts, explains Charles Burnham, Director, AIOps Engineering at LogicMonitor. The effect of this sledgehammer approach, however, was desensitized and incomplete monitoring that missed outages and hid preventative opportunities, backed by logic that was often too complex to maintain.

"One of the transformative aspects of AIOps is that, for the first time, an enterprise can cut through the noise that can overwhelm IT operations to identify meaningful signals embedded in monitoring data," says Reiss from Broadcom.

"AIOps can automatically categorize alerts based on their severity and importance, ensuring that only critical alerts are prioritized and addressed in a timely manner," adds Singh from Elastic.

Carlos Casanova, Principal Analyst at Forrester Research, explains further, "The AIOps tools can ingest vast amounts of disparate data and perform multivariate analysis at incredible speeds. AIOps quickly identifies volumes of alerts and if it recognizes that they're all traced back to a common point, it can collapse all of them into a drastically reduced number of items for the operator to deal with. This could easily be hundreds of items reduced to one or two. So, the work queues are significantly less and more important, less time is wasted sorting through the haystack for the needle."
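The collapsing of many alerts into a few items can be sketched as grouping by a shared upstream component. The example below is an assumption-laden illustration, not a description of any AIOps product: the dependency map, component names, alert messages and the functions `root_of` and `collapse_alerts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical dependency map: each component points at what it depends on.
DEPENDS_ON = {
    "checkout-api": "orders-db",
    "cart-api": "orders-db",
    "orders-db": "storage-array-1",
    "search-api": "search-index",
}

# Hypothetical alerts arriving within the same time window.
ALERTS = [
    {"component": "checkout-api", "message": "HTTP 5xx rate high"},
    {"component": "cart-api", "message": "request timeouts"},
    {"component": "orders-db", "message": "connections refused"},
    {"component": "storage-array-1", "message": "disk failure"},
    {"component": "search-api", "message": "slow responses"},
]


def root_of(component, failing):
    """Walk upstream for as long as the dependency is also alerting."""
    seen = {component}
    while True:
        parent = DEPENDS_ON.get(component)
        # Stop at the top of the chain, at a healthy parent, or on a cycle.
        if parent is None or parent not in failing or parent in seen:
            return component
        seen.add(parent)
        component = parent


def collapse_alerts(alerts):
    """Group alert messages under the furthest-upstream alerting component."""
    failing = {alert["component"] for alert in alerts}
    groups = defaultdict(list)
    for alert in alerts:
        groups[root_of(alert["component"], failing)].append(alert["message"])
    return dict(groups)


for root, messages in collapse_alerts(ALERTS).items():
    print(root, len(messages))
# storage-array-1 4
# search-api 1
```

Here five alerts become two work items, with the four related ones traced back to a single storage component.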

Phillip Carter, Principal Product Manager at Honeycomb, elaborates, "AIOps reduces alert noise in several ways. More broadly speaking, ML models that can tell you what's worth paying attention to in your dumping ground of data will reduce the number of alerts if you were alerting on all of that data. They can also look at historical data on alert resolution and deprioritize alerts that are traditionally false positives."
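Deprioritizing alerts based on historical resolution data might look like the sketch below. The rule names, the history counts, the cutoff values and the `prioritize` function are assumptions for illustration. The alerts are demoted rather than discarded, since a rule that was usually a false positive in the past may not stay that way.

```python
def prioritize(alerts, history, min_samples=20, fp_cutoff=0.9):
    """Split alert rule names into (page_now, low_priority) lists.

    `history` maps a rule name to (times_fired, times_false_positive).
    Rules with fewer than `min_samples` past firings are never demoted,
    because there is too little evidence to judge them.
    """
    page_now, low_priority = [], []
    for rule in alerts:
        fired, false_pos = history.get(rule, (0, 0))
        if fired >= min_samples and false_pos / fired >= fp_cutoff:
            low_priority.append(rule)
        else:
            page_now.append(rule)
    return page_now, low_priority


# Hypothetical resolution history: (times fired, times marked false positive).
history = {
    "disk-latency-high": (200, 195),
    "payment-errors": (40, 2),
    "new-rule": (3, 3),
}

firing = ["disk-latency-high", "payment-errors", "new-rule"]
print(prioritize(firing, history))
# (['payment-errors', 'new-rule'], ['disk-latency-high'])
```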

Swita from New Relic adds, "Alerts age, and some age poorly. If that alert condition you wrote last year or even last month was based on some snapshotted 'norm,' who's to say it's still relevant or worthy of generating an alert today? However, a machine learning algorithm can adjust to the changing norm of a system and adjust alert conditions accordingly (assuming that's what you want), which can often reduce alert frequency and false positives."

"However, it's worth noting that just because an alert was a false positive before doesn't mean it will continue to be a false positive in the future, especially for services with irregular traffic patterns," cautions Carter from Honeycomb.

Go to: Discovering AIOps - Part 5, covering more advantages of AIOps.

Pete Goldin is Editor and Publisher of APMdigest

