
Why AIOps Projects Fail

Yoram Pollack
BigPanda

AIOps is still relatively new compared to existing technologies such as enterprise data warehouses, and early on many AIOps projects suffered hiccups, the aftereffects of which are still felt today. That's why, for some IT Ops teams and leaders, the prospect of transforming their IT operations using AIOps is a cause for concern.

But at the same time, AIOps has matured to the point where a critical mass of enterprises, including some of the largest companies in the world, have successfully deployed it and learned valuable lessons along the way. Success is mainly a matter of setting clear expectations and following a few guidelines that help you avoid common pitfalls when setting out on an AIOps journey.

Set Your Expectations Straight

Yes, we all know the saying "Aim for the moon, and even if you miss you'll land among the stars."

Unfortunately, this doesn't apply to AIOps adoption. As Gartner states in its recent Market Guide for AIOps Platforms, enterprises should "prioritize practical outcomes over aspirational goals by adopting an incremental approach…" when deploying AIOps platforms.

Biting off more than you can chew can delay your AIOps project, often by months or even years. Start small: focus on where it hurts the most in your IT operations ecosystem, or on whatever causes the most delays in your incident management lifecycle. Integrate one tool at a time, and test one AIOps capability at a time. Once you are satisfied, incrementally add more tools to the AIOps platform, and then test more capabilities. Besides making sure your AIOps platform has proven itself before you fully rely on it, this step-by-step approach gives your team the chance to build the skills and confidence they need over time.

Also remember that people tend to think of AI as behaving in a human-like manner, so it is often anthropomorphized and credited with unrealistic "superhuman" capabilities. In reality, AI in IT operations is algorithmic: it relies on alert ingestion, normalization and enrichment (or tagging) before correlation patterns can be generated, tested and refined. That leads us to the next items on our list.

Make Sure You Can Integrate with All Your Existing Tools

Every enterprise depends on a range of tools spanning domains such as observability and monitoring, change, topology, collaboration and remediation. In almost all cases, these tools reflect years of investment, development and customization. They are often deeply embedded in critical IT operations workflows and processes, and they reflect the organization's tribal knowledge.

Your chosen AIOps platform needs to integrate with these tools and ingest their data. Otherwise, the AI will be missing vital information and key capabilities it needs to work properly. Beyond that, a long, painful rip-and-replace project can easily derail an AIOps initiative through the sheer effort involved and the long time to value.
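In practice, ingesting data from several tools means translating each tool's payload into one common alert schema (normalization). The following is a minimal Python sketch, assuming two hypothetical monitoring tools, `monitor_a` and `monitor_b`, whose field names and severity codes are invented for illustration; real integrations would map each vendor's documented payload format instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Alert:
    """Hypothetical common alert schema; not any specific product's format."""
    source: str
    host: str
    check: str
    severity: str  # normalized to "critical", "warning" or "ok"
    timestamp: float  # seconds since the Unix epoch
    tags: Dict[str, str] = field(default_factory=dict)


# Each (hypothetical) tool reports severity in its own vocabulary.
SEVERITY_MAP = {
    "monitor_a": {"CRITICAL": "critical", "WARNING": "warning", "OK": "ok"},
    "monitor_b": {"5": "critical", "4": "critical", "3": "warning", "0": "ok"},
}


def normalize(source: str, raw: Dict[str, Any]) -> Alert:
    """Translate one raw payload from a known (hypothetical) tool into an Alert."""
    if source == "monitor_a":
        host, check = raw["hostname"], raw["service"]
        severity, ts = str(raw["state"]), float(raw["last_check"])
    elif source == "monitor_b":
        host, check = raw["node"], raw["metric"]
        severity, ts = str(raw["level"]), raw["time_ms"] / 1000.0
    else:
        raise ValueError(f"no normalizer registered for {source!r}")
    return Alert(
        source=source,
        # "Web-01 " and "WEB-01" should refer to the same host downstream.
        host=host.strip().lower(),
        check=check,
        # Unrecognized severities default to "warning" rather than being dropped.
        severity=SEVERITY_MAP[source].get(severity, "warning"),
        timestamp=ts,
    )
```

Keeping one small normalizer per integration means a new tool can be onboarded without changing anything downstream, which suits the one-tool-at-a-time rollout recommended earlier.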

You Need to Be Able to Adequately Prepare and Cleanse Your Data

"Garbage in, garbage out" is a well-known maxim in IT, and it applies to IT operations as well. As we just mentioned, it's critical to ingest all the alerts from all your tools. But it's not enough. Event normalization, enrichment and tagging (aka data preparation and cleansing) also have an outsized impact on the success of AIOps solutions.

Why? Because AIOps tools have to correlate hundreds of thousands of ingested alerts into a small number of high-quality, actionable incidents. The ability of AI/ML to detect correlation patterns and "compress" alerts depends heavily on the quality of the data it is fed. Data without context produces weak correlation and, as a result, limited, low-quality incidents.
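To make alert "compression" concrete, here is a deliberately simple, rule-based stand-in for correlation. It is not an ML model and not any vendor's algorithm: alerts that share a hypothetical `service` enrichment tag are grouped when each one arrives within a time window (an arbitrary 300 seconds here) of the previous alert in the group. Alerts without that tag can only be grouped by host, which shows how context-less data weakens correlation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Alert:
    host: str
    check: str
    timestamp: float  # seconds
    service: Optional[str] = None  # hypothetical enrichment tag; None = no context


def correlate(alerts: List[Alert], window: float = 300.0) -> List[List[Alert]]:
    """Group alerts into incidents using a sliding time window per correlation key."""
    incidents: List[List[Alert]] = []
    open_by_key: Dict[str, List[Alert]] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        # Without a service tag, the only usable key is the host itself.
        key = alert.service or f"host:{alert.host}"
        current = open_by_key.get(key)
        if current is not None and alert.timestamp - current[-1].timestamp <= window:
            current.append(alert)
        else:
            current = [alert]
            open_by_key[key] = current
            incidents.append(current)
    return incidents


def compression_ratio(alerts: List[Alert], incidents: List[List[Alert]]) -> float:
    """Alerts per incident; higher means more noise was consolidated."""
    return len(alerts) / len(incidents) if incidents else 0.0


if __name__ == "__main__":
    tagged = [
        Alert("web-01", "http", 0, "checkout"),
        Alert("web-02", "http", 30, "checkout"),
        Alert("db-01", "latency", 60, "checkout"),
        Alert("web-01", "http", 1000, "checkout"),
    ]
    untagged = [Alert(a.host, a.check, a.timestamp) for a in tagged]
    print(compression_ratio(tagged, correlate(tagged)))      # 2.0
    print(compression_ratio(untagged, correlate(untagged)))  # 1.0
```

With the tag, four alerts compress into two incidents; without it, every alert becomes its own incident.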

In a similar fashion, successful root cause analysis relies on understanding and leveraging the dependencies between infrastructure and application components in modern environments. Some of this information is buried in incoming alert streams, and some of it resides in external data sources such as asset and inventory management systems, orchestration tools, APM service or flow maps, CMDBs and more.
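Enrichment can be sketched as merging context from two places into an alert's tags: fields parsed out of the alert itself, and attributes looked up in an external source. Both the `CMDB` dictionary and the `<datacenter>-<role>-<nn>` hostname convention below are hypothetical stand-ins for a real CMDB, orchestration tool or service map.

```python
import re
from typing import Any, Dict

# Hypothetical CMDB extract: configuration item -> attributes. In practice this
# might come from a CMDB, an orchestration tool or an APM service map.
CMDB: Dict[str, Dict[str, str]] = {
    "nyc-web-01": {"service": "checkout", "tier": "frontend", "depends_on": "nyc-db-01"},
    "nyc-db-01": {"service": "checkout", "tier": "database"},
}

# Hypothetical naming convention "<datacenter>-<role>-<nn>", e.g. "nyc-web-01".
HOSTNAME_PATTERN = re.compile(r"^(?P<datacenter>[a-z]+)-(?P<role>[a-z]+)-\d+$")


def enrich(alert: Dict[str, Any], cmdb: Dict[str, Dict[str, str]]) -> Dict[str, Any]:
    """Return a copy of `alert` with tags from the alert itself and from the CMDB."""
    tags: Dict[str, str] = dict(alert.get("tags", {}))  # copy; input stays untouched
    # 1. Context buried in the alert stream: parse the hostname.
    match = HOSTNAME_PATTERN.match(alert["host"])
    if match:
        tags.update(match.groupdict())
    # 2. Context from an external source: look the host up in the CMDB.
    #    CMDB attributes win if both sources set the same tag.
    tags.update(cmdb.get(alert["host"], {}))
    return {**alert, "tags": tags}
```

Letting CMDB values override parsed ones is just one possible precedence rule; a team that trusts its naming conventions more than its CMDB might reverse it.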

Additionally, you need to be able to match incidents and outages to the problem changes (aka root cause changes) that caused them. That information resides in a variety of tools, such as CI/CD, Change Management, and more.
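One simple heuristic for matching incidents to changes is to look for changes to the affected configuration items made shortly before the incident started. The sketch below assumes change records have already been pulled from CI/CD or change-management tools into a hypothetical `Change` record; the one-hour lookback is an arbitrary illustrative default, and ranking by time proximity is only one of many possible scoring choices.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Change:
    change_id: str  # hypothetical ID from a CI/CD or change-management tool
    ci: str  # configuration item the change touched
    timestamp: float  # seconds since the Unix epoch


def suspect_changes(incident_cis: Set[str], incident_start: float,
                    changes: List[Change], lookback: float = 3600.0) -> List[Change]:
    """Changes to any affected CI made within `lookback` seconds before the
    incident started, closest in time first."""
    candidates = [
        c for c in changes
        # Only changes made before the incident began can have caused it.
        if c.ci in incident_cis and 0 <= incident_start - c.timestamp <= lookback
    ]
    return sorted(candidates, key=lambda c: incident_start - c.timestamp)
```

Time proximity is a weak signal on its own; it could be strengthened with topology, for example by adding the CIs an affected service depends on to `incident_cis`.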

Your AIOps platform must deliver built-in normalization, enrichment and tagging that can add all this much-needed context at scale, and be able to process millions of IT alerts every day.

Your AI/ML Needs to Be Explainable

Good data going into your AIOps platform will get you good results, and leveraging your existing tribal knowledge to train and configure the AI will definitely benefit you. But you also have to be able to see, understand and edit the correlation logic as the AI/ML trains itself. Unfortunately, some solutions still obscure that logic and do not provide adequate control and testability. This is one of the most common causes of AIOps failure.

Google's spam filters are a good analogy. Google provides a sophisticated baseline configuration for detecting spam, but it still lets you mark a message as spam yourself or remove the spam label from a wrongly flagged email. It explains its decisions and learns from your interventions going forward.

The same is true for AI/ML in IT Ops. Your teams have to trust the results your AIOps tool produces, and that trust comes from explainability. They need to understand why the AI correlated certain alerts, and they must be able to either accept or change the correlation pattern so it produces the desired result. Remember, you can have the best AI in the world, but if your teams don't understand why it groups certain alerts together (and why it doesn't group others), they will remain suspicious of the results even when they are correct, and will eventually stop using the ML.
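The accept-or-change loop described above can be sketched with a small, hypothetical rule-based correlator (again, not an ML model or any product's API). It records a human-readable reason for every grouping decision, and an operator can reject a correlation key, much like removing a wrongly applied spam label, after which the next run stops grouping on that key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class Incident:
    alerts: List[Dict[str, Any]]
    reasons: List[str] = field(default_factory=list)  # one explanation per alert


@dataclass
class ExplainableCorrelator:
    """Hypothetical rule-based correlator that explains each grouping and lets
    operators switch off a correlation key they disagree with."""
    keys: List[str]  # tags to correlate on, in priority order
    disabled: Set[str] = field(default_factory=set)

    def reject(self, key: str) -> None:
        """Operator feedback: stop correlating on `key` from now on."""
        self.disabled.add(key)

    def _match(self, alert: Dict[str, Any], incident: Incident) -> Optional[str]:
        """First enabled key on which `alert` matches the incident's first alert."""
        seed = incident.alerts[0]["tags"]
        for key in self.keys:
            value = alert["tags"].get(key)
            if key not in self.disabled and value is not None and seed.get(key) == value:
                return key
        return None

    def correlate(self, alerts: List[Dict[str, Any]]) -> List[Incident]:
        incidents: List[Incident] = []
        for alert in alerts:
            for incident in incidents:
                key = self._match(alert, incident)
                if key is not None:
                    seed_id = incident.alerts[0]["id"]
                    incident.alerts.append(alert)
                    incident.reasons.append(
                        f"{alert['id']} joined: same {key}={alert['tags'][key]!r} as {seed_id}")
                    break
            else:  # no existing incident matched, so this alert starts a new one
                incidents.append(Incident([alert], [f"{alert['id']} opened a new incident"]))
        return incidents
```

For example, with `keys=["datacenter", "service"]`, a `search` alert that only shares a datacenter with two `checkout` alerts is grouped with them and the reason says so; after `reject("datacenter")`, it lands in its own incident while the two `checkout` alerts stay together.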

Your AIOps Needs to Be Democratized

Today's enterprises are heterogeneous. Some have large, centralized IT Ops and NOC teams, whereas others have dozens or even hundreds of distributed DevOps and SRE teams. Some have "grown up" in the cloud, while others are mid-way or even just getting started with their modernization initiatives. In each of these enterprises there are many important stakeholders that can benefit from AIOps: from NOC Managers and L1 users to VPs of IT Ops to service owners to the heads of BUs and CIOs.

AIOps platforms must be accessible and able to present their data, views and dashboards to every persona in your organization, no matter which type of enterprise you belong to. Additionally, the platform cannot rely on data scientists, configuration cannot depend on third-party consultants and product experts, and admin overhead needs to be minimal.

Only then can you realize the full potential of your AIOps investment.

Yoram Pollack is Director of Product Marketing at BigPanda
