Skip to main content

The Case Against AIOps

Phillip Carter
Honeycomb

For the last couple weeks, APMdigest posted a series of blogs about AIOps that included my commentary. In this blog, I present the case against AIOps.

In theory, the ideas behind AIOps features are sound, but the machine learning (ML) systems involved aren't sophisticated enough to be effective or trustworthy.

AIOps is relatively mature today, at least in its current form. The ML models companies use for AIOps tasks work as well as they can, and the features that wrap them are fairly stable and mature. That being said, maturity might be a bit orthogonal to usefulness.

Despite being based on mature tech, AIOps features aren't widely used because they don't seem to often help with problems people have in practice. It's like if you were struggling with cooking a meal and the main challenge lies in mixing all the ingredients at the right time, but someone offered you a better way to chop the vegetables. Does chopping up vegetables more efficiently help? Maybe, but that doesn't solve the difficulty in timing your ingredients.

In addition, AIOps adoption is a big challenge for teams. Organizations may be constrained by their budget and cannot implement due to the feature's cost. AIOps often comes bundled with several other features, all with a high learning curve, and very few can work as a turnkey solution. It's yet another thing for busy teams to learn, which is not likely to be high on their priority list.

AIOps Does Not Provide Actionable Insights

AIOps arguably doesn't provide actionable insights. Sure, there are examples of teams reducing false positives and using anomaly detection to identify something worth investigating. Still, teams have been able to reduce false positives and identify uniquely interesting patterns in data long before AIOps, and typically do this today without AIOps features.

For example, you don't need ML models to tell you that a particular measure crosses a threshold. Furthermore, these models work only with past behavior as context. They can't predict future behavior, especially for services with irregular traffic patterns. And it's services with irregular traffic patterns that actually present the most problems (and thus time spent debugging) in the first place.

One use case that can be helpful in understanding this problem is analyzing a giant bucket of data that hasn't been organized. When organizations treat operations data as a dumping ground, using an ML model to perform pattern analysis and separate usable from unusable data can be helpful. However, it's only treating a symptom and not the root cause.

And when there are issues that AIOps features can't help identify, you're back to an extremely long time spent figuring out what's wrong in a system.

Facing Your Organizational Issues

The advantages of AIOps are insignificant because AIOps features primarily exist to patch organizational and technical failures. The long-term solution is to invest in your organization and empower your teams to pick quality tools, not be sold the flashy promises of a quick AI fix.

I wouldn't suggest users go looking for an AIOps-specific provider and should instead leverage their team's expertise. Regarding these specific use cases, humans are far better at making critical judgment calls than the ML models on the market today. Deciding what's worth looking at and alerting on is the best possible use of human time.

Most of the problems that AIOps purports to solve are organizational issues. Fix your organizational and technical issues by giving your teams the agency to fix things in the first place.

If you have problems with noise in your data, look at how you generate telemetry and prioritize working to improve it. Lead a culture shift by enforcing the principle that good telemetry is a concern for application developers, not just ops teams.

If your alerts are out of order, have your team look at what they're alerting on and make necessary adjustments. If you have noisy alerts, talk to the people who are getting alerted to discover and investigate why things are too noisy. Take on call engineers very seriously, constantly poll people, and ensure they're not burning out. Some vendors will try to sell you on ML models that will magically solve alert fatigue, but please know and take caution that there is no magic, and your problems won't get solved by ML models.

If your organization doesn't have development teams prioritizing good telemetry, incentivize them to care about it.

LLMs for Observability

Can you tell I'm not particularly bullish on AIOps? I am incredibly bullish on LLMs for Observability, though. LLMs do a great job of taking natural language inputs and producing things like queries on data, analyzing data relevant to a query, and generating things that can help to teach people how to use a product. We'll uncover more use cases but right now LLMs are best at actually reducing toil and lowering the bar to learning how to analyze your production data in the first place.

While I'm not too hopeful about the future of AIOps, I am optimistic about how AI will continue to integrate into operations. LLMs present novel ways for us to interact with systems that were previously impossible. For example, observability vendors are releasing AI features that lower the barrier for developers to access and make the most out of their observability tools. Innovations like this will continue to enhance developer workflows and transform the way we work for the better.

Phillip Carter is Principal Product Manager at Honeycomb

Hot Topics

The Latest

In live financial environments, capital markets software cannot pause for rebuilds. New capabilities are introduced as stacked technology layers to meet evolving demands while systems remain active, data keeps moving, and controls stay intact. AI is no exception, and its opportunities are significant: accelerated decision cycles, compressed manual workflows, and more effective operations across complex environments. The constraint isn't the models themselves, but the architectural environments they enter ...

Like most digital transformation shifts, organizations often prioritize productivity and leave security and observability to keep pace. This usually translates to both the mass implementation of new technology and fragmented monitoring and observability (M&O) tooling. In the era of AI and varied cloud architecture, a disparate observability function can be dangerous. IT teams will lack a complete picture of their IT environment, making it harder to diagnose issues while slowing down mean time to resolve (MTTR). In fact, according to recent data from the SolarWinds State of Monitoring & Observability Report, 77% of IT personnel said the lack of visibility across their on-prem and cloud architecture was an issue ...

In MEAN TIME TO INSIGHT Episode 23, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses the NetOps labor shortage ... 

Technology management is evolving, and in turn, so is the scope of FinOps. The FinOps Foundation recently updated their mission statement from "advancing the people who manage the value of cloud" to "advancing the people who manage the value of technology." This seemingly small change solidifies a larger evolution: FinOps practitioners have organically expanded to be focused on more than just cloud cost optimization. Today, FinOps teams are largely — and quickly — expanding their job descriptions, evolving into a critical function for managing the full value of technology ...

Enterprises are under pressure to scale AI quickly. Yet despite considerable investment, adoption continues to stall. One of the most overlooked reasons is vendor sprawl ... In reality, no organization deliberately sets out to create sprawling vendor ecosystems. More often, complexity accumulates over time through well-intentioned initiatives, such as enterprise-wide digital transformation efforts, point solutions, or decentralized sourcing strategies ...

Nearly every conversation about AI eventually circles back to compute. GPUs dominate the headlines while cloud platforms compete for workloads and model benchmarks drive investment decisions. But underneath that noise, a quieter infrastructure challenge is taking shape. The real bottleneck in enterprise AI is not processing power, it is the ability to store, manage and retrieve the relentless volumes of data that AI systems generate, consume and multiply ...

The 2026 Observability Survey from Grafana Labs paints a vivid picture of an industry maturing fast, where AI is welcomed with careful conditions, SaaS economics are reshaping spending decisions, complexity remains a defining challenge, and open standards continue to underpin it all ...

The observability industry has an evolving relationship with AI. We're not skeptics, but it's clear that trust in AI must be earned ... In Grafana Labs' annual Observability Survey, 92% said they see real value in AI surfacing anomalies before they cause downtime. Another 91% endorsed AI for forecasting and root cause analysis. So while the demand is there, customers need it to be trustworthy, as the survey also found that the practitioners most enthusiastic about AI are also the most insistent on explainability ...

In the modern enterprise, the conversation around AI has moved past skepticism toward a stage of active adoption. According to our 2026 State of IT Trends Report: The Human Side of Autonomous AI, nearly 90% of IT professionals view AI as a net positive, and this optimism is well-founded. We are seeing agentic AI move beyond simple automation to actively streamlining complex data insights and eliminating the manual toil that has long hindered innovation. However, as we integrate these autonomous agents into our ecosystems, the fundamental DNA of the IT role is evolving ...

AI workloads require an enormous amount of computing power ... What's also becoming abundantly clear is just how quickly AI's computing needs are leading to enterprise systems failure. According to Cockroach Labs' State of AI Infrastructure 2026 report, enterprise systems are much closer to failure than their organizations realize. The report ... suggests AI scale could cause widespread failures in as little as one year — making it a clear risk for business performance and reliability.

The Case Against AIOps

Phillip Carter
Honeycomb

For the last couple weeks, APMdigest posted a series of blogs about AIOps that included my commentary. In this blog, I present the case against AIOps.

In theory, the ideas behind AIOps features are sound, but the machine learning (ML) systems involved aren't sophisticated enough to be effective or trustworthy.

AIOps is relatively mature today, at least in its current form. The ML models companies use for AIOps tasks work as well as they can, and the features that wrap them are fairly stable and mature. That being said, maturity might be a bit orthogonal to usefulness.

Despite being based on mature tech, AIOps features aren't widely used because they don't seem to often help with problems people have in practice. It's like if you were struggling with cooking a meal and the main challenge lies in mixing all the ingredients at the right time, but someone offered you a better way to chop the vegetables. Does chopping up vegetables more efficiently help? Maybe, but that doesn't solve the difficulty in timing your ingredients.

In addition, AIOps adoption is a big challenge for teams. Organizations may be constrained by their budget and cannot implement due to the feature's cost. AIOps often comes bundled with several other features, all with a high learning curve, and very few can work as a turnkey solution. It's yet another thing for busy teams to learn, which is not likely to be high on their priority list.

AIOps Does Not Provide Actionable Insights

AIOps arguably doesn't provide actionable insights. Sure, there are examples of teams reducing false positives and using anomaly detection to identify something worth investigating. Still, teams have been able to reduce false positives and identify uniquely interesting patterns in data long before AIOps, and typically do this today without AIOps features.

For example, you don't need ML models to tell you that a particular measure crosses a threshold. Furthermore, these models work only with past behavior as context. They can't predict future behavior, especially for services with irregular traffic patterns. And it's services with irregular traffic patterns that actually present the most problems (and thus time spent debugging) in the first place.

One use case that can be helpful in understanding this problem is analyzing a giant bucket of data that hasn't been organized. When organizations treat operations data as a dumping ground, using an ML model to perform pattern analysis and separate usable from unusable data can be helpful. However, it's only treating a symptom and not the root cause.

And when there are issues that AIOps features can't help identify, you're back to an extremely long time spent figuring out what's wrong in a system.

Facing Your Organizational Issues

The advantages of AIOps are insignificant because AIOps features primarily exist to patch organizational and technical failures. The long-term solution is to invest in your organization and empower your teams to pick quality tools, not be sold the flashy promises of a quick AI fix.

I wouldn't suggest users go looking for an AIOps-specific provider and should instead leverage their team's expertise. Regarding these specific use cases, humans are far better at making critical judgment calls than the ML models on the market today. Deciding what's worth looking at and alerting on is the best possible use of human time.

Most of the problems that AIOps purports to solve are organizational issues. Fix your organizational and technical issues by giving your teams the agency to fix things in the first place.

If you have problems with noise in your data, look at how you generate telemetry and prioritize working to improve it. Lead a culture shift by enforcing the principle that good telemetry is a concern for application developers, not just ops teams.

If your alerts are out of order, have your team look at what they're alerting on and make necessary adjustments. If you have noisy alerts, talk to the people who are getting alerted to discover and investigate why things are too noisy. Take on call engineers very seriously, constantly poll people, and ensure they're not burning out. Some vendors will try to sell you on ML models that will magically solve alert fatigue, but please know and take caution that there is no magic, and your problems won't get solved by ML models.

If your organization doesn't have development teams prioritizing good telemetry, incentivize them to care about it.

LLMs for Observability

Can you tell I'm not particularly bullish on AIOps? I am incredibly bullish on LLMs for Observability, though. LLMs do a great job of taking natural language inputs and producing things like queries on data, analyzing data relevant to a query, and generating things that can help to teach people how to use a product. We'll uncover more use cases but right now LLMs are best at actually reducing toil and lowering the bar to learning how to analyze your production data in the first place.

While I'm not too hopeful about the future of AIOps, I am optimistic about how AI will continue to integrate into operations. LLMs present novel ways for us to interact with systems that were previously impossible. For example, observability vendors are releasing AI features that lower the barrier for developers to access and make the most out of their observability tools. Innovations like this will continue to enhance developer workflows and transform the way we work for the better.

Phillip Carter is Principal Product Manager at Honeycomb

Hot Topics

The Latest

In live financial environments, capital markets software cannot pause for rebuilds. New capabilities are introduced as stacked technology layers to meet evolving demands while systems remain active, data keeps moving, and controls stay intact. AI is no exception, and its opportunities are significant: accelerated decision cycles, compressed manual workflows, and more effective operations across complex environments. The constraint isn't the models themselves, but the architectural environments they enter ...

Like most digital transformation shifts, organizations often prioritize productivity and leave security and observability to keep pace. This usually translates to both the mass implementation of new technology and fragmented monitoring and observability (M&O) tooling. In the era of AI and varied cloud architecture, a disparate observability function can be dangerous. IT teams will lack a complete picture of their IT environment, making it harder to diagnose issues while slowing down mean time to resolve (MTTR). In fact, according to recent data from the SolarWinds State of Monitoring & Observability Report, 77% of IT personnel said the lack of visibility across their on-prem and cloud architecture was an issue ...

In MEAN TIME TO INSIGHT Episode 23, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses the NetOps labor shortage ... 

Technology management is evolving, and in turn, so is the scope of FinOps. The FinOps Foundation recently updated their mission statement from "advancing the people who manage the value of cloud" to "advancing the people who manage the value of technology." This seemingly small change solidifies a larger evolution: FinOps practitioners have organically expanded to be focused on more than just cloud cost optimization. Today, FinOps teams are largely — and quickly — expanding their job descriptions, evolving into a critical function for managing the full value of technology ...

Enterprises are under pressure to scale AI quickly. Yet despite considerable investment, adoption continues to stall. One of the most overlooked reasons is vendor sprawl ... In reality, no organization deliberately sets out to create sprawling vendor ecosystems. More often, complexity accumulates over time through well-intentioned initiatives, such as enterprise-wide digital transformation efforts, point solutions, or decentralized sourcing strategies ...

Nearly every conversation about AI eventually circles back to compute. GPUs dominate the headlines while cloud platforms compete for workloads and model benchmarks drive investment decisions. But underneath that noise, a quieter infrastructure challenge is taking shape. The real bottleneck in enterprise AI is not processing power, it is the ability to store, manage and retrieve the relentless volumes of data that AI systems generate, consume and multiply ...

The 2026 Observability Survey from Grafana Labs paints a vivid picture of an industry maturing fast, where AI is welcomed with careful conditions, SaaS economics are reshaping spending decisions, complexity remains a defining challenge, and open standards continue to underpin it all ...

The observability industry has an evolving relationship with AI. We're not skeptics, but it's clear that trust in AI must be earned ... In Grafana Labs' annual Observability Survey, 92% said they see real value in AI surfacing anomalies before they cause downtime. Another 91% endorsed AI for forecasting and root cause analysis. So while the demand is there, customers need it to be trustworthy, as the survey also found that the practitioners most enthusiastic about AI are also the most insistent on explainability ...

In the modern enterprise, the conversation around AI has moved past skepticism toward a stage of active adoption. According to our 2026 State of IT Trends Report: The Human Side of Autonomous AI, nearly 90% of IT professionals view AI as a net positive, and this optimism is well-founded. We are seeing agentic AI move beyond simple automation to actively streamlining complex data insights and eliminating the manual toil that has long hindered innovation. However, as we integrate these autonomous agents into our ecosystems, the fundamental DNA of the IT role is evolving ...

AI workloads require an enormous amount of computing power ... What's also becoming abundantly clear is just how quickly AI's computing needs are leading to enterprise systems failure. According to Cockroach Labs' State of AI Infrastructure 2026 report, enterprise systems are much closer to failure than their organizations realize. The report ... suggests AI scale could cause widespread failures in as little as one year — making it a clear risk for business performance and reliability.