The Case Against AIOps

Phillip Carter
Honeycomb

For the last couple of weeks, APMdigest has posted a series of blogs about AIOps that included my commentary. In this blog, I present the case against AIOps.

In theory, the ideas behind AIOps features are sound, but the machine learning (ML) systems involved aren't sophisticated enough to be effective or trustworthy.

AIOps is relatively mature today, at least in its current form. The ML models companies use for AIOps tasks work as well as they can, and the features that wrap them are fairly stable and mature. That being said, maturity might be a bit orthogonal to usefulness.

Despite being based on mature tech, AIOps features aren't widely used because they often don't help with the problems people actually have in practice. Imagine you're struggling to cook a meal, and the main challenge is mixing all the ingredients at the right time, but someone offers you a better way to chop the vegetables. Does chopping vegetables more efficiently help? Maybe, but it doesn't solve the difficulty of timing your ingredients.

In addition, AIOps adoption is a big challenge for teams. Organizations may be constrained by budget and unable to justify the feature's cost. AIOps often comes bundled with several other features, all with a high learning curve, and very few work as turnkey solutions. It's yet another thing for busy teams to learn, and it's not likely to be high on their priority list.

AIOps Does Not Provide Actionable Insights

AIOps arguably doesn't provide actionable insights. Sure, there are examples of teams reducing false positives and using anomaly detection to identify something worth investigating. Still, teams were able to reduce false positives and identify interesting patterns in data long before AIOps, and they typically do so today without AIOps features.

For example, you don't need ML models to tell you that a particular measure crosses a threshold. Furthermore, these models work only with past behavior as context. They can't predict future behavior, especially for services with irregular traffic patterns. And it's services with irregular traffic patterns that actually present the most problems (and thus time spent debugging) in the first place.
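To make that concrete, here is a minimal sketch of a static threshold check in Python. No ML is involved, and fetch_p99_latency and page_on_call are hypothetical stand-ins for a real metric source and paging integration:

    # A minimal static threshold check. No ML required.
    # fetch_p99_latency and page_on_call are hypothetical stand-ins
    # for a real metrics query and a real paging integration.
    def fetch_p99_latency() -> float:
        return 612.0  # stand-in for querying your metrics backend

    def page_on_call(message: str) -> None:
        print(f"ALERT: {message}")  # stand-in for paging the on-call

    def latency_breaches_threshold(p99_ms: float, threshold_ms: float = 500.0) -> bool:
        """Return True when p99 latency crosses the alert threshold."""
        return p99_ms > threshold_ms

    if latency_breaches_threshold(fetch_p99_latency()):
        page_on_call("p99 latency above 500 ms")

A rule like this is transparent and debuggable in a way an opaque model's verdict is not.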

One use case that illustrates this problem is analyzing a giant bucket of unorganized data. When organizations treat operations data as a dumping ground, using an ML model to perform pattern analysis and separate usable from unusable data can be helpful. However, that only treats a symptom, not the root cause.
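As a rough illustration only, here is what that kind of symptom-treating triage might look like with an off-the-shelf anomaly detector. The scikit-learn calls are real, but the random matrix stands in for featurized operations records:

    # A rough illustration of triaging a data dumping ground with an
    # off-the-shelf anomaly detector. The random matrix is a stand-in
    # for featurized operations records; real data needs real features.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    records = np.random.rand(1000, 4)        # stand-in for ops data
    detector = IsolationForest(contamination=0.05, random_state=42)
    labels = detector.fit_predict(records)   # -1 = anomalous, 1 = normal
    candidates = records[labels == -1]       # worth a human's attention
    print(f"{len(candidates)} records flagged for review")

Note what this does and doesn't do: it shrinks the pile a human has to look at, but it doesn't explain why the pile exists.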

And when there's an issue that AIOps features can't help identify, you're back to spending a very long time figuring out what's wrong in your system.

Facing Your Organizational Issues

The advantages of AIOps are insignificant because AIOps features primarily exist to patch organizational and technical failures. The long-term solution is to invest in your organization and empower your teams to pick quality tools, not be sold the flashy promises of a quick AI fix.

I wouldn't suggest going looking for an AIOps-specific provider; instead, leverage your team's expertise. For these use cases, humans are far better at making critical judgment calls than the ML models on the market today. Deciding what's worth looking at and alerting on is one of the best possible uses of human time.

Most of the problems that AIOps purports to solve are organizational issues. Fix your organizational and technical issues by giving your teams the agency to fix things in the first place.

If you have problems with noise in your data, look at how you generate telemetry and prioritize working to improve it. Lead a culture shift by enforcing the principle that good telemetry is a concern for application developers, not just ops teams.
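To give a sense of what developer-owned telemetry looks like, here is a minimal sketch using the OpenTelemetry Python API; the service name, span name, and attributes are illustrative:

    # A minimal sketch of developer-owned instrumentation with the
    # OpenTelemetry Python API; the names and attributes are illustrative.
    from opentelemetry import trace

    tracer = trace.get_tracer("checkout-service")

    def process_order(order_id: str, item_count: int) -> None:
        with tracer.start_as_current_span("process_order") as span:
            # Attach the context a future debugger will actually need.
            span.set_attribute("order.id", order_id)
            span.set_attribute("order.item_count", item_count)
            # ... business logic here ...

The point is that the developer writing process_order knows which attributes matter for debugging it later; no model can infer context that was never emitted.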

If your alerts are misfiring, have your team look at what they're alerting on and make the necessary adjustments. If your alerts are noisy, talk to the people getting paged and investigate why. Take on-call seriously: check in with engineers regularly and make sure they're not burning out. Some vendors will try to sell you ML models that magically solve alert fatigue, but be warned: there is no magic, and ML models won't solve these problems for you.
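Before buying a model to fight alert fatigue, you can measure the problem directly. Here is a small sketch that computes what fraction of each alert rule's pages were actually actionable; the log format and the "actionable" flag are assumptions about what your incident tooling can export:

    # A small sketch for quantifying alert noise before reaching for ML.
    # The log format and "actionable" flag are assumptions about what
    # your incident tooling can export.
    from collections import defaultdict

    alert_log = [
        {"rule": "cpu_high", "actionable": False},
        {"rule": "cpu_high", "actionable": False},
        {"rule": "error_rate", "actionable": True},
    ]

    stats = defaultdict(lambda: {"total": 0, "actionable": 0})
    for alert in alert_log:
        stats[alert["rule"]]["total"] += 1
        stats[alert["rule"]]["actionable"] += int(alert["actionable"])

    for rule, s in stats.items():
        print(f"{rule}: {s['actionable'] / s['total']:.0%} actionable")

A rule that pages people and is almost never actionable should be rewritten or deleted, and that's a conversation, not a model.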

If your organization doesn't have development teams prioritizing good telemetry, incentivize them to care about it.

LLMs for Observability

Can you tell I'm not particularly bullish on AIOps? I am incredibly bullish on LLMs for Observability, though. LLMs do a great job of taking natural language inputs and producing things like queries on data, analyzing data relevant to a query, and generating material that helps teach people how to use a product. We'll uncover more use cases over time, but right now LLMs are best at reducing toil and lowering the bar to analyzing your production data in the first place.
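As one hedged example of what "natural language in, query out" can look like, here is a sketch using the OpenAI Python SDK. The model name, prompt, and events table schema are illustrative; production features ground this with real schema context and validate the output before running it:

    # A sketch of natural-language-to-query translation with an LLM.
    # The model name, prompt, and events table schema are illustrative;
    # production features supply real schema context and validate the
    # generated query before executing it.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Translate the user's question into a SQL query over "
                "events(service, duration_ms, status, timestamp)."
            )},
            {"role": "user", "content": "Which services were slowest in the last hour?"},
        ],
    )
    print(response.choices[0].message.content)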

While I'm not too hopeful about the future of AIOps, I am optimistic about how AI will continue to integrate into operations. LLMs open up ways of interacting with systems that simply weren't possible before. For example, observability vendors are releasing AI features that lower the barrier for developers to access and get the most out of their observability tools. Innovations like this will continue to enhance developer workflows and change the way we work for the better.

Phillip Carter is Principal Product Manager at Honeycomb
