APM and Observability: Cutting Through the Confusion — Part 12

Pete Goldin
APMdigest

In Part 12, the final installment in the series, the experts present some final predictions about AI's future impact on APM and Observability.

Start with: APM and Observability - Cutting Through the Confusion - Part 11

AI-powered capabilities such as AI Assistants, zero-config ML-based multi-signal correlation, pattern analysis, failure detection, latency analysis, and more are enriching the APM experience and tightly integrating it with other observability signals, according to Bahubali Shetti, Senior Director, Product Marketing, Elastic. Users can solve problems holistically using all available signals and data, rather than relying on metrics, logs, or traces in isolation.

The integration of AI and machine learning will deepen, enabling faster, more accurate diagnostics and increasingly automated remediation, says Arun Balachandran, Senior Product Marketing Manager, ManageEngine APM Solutions.

"AI will become central, automating anomaly detection, root cause analysis, and performance optimization," adds Varma Kunaparaju, SVP and GM for Cloud Platform and OpsRamp Software, HPE, "making both APM and observability more proactive and predictive. This transformation will enable more agile and resilient IT operations, driving innovation and competitive advantage."

The following are more predictions from the experts:

ASSISTIVE OBSERVABILITY

Observability will move from being reactive to being assistive. As systems grow more complex, organizations will need observability platforms that don't just show what happened, but help explain why. That requires open, high-fidelity data, which is why the CNCF ecosystem is so critical. Projects like Thanos (https://thanos.io/) for scalable metrics, Fluent Bit for log routing, and OpenTelemetry for structured, correlated telemetry are laying the foundation for AI-enhanced, team-centric observability that adapts as fast as the systems it observes.
Brian Douglas
Head of Ecosystem, Cloud Native Computing Foundation (CNCF)
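
As one concrete illustration of the log-routing layer Douglas mentions, a minimal Fluent Bit configuration might tail application logs, drop debug noise at the source, and fan the remainder out to two destinations. The paths, hostnames, and bucket names below are hypothetical, a sketch of the pattern rather than a production setup:

```ini
[INPUT]
    Name    tail
    Path    /var/log/app/*.log
    Tag     app.logs

[FILTER]
    Name    grep
    Match   app.logs
    Exclude log ^DEBUG

[OUTPUT]
    Name    es
    Match   app.logs
    Host    elasticsearch.internal
    Port    9200
    Index   app-logs

[OUTPUT]
    Name    s3
    Match   app.logs
    bucket  telemetry-archive
    region  us-east-1
```

The same tagged stream feeds both a search backend for investigation and cheap object storage for retention, without the application emitting anything twice.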

AGENTIC WORKFLOWS AND CONVERSATIONAL EXPERIENCES

Agentic workflows and conversational experiences will fundamentally change IT operations, making it far more practical to find and resolve issues entirely through conversation, including code generation, patching, and deployment.
Bill Lobig
VP of Observability, IBM Automation

SMART DATA PIPELINES

Data pipelines will become smarter — filtering at the edge, routing to multiple destinations, and using AI to recommend what matters. 
Gurjeet Arora
CEO and Co-Founder, Observo AI
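
The pattern Arora describes, filtering at the edge and routing one event to multiple destinations, can be sketched in a few lines. The severity scale, field names, and destination names here are illustrative, not any vendor's actual API:

```python
def edge_filter(event, min_severity=30):
    """Drop low-value telemetry before it leaves the edge (scale is illustrative)."""
    return event.get("severity", 0) >= min_severity

def route(event):
    """Fan one event out to every destination whose predicate matches."""
    routes = [
        (lambda e: e["severity"] >= 50, "incident-queue"),  # urgent path
        (lambda e: True, "cold-storage"),                   # keep everything cheaply
    ]
    return [dest for match, dest in routes if match(event)]

events = [
    {"source": "api", "severity": 10, "msg": "heartbeat"},
    {"source": "db", "severity": 60, "msg": "replication lag"},
]
# Only events that survive the edge filter are shipped, each to all matching routes.
shipped = [(e["msg"], route(e)) for e in events if edge_filter(e)]
```

The "AI to recommend what matters" part would replace the hand-written predicates with learned ones; the filter-then-route structure stays the same.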

The implementation of AI engines is going to dramatically expand the scope and capabilities of APM solutions. We will see the rise of tightly correlated data elements that are automatically traced, identified, and presented to IT Operations staff in real time, with targeted guidance on what they should do next to support the health of the application, to a degree that will seem almost magical compared to current solutions. We're seeing only the very beginnings of this today, but I believe that vastly more data will be ingested and understood in real time, leading to what would be interpreted today as a near-perfect understanding of application state.
Bryan Cole
Director of Customer Engineering, Tricentis

OPENTELEMETRY

AI will clearly play a significant role in the future of observability, although it's unclear exactly which use cases will dominate. Some vendors today seem excited about the prospect of AI helping developers manage the overwhelming volume of data that comes from disparate logging, metrics, and APM tools. However, this data volume challenge isn't inherent to building systems — it's a consequence of emitting data in formats designed for previous generations of tooling. I hope that as tool makers bring AI into the observability landscape, they focus more on how AI can help us swiftly move into the OpenTelemetry future, e.g. by speeding the authoring and adoption of custom instrumentation, instead of providing an "intelligent" layer on top of a hodgepodge of existing logging, monitoring, and APM tooling. The higher up in the observability "funnel" we can deploy AI, the more powerful the results will be for our development teams.
Emily Nakashima
VP of Engineering, Honeycomb
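
The "OpenTelemetry future" Nakashima points to favors structured, correlated telemetry over scattered free-text log lines. A dependency-free sketch of the difference, one wide event per unit of work carrying all the context an AI (or a human) needs to correlate; the event shape is illustrative, not the actual OTel wire format:

```python
import json
import time

def handle_request(user_id, cart_size):
    """Emit one structured event per request instead of several unstructured logs."""
    start = time.monotonic()
    # ... do the actual work here ...
    event = {
        "name": "checkout",
        "trace_id": "abc123",  # would come from context propagation in real OTel
        "duration_ms": round((time.monotonic() - start) * 1000, 1),
        "user_id": user_id,
        "cart_size": cart_size,
    }
    return json.dumps(event)
```

Because every field rides on the same event, questions like "which users saw slow checkouts" become a single query rather than a cross-tool correlation exercise.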

SELF-LEARNING AI

In today's increasingly complex environments, visibility alone isn't enough. The next wave of AIOps solutions is being driven by self-learning AI platforms that unify and interpret data across operational domains, transforming it into predictive, prioritized, and actionable insights — without relying on static topologies or predefined rules. AIOps platforms built on a fully AI-native architecture are shifting the focus from simply monitoring systems to enabling intelligent, autonomous operations.

By applying predictive, causal, and generative AI, these platforms not only enhance the value of existing tools but increasingly have the potential to replace standalone observability solutions. They offer a single, intelligent layer that surfaces emerging issues, pinpoints root causes, and drives automated resolution — enabling a shift from fragmented monitoring to proactive, autonomous operations. Self-learning AI will ultimately replace traditional observability platforms by becoming the integrated, real-time source of operational truth. Rather than relying on topology and rules-based AIOps and Observability platforms, the next generation of platforms will continuously learn from live telemetry, historical incidents, human actions, and system behavior to proactively detect, diagnose, and even remediate issues. This real-time learning loop will reduce noise, surface meaningful patterns, and guide teams toward faster, more confident decisions — paving the way for predictive, autonomous, and eventually self-healing IT environments.
Josh Kindiger
President, Grokstream
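
One common building block behind the self-learning behavior Kindiger describes is a detector that derives its own baseline from recent telemetry rather than from a predefined rule. A minimal sketch, assuming a simple rolling z-score (real platforms use far richer models):

```python
from collections import deque
import statistics

class RollingAnomalyDetector:
    """Learns a baseline from the last `window` samples; no static threshold rules."""

    def __init__(self, window=50, z_threshold=3.0):
        self.samples = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value):
        """Return True if `value` is anomalous relative to the learned baseline."""
        anomalous = False
        if len(self.samples) >= 10:  # need some history before judging
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_threshold
        self.samples.append(value)  # keep learning from live telemetry
        return anomalous

det = RollingAnomalyDetector()
normal = [det.observe(100 + (i % 3)) for i in range(30)]  # steady latency, ~100ms
spike = det.observe(500)                                   # sudden outlier
```

The baseline adapts as the workload drifts, which is exactly what a static "alert if latency > X" rule cannot do.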

SELF-HEALING SYSTEMS

It's likely we'll see more widely adopted "self-healing" application capabilities through the use of AI and the observability data that feeds it.
Justin Collier
Senior Director of Product Management, SmartBear

The next few years will be less about manually building SLAs, dashboards, and alerts and more centered on self-healing and adaptive systems. With the rise of AI and ML embedded into observability platforms, we'll see a shift toward systems that can detect anomalies, determine probable root causes, and even take corrective actions with little or no human intervention. The result is not just greater efficiency but a fundamentally more resilient and intelligent digital infrastructure.
Mimi Shalash
Observability Advisor at Splunk, a Cisco Company

We're heading toward Autonomous Service Reliability: systems that not only observe themselves, but also understand, diagnose, and even self-heal with minimal human intervention.
Severin Neumann
Head of Community & Developer Relations, Causely
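
The detect-remediate loop these three predictions converge on can be sketched as a simple control loop. The service names, the latency-based health check, and the "restart" remediation below are all hypothetical stand-ins for whatever probes and actions a real platform would wire in:

```python
TELEMETRY = {"checkout": 950, "search": 25}  # current p99 latency per service, ms
SLO_MS = 200
actions = []

def check_health(service):
    """Stand-in for a real health probe."""
    return TELEMETRY[service]

def remediate(service):
    """Stand-in for a real action (restart pod, roll back deploy, scale out)."""
    actions.append(f"restart {service}")
    TELEMETRY[service] = 20  # assume the restart restores normal latency

for service in list(TELEMETRY):
    if check_health(service) > SLO_MS:  # detect
        remediate(service)              # remediate, no human in the loop

healthy = all(latency <= SLO_MS for latency in TELEMETRY.values())
```

The hard part in practice is not the loop itself but trusting the diagnosis enough to act automatically, which is why most of the experts above frame full autonomy as a destination rather than the starting point.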

AI WILL NOT REPLACE HUMANS

AI's role in APM and observability will be to assist and guide, not replace, human expertise. AI will be used to narrow down search spaces, prioritize issues, and support human operators in diagnosis rather than providing fully automated solutions.
Jeff Cobb
Global Head of Product & Design, Chronosphere

AI OBSERVABILITY

Expect to see a rise in the need for AI-specific observability and application performance monitoring. Customers embedding GenAI into their own offerings will need to monitor these AI applications and AI factories, driving new requirements for observability platforms that can handle these specialized workloads.
Paul Appleby
CEO, Virtana

Pete Goldin is Editor and Publisher of APMdigest
