APM and Observability: Cutting Through the Confusion — Part 12

Pete Goldin
APMdigest

In Part 12, the final installment in the series, the experts present some final predictions about AI's future impact on APM and Observability.

Start with: APM and Observability - Cutting Through the Confusion - Part 11

AI-powered capabilities such as AI Assistants, zero-config ML-based multi-signal correlation, pattern analysis, failure detection, latency analysis, and more are enriching the APM experience and tightly integrating it with other observability signals, according to Bahubali Shetti, Senior Director, Product Marketing, Elastic. Users can solve problems holistically using all available signals and data, rather than relying on metrics, logs, or traces in isolation.

The integration of AI and machine learning will deepen, enabling faster, more accurate diagnostics and increasingly automated remediation, says Arun Balachandran, Senior Product Marketing Manager, ManageEngine APM Solutions.

"AI will become central, automating anomaly detection, root cause analysis, and performance optimization," adds Varma Kunaparaju, SVP and GM for Cloud Platform and OpsRamp Software, HPE, "making both APM and observability more proactive and predictive. This transformation will enable more agile and resilient IT operations, driving innovation and competitive advantage."

The following are more predictions from the experts:

ASSISTIVE OBSERVABILITY

Observability will move from being reactive to being assistive. As systems grow more complex, organizations will need observability platforms that don't just show what happened, but help explain why. That requires open, high-fidelity data, which is why the CNCF ecosystem is so critical. Projects like Thanos (https://thanos.io/) for scalable metrics, Fluent Bit for log routing, and OpenTelemetry for structured, correlated telemetry are laying the foundation for AI-enhanced, team-centric observability that adapts as fast as the systems it observes.
Brian Douglas
Head of Ecosystem, Cloud Native Computing Foundation (CNCF)
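
As one concrete illustration of the log-routing layer Douglas mentions, a minimal Fluent Bit configuration might tail application logs, drop debug noise at the source, and fan the remainder out to two destinations. The paths, hostnames, and bucket names below are hypothetical, a sketch of the pattern rather than a production setup:

```ini
[INPUT]
    Name    tail
    Path    /var/log/app/*.log
    Tag     app.logs

[FILTER]
    Name    grep
    Match   app.logs
    Exclude log ^DEBUG

[OUTPUT]
    Name    es
    Match   app.logs
    Host    elasticsearch.internal
    Port    9200
    Index   app-logs

[OUTPUT]
    Name    s3
    Match   app.logs
    bucket  telemetry-archive
    region  us-east-1
```

The same tagged stream feeds both a search backend for investigation and cheap object storage for retention, without the application emitting anything twice.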

AGENTIC WORKFLOWS AND CONVERSATIONAL EXPERIENCES

Agentic workflows and conversational experiences will fundamentally change IT operations, making it far more practical to find and resolve issues entirely through conversation, including code generation, patching, and deployment.
Bill Lobig
VP of Observability, IBM Automation

SMART DATA PIPELINES

Data pipelines will become smarter — filtering at the edge, routing to multiple destinations, and using AI to recommend what matters. 
Gurjeet Arora
CEO and Co-Founder, Observo AI
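
The pattern Arora describes, filtering at the edge and routing one event to multiple destinations, can be sketched in a few lines. The severity scale, field names, and destination names here are illustrative, not any vendor's actual API:

```python
def edge_filter(event, min_severity=30):
    """Drop low-value telemetry before it leaves the edge (scale is illustrative)."""
    return event.get("severity", 0) >= min_severity

def route(event):
    """Fan one event out to every destination whose predicate matches."""
    routes = [
        (lambda e: e["severity"] >= 50, "incident-queue"),  # urgent path
        (lambda e: True, "cold-storage"),                   # keep everything cheaply
    ]
    return [dest for match, dest in routes if match(event)]

events = [
    {"source": "api", "severity": 10, "msg": "heartbeat"},
    {"source": "db", "severity": 60, "msg": "replication lag"},
]
# Only events that survive the edge filter are shipped, each to all matching routes.
shipped = [(e["msg"], route(e)) for e in events if edge_filter(e)]
```

The "AI to recommend what matters" part would replace the hand-written predicates with learned ones; the filter-then-route structure stays the same.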

The implementation of AI engines is going to dramatically expand the scope and capabilities of APM solutions. We will see the rise of tightly correlated data elements that are automatically traced, identified, and presented to IT Operations staff in real time, with targeted guidance on what they should do next to support the health of the application, to a degree that will seem almost magical compared to current solutions. We're seeing only the very beginnings of this today, but I believe that vastly more data will be ingested and understood in real time, leading to what would be interpreted today as a near-perfect understanding of application state.
Bryan Cole
Director of Customer Engineering, Tricentis

OPENTELEMETRY

AI will clearly play a significant role in the future of observability, although it's unclear exactly which use cases will dominate. Some vendors today seem excited about the prospect of AI helping developers manage the overwhelming volume of data that comes from disparate logging, metrics, and APM tools. However, this data volume challenge isn't inherent to building systems — it's a consequence of emitting data in formats designed for previous generations of tooling. I hope that as tool makers bring AI into the observability landscape, they focus more on how AI can help us swiftly move into the OpenTelemetry future, e.g. by speeding the authoring and adoption of custom instrumentation, instead of providing an "intelligent" layer on top of a hodgepodge of existing logging, monitoring, and APM tooling. The higher up in the observability "funnel" we can deploy AI, the more powerful the results will be for our development teams.
Emily Nakashima
VP of Engineering, Honeycomb
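
The "OpenTelemetry future" Nakashima points to favors structured, correlated telemetry over scattered free-text log lines. A dependency-free sketch of the difference, one wide event per unit of work carrying all the context an AI (or a human) needs to correlate; the event shape is illustrative, not the actual OTel wire format:

```python
import json
import time

def handle_request(user_id, cart_size):
    """Emit one structured event per request instead of several unstructured logs."""
    start = time.monotonic()
    # ... do the actual work here ...
    event = {
        "name": "checkout",
        "trace_id": "abc123",  # would come from context propagation in real OTel
        "duration_ms": round((time.monotonic() - start) * 1000, 1),
        "user_id": user_id,
        "cart_size": cart_size,
    }
    return json.dumps(event)
```

Because every field rides on the same event, questions like "which users saw slow checkouts" become a single query rather than a cross-tool correlation exercise.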

SELF-LEARNING AI

In today's increasingly complex environments, visibility alone isn't enough. The next wave of AIOps solutions is being driven by self-learning AI platforms that unify and interpret data across operational domains, transforming it into predictive, prioritized, and actionable insights — without relying on static topologies or predefined rules. AIOps platforms built on a fully AI-native architecture are shifting the focus from simply monitoring systems to enabling intelligent, autonomous operations.

By applying predictive, causal, and generative AI, these platforms not only enhance the value of existing tools but increasingly have the potential to replace standalone observability solutions. They offer a single, intelligent layer that surfaces emerging issues, pinpoints root causes, and drives automated resolution — enabling a shift from fragmented monitoring to proactive, autonomous operations. Self-learning AI will ultimately replace traditional observability platforms by becoming the integrated, real-time source of operational truth. Rather than relying on topology and rules-based AIOps and Observability platforms, the next generation of platforms will continuously learn from live telemetry, historical incidents, human actions, and system behavior to proactively detect, diagnose, and even remediate issues. This real-time learning loop will reduce noise, surface meaningful patterns, and guide teams toward faster, more confident decisions — paving the way for predictive, autonomous, and eventually self-healing IT environments.
Josh Kindiger
President, Grokstream
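
One common building block behind the self-learning behavior Kindiger describes is a detector that derives its own baseline from recent telemetry rather than from a predefined rule. A minimal sketch, assuming a simple rolling z-score (real platforms use far richer models):

```python
from collections import deque
import statistics

class RollingAnomalyDetector:
    """Learns a baseline from the last `window` samples; no static threshold rules."""

    def __init__(self, window=50, z_threshold=3.0):
        self.samples = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value):
        """Return True if `value` is anomalous relative to the learned baseline."""
        anomalous = False
        if len(self.samples) >= 10:  # need some history before judging
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples) or 1e-9
            anomalous = abs(value - mean) / stdev > self.z_threshold
        self.samples.append(value)  # keep learning from live telemetry
        return anomalous

det = RollingAnomalyDetector()
normal = [det.observe(100 + (i % 3)) for i in range(30)]  # steady latency, ~100ms
spike = det.observe(500)                                   # sudden outlier
```

The baseline adapts as the workload drifts, which is exactly what a static "alert if latency > X" rule cannot do.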

SELF-HEALING SYSTEMS

It's likely we'll see more widely adopted "self-healing" application capabilities through the use of AI and the observability data that feeds it.
Justin Collier
Senior Director of Product Management, SmartBear

The next few years will be less about manually building SLAs, dashboards, and alerts and more centered on self-healing and adaptive systems. With the rise of AI and ML embedded into observability platforms, we'll see a shift toward systems that can detect anomalies, determine probable root causes, and even take corrective actions with little or no human intervention. The result is not just greater efficiency but a fundamentally more resilient and intelligent digital infrastructure.
Mimi Shalash
Observability Advisor at Splunk, a Cisco Company

We're heading toward Autonomous Service Reliability: systems that not only observe themselves, but also understand, diagnose, and even self-heal with minimal human intervention.
Severin Neumann
Head of Community & Developer Relations, Causely
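
The detect-remediate loop these three predictions converge on can be sketched as a simple control loop. The service names, the latency-based health check, and the "restart" remediation below are all hypothetical stand-ins for whatever probes and actions a real platform would wire in:

```python
TELEMETRY = {"checkout": 950, "search": 25}  # current p99 latency per service, ms
SLO_MS = 200
actions = []

def check_health(service):
    """Stand-in for a real health probe."""
    return TELEMETRY[service]

def remediate(service):
    """Stand-in for a real action (restart pod, roll back deploy, scale out)."""
    actions.append(f"restart {service}")
    TELEMETRY[service] = 20  # assume the restart restores normal latency

for service in list(TELEMETRY):
    if check_health(service) > SLO_MS:  # detect
        remediate(service)              # remediate, no human in the loop

healthy = all(latency <= SLO_MS for latency in TELEMETRY.values())
```

The hard part in practice is not the loop itself but trusting the diagnosis enough to act automatically, which is why most of the experts above frame full autonomy as a destination rather than the starting point.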

AI WILL NOT REPLACE HUMANS

AI's role in APM and observability will be to assist and guide, not replace, human expertise. AI will be used to narrow down search spaces, prioritize issues, and support human operators in diagnosis rather than providing fully automated solutions.
Jeff Cobb
Global Head of Product & Design, Chronosphere

AI OBSERVABILITY

Expect to see a rise in the need for AI-specific observability and application performance monitoring. Customers embedding GenAI into their own offerings will need to monitor these AI applications and AI factories, driving new requirements for observability platforms that can handle these specialized workloads.
Paul Appleby
CEO, Virtana

Pete Goldin is Editor and Publisher of APMdigest
