One more important point for the experts to consider is the impact of Artificial Intelligence (AI) on APM and Observability.
Start with: APM and Observability - Cutting Through the Confusion - Part 9
AI plays a transformative role in both APM and observability by turning raw data into actionable insights, enabling faster, more accurate detection and resolution of issues, according to Nigel Hickey, Senior Technical Marketing Manager at NetBrain.
Today, the most efficient and effective observability platforms leverage AI and ML to automate IT root cause analysis, sift through vast numbers of log files, and interpret data with higher speed and accuracy than any human IT professional or team could achieve on their own, says Douglas James, Vice President, Solutions & Ecosystem at ScienceLogic.
"Application Performance Monitoring is no longer a siloed step — it's now integrated into broader observability workflows," explains Bahubali Shetti, Senior Director, Product Marketing, Elastic. "Instead of manually sifting through traces, logs, and metrics, AI-powered tools like an Observability AI Assistant can quickly identify high latency, failed transactions, or Kubernetes scaling issues, all with the right context."
Shetti continues, "What makes these assistants especially powerful is their use of Retrieval-Augmented Generation (RAG), which combines large language models with your organization's data, such as GitHub issues, runbooks, and documentation, to deliver smart, context-aware responses. These assistants connect the dots across all signals (logs, metrics, traces) and all sources (application, Kubernetes, cloud, etc.), helping users focus more on improving systems, not just troubleshooting them."
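To make the retrieval half of RAG concrete, here is a minimal sketch using only the Python standard library. It assumes a tiny, hypothetical knowledge base of runbooks and issues and ranks them against a question with bag-of-words cosine similarity; production assistants typically use vector embeddings instead. The LLM call itself is deliberately left out: the sketch stops at building the augmented prompt that would be sent to a model. All document IDs and function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical organizational knowledge base (runbooks, issues, docs).
DOCUMENTS = {
    "runbook-db-latency": "High database latency: check slow query log and connection pool saturation.",
    "runbook-k8s-scaling": "Kubernetes pods not scaling: verify HPA metrics server and CPU requests.",
    "issue-1234": "Checkout service returns 500 errors after payment client upgrade.",
}

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list:
    """Return the IDs of the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, tokenize(DOCUMENTS[d])), reverse=True)
    return ranked[:k]

def build_prompt(question: str, k: int = 2) -> str:
    """Augment the question with retrieved context before it goes to an LLM (not shown)."""
    context = "\n".join(f"[{d}] {DOCUMENTS[d]}" for d in retrieve(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Why are Kubernetes pods not scaling?", k=1))
```

Grounding the model in retrieved runbook text is what lets the answer reflect the organization's own procedures rather than generic advice.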
The following are some of the many capabilities AI can provide for both APM and Observability, according to the experts:
Automating Manual Tasks
In APM, AI accelerates diagnostic work and helps teams optimize application performance through more automation and less manual sweat.
Bryan Cole
Director of Customer Engineering, Tricentis
The biggest promise of AI is to reduce or eliminate toil by automating tasks that aren't genuinely creative but have traditionally required humans for various reasons. Unfortunately, APM and Observability are rife with these kinds of tasks: spotting anomalies, configuring alerts, scanning changes for relevant issues, assessing the impact of incidents, and validating deploys. Humans routinely do all of these things, but they are easy to forget or do incorrectly, which can cause or prolong an incident. Leveraging AI, an intelligent platform can automate much of that burden.
Nic Benders
Chief Technical Strategist, New Relic
Decision-Making Guidance
AI helps explain findings so they are easier to understand for the person who needs to act upon the observability data. An example would be: "Explain the best steps to mitigate the system outage that was identified by the observability platform." With Agentic AI, this use case goes a step further, where one could ask: "Open a pull request with the suggested remediation steps and assign it to the team that owns the problematic system component."
Andreas Grabner
Fellow DevRel and CNCF Ambassador, Dynatrace
Conversational interfaces are emerging, allowing practitioners to essentially "chat" with their systems about performance and health. The pace of improvement is incredibly fast, and I'm genuinely excited about the capabilities that will blossom in the next one to two years.
Juraci Paixão Kröhling
Software Engineer, OllyGarden
Resource Issue Identification
AI can identify inefficiencies, such as slow query patterns or resource bottlenecks, and monitor overall system health to spot anomalies.
Ajay Khanna
CMO, Yugabyte
AI can automate alerting, optimize performance based on data, and forecast potential performance issues, such as resource exhaustion, based on historical trends.
Varma Kunaparaju
SVP and GM for Cloud Platform and OpsRamp Software, HPE
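Forecasting resource exhaustion from historical trends can be illustrated with its simplest form: fit a straight line to recent usage samples and extrapolate to capacity. This is a minimal sketch assuming hourly disk-usage percentages and a purely linear trend; real AI-driven forecasters account for seasonality and noise. The function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hours_until_exhaustion(hours, used_pct, capacity_pct=100.0):
    """Extrapolate the linear trend to estimate hours left until usage reaches capacity.
    Returns None when usage is flat or shrinking (no exhaustion predicted)."""
    slope, intercept = fit_line(hours, used_pct)
    if slope <= 0:
        return None
    t_full = (capacity_pct - intercept) / slope  # time at which the line hits capacity
    return max(0.0, t_full - hours[-1])          # measured from the latest sample

# Hypothetical samples: disk usage grows 2 percentage points per hour from 50%.
print(hours_until_exhaustion([0, 1, 2, 3, 4], [50, 52, 54, 56, 58]))  # 21.0
```

An alert could then fire when the estimate drops below a lead time the team needs to react, instead of waiting for a static "90% full" threshold.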
Anomaly Detection
With anomaly detection, AI analyzes metrics, logs, and traces to identify unusual patterns, such as sudden spikes in error rates or latency, faster than manually set thresholds can.
Varma Kunaparaju
SVP and GM for Cloud Platform and OpsRamp Software, HPE
AI can automatically flag unusual patterns in metrics or traces that humans might miss, especially in complex distributed systems.
Rakesh Gupta
Head of Product Management, Observe
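A baseline-versus-deviation check is the statistical core that many anomaly detectors build on. The sketch below, assuming a single latency series and a trailing window of recent samples as the learned "normal," flags points more than a chosen number of standard deviations from that baseline. It is a simple z-score method, not the ML models commercial platforms use; the window size, threshold, and sample data are illustrative assumptions.

```python
import statistics

def detect_anomalies(values, window=10, threshold=3.0):
    """Return indices whose value deviates from the trailing-window baseline
    by more than `threshold` population standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline)
        if stdev == 0:
            # Perfectly flat baseline: any change at all is treated as anomalous.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical p99 latency samples in ms, with one spike at index 10.
latency = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 250, 100]
print(detect_anomalies(latency))  # [10]
```

Because the baseline adapts to recent behavior, the same code works for a service that normally runs at 100 ms and one that normally runs at 2 s, which is exactly where fixed manual thresholds struggle. One trade-off: a spike stays inside the window for a while and inflates the baseline, so production systems often exclude flagged points from it.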
Alert Noise Reduction
As telemetry grows, AI will be essential in automating the separation of signal from noise.
Gurjeet Arora
CEO and Co-Founder, Observo AI
Alert noise reduction: Instead of getting 50 alerts when something breaks, AI can group related symptoms and surface the most likely root cause indicators.
Rakesh Gupta
Head of Product Management, Observe
AI-powered tools can help understand the telemetry data profile and separate signal from noise.
Ajay Khanna
CMO, Yugabyte
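The "50 alerts become one" idea can be sketched with a simple time-proximity heuristic: alerts that fire in quick succession are grouped, and the earliest one is surfaced as the most likely root-cause indicator. This is a minimal illustration under stated assumptions (a fixed five-minute gap, hypothetical alert fields); AI-driven platforms typically also use topology, dependency graphs, and learned co-occurrence rather than timing alone.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    timestamp: float  # seconds since epoch
    service: str
    message: str

def group_alerts(alerts, window_s=300.0):
    """Cluster alerts that fire within `window_s` seconds of the previous
    alert in the same cluster (time-proximity heuristic)."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window_s:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups

def summarize(groups):
    """One line per group: the earliest alert is surfaced as the likely
    root-cause indicator, with a count of related symptoms."""
    return [f"{g[0].service}: {g[0].message} (+{len(g) - 1} related alerts)" for g in groups]

alerts = [
    Alert(1030, "api", "p99 latency high"),
    Alert(1000, "db", "connection pool exhausted"),
    Alert(1060, "web", "5xx rate high"),
    Alert(5000, "batch", "job failed"),
]
for line in summarize(group_alerts(alerts)):
    print(line)
```

Here the on-call engineer sees two lines instead of four pages, with the database alert leading the first group.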
Streamlined Troubleshooting
In the APM space, AI helps automate tasks like anomaly detection, spotting performance degradation patterns, and linking incidents to specific code changes or deployments — all of which significantly speed up troubleshooting.
Arun Balachandran
Senior Product Marketing Manager, ManageEngine APM Solutions
Faster MTTR
AI facilitates intelligent automation, transforming insights into actionable steps and significantly reducing mean time to resolution (MTTR). It's about harnessing AI to not only understand, but to act swiftly and decisively.
Gab Menachem
VP ITOM, ServiceNow
Observability and APM are the best use cases for Agentic AI. LLM technology and agentic workflows can sift through massive amounts of metrics, events, logs, and traces to improve the signal-to-noise ratio, accelerating detection and resolution (MTTD and MTTR), minimizing human triage time, and increasing application uptime.
Bill Lobig
VP of Observability, IBM Automation
Event Correlation
In observability, AI correlates events across logs, metrics, and traces to highlight causality.
Hugo Kaczmarek
Director of Product, APM Suite, Datadog
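One concrete building block for cross-signal correlation is a shared identifier. OpenTelemetry, for example, lets log records carry the trace ID of the active span, so logs and traces can be joined directly. The sketch below assumes hypothetical dict-shaped records with `trace_id`, `level`, and `duration_ms` fields and finds traces that were both slow and logged an error. Note that this shows co-occurrence, which narrows the search; establishing causality takes further reasoning.

```python
from collections import defaultdict

def correlate_by_trace(logs, spans):
    """Group log records and spans that share a trace_id.
    Log records without a trace_id cannot be joined and are skipped."""
    by_trace = defaultdict(lambda: {"logs": [], "spans": []})
    for rec in logs:
        if rec.get("trace_id"):
            by_trace[rec["trace_id"]]["logs"].append(rec)
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(span)
    return dict(by_trace)

def slow_traces_with_errors(correlated, latency_ms=500):
    """Trace IDs where some span exceeded the latency budget AND an ERROR log
    was emitted within the same trace."""
    result = []
    for trace_id, signals in correlated.items():
        slow = any(s["duration_ms"] > latency_ms for s in signals["spans"])
        errored = any(r["level"] == "ERROR" for r in signals["logs"])
        if slow and errored:
            result.append(trace_id)
    return sorted(result)

# Hypothetical telemetry.
logs = [
    {"trace_id": "t1", "level": "ERROR", "msg": "timeout calling inventory"},
    {"trace_id": "t2", "level": "INFO", "msg": "ok"},
    {"trace_id": None, "level": "ERROR", "msg": "cron failed"},
]
spans = [
    {"trace_id": "t1", "name": "GET /checkout", "duration_ms": 1200},
    {"trace_id": "t2", "name": "GET /home", "duration_ms": 40},
    {"trace_id": "t3", "name": "GET /search", "duration_ms": 900},
]
print(slow_traces_with_errors(correlate_by_trace(logs, spans)))  # ['t1']
```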
Root Cause Analysis
We're seeing AI assist with constructing queries, generating dashboards, interpreting raw telemetry signals, and pointing towards the likely direction of a problem's root cause.
Juraci Paixão Kröhling
Software Engineer, OllyGarden
Root cause suggestions: When an incident occurs, AI can correlate across different data sources and suggest probable causes based on historical patterns.
Rakesh Gupta
Head of Product Management, Observe
By using AI and ML to gain complete visibility and automated root cause analysis, observability solutions improve customer experiences, enhance employee productivity, and optimize digital infrastructure at profound levels.
Douglas James
VP, Solutions & Ecosystem, ScienceLogic
AI, often understood today as LLMs, can assist by enabling natural language querying and summarizing telemetry data for faster exploration. However, LLMs fall short when it comes to accurately identifying root causes, as they lack an understanding of system causality. This is where causal reasoning becomes essential. By modeling how components influence one another, causal analysis can pinpoint the actual source of incidents, not just symptoms. It provides precise, explainable insights that go beyond what LLMs can infer from surface-level patterns.
Severin Neumann
Head of Community & Developer Relations, Causely
Prioritizing Likely Problems
AI excels at text but is still evolving for data-rich environments. It should be used to guide and narrow down the search for issues rather than fully automating diagnoses or replacing human expertise. I view AI's role as being strongest when it helps prioritize likely problems, allowing humans to focus their efforts.
Jeff Cobb
Global Head of Product & Design, Chronosphere
Predicting Potential Problems
AI-powered observability processes vast volumes of telemetry data in real-time, automatically detecting anomalies, pinpointing root causes, and anticipating issues before they occur. It allows teams to shift from reactive troubleshooting to proactive, preventative operations — saving time, reducing alert fatigue, and improving reliability across complex environments.
Andreas Grabner
Fellow DevRel and CNCF Ambassador, Dynatrace
In APM, AI is increasingly used to detect unusual application behavior, user drop-off patterns, or performance degradations before they impact SLAs.
Gurjeet Arora
CEO and Co-Founder, Observo AI
AI's role is growing fast here. It's great for spotting patterns you might miss, flagging anomalies in real time, and even predicting potential failures before they cause real issues. In APM, that means catching performance slowdowns early. In observability, it means making sense of a flood of data (logs, traces, metrics) and connecting the dots quickly.
Tanner Burson
Engineering Leader, Prismatic
In APM, AI helps baseline normal application behavior, detect anomalies in real time, and accelerate root cause analysis by correlating signals across the application stack. It can also predict potential failures before they impact end users.
Nigel Hickey
Senior Technical Marketing Manager, NetBrain
Autoremediation
We're seeing a rise in AI-driven observability tools that not only recommend fixes but can proactively trigger automated remediation, helping teams resolve problems faster and build more resilient systems.
Arun Balachandran
Senior Product Marketing Manager, ManageEngine APM Solutions
Incident Documentation
GenAI can support the documentation of issues and, when those records are included in the organization's documentation, provide better responses to future issues using retrieval-augmented generation (RAG). The next step would be Agentic AI, through which incidents could be automatically resolved and documented.
Harald Burose
Director, Product Management, Research & Development – Engineering, OpenText
Visibility into Business Impact
AI helps bridge the technical nature of telemetry and observability data to people outside engineering. AI allows users to get real-time answers in their context, tied to business impact.
Ariel Assaraf
CEO, Coralogix
Observability-Driven Development
AI supports observability-driven development, providing automated feedback to catch performance issues early, shifting observability from reactive troubleshooting to proactive optimization.
Ajay Khanna
CMO, Yugabyte
Cost Reduction
For observability, AI can filter out low-value data to reduce storage and licensing costs.
Gurjeet Arora
CEO and Co-Founder, Observo AI
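To show what "filtering out low-value data" does to volume, here is a minimal rule-based stand-in: drop records below a severity floor and collapse exact duplicates into one record with a count. An AI-driven pipeline would learn which data is low-value rather than rely on fixed rules like these; the severity scale, field names, and sample records are hypothetical.

```python
# Hypothetical severity scale; higher means more important.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}

def reduce_logs(records, min_level="INFO"):
    """Drop records below `min_level` and collapse exact duplicate
    (level, message) pairs into a single record carrying a count."""
    threshold = SEVERITY[min_level]
    kept = {}  # insertion-ordered, so output follows first occurrence
    for rec in records:
        if SEVERITY[rec["level"]] < threshold:
            continue
        key = (rec["level"], rec["message"])
        if key in kept:
            kept[key]["count"] += 1
        else:
            kept[key] = {**rec, "count": 1}
    return list(kept.values())

records = [
    {"level": "DEBUG", "message": "cache hit"},
    {"level": "INFO", "message": "health check ok"},
    {"level": "INFO", "message": "health check ok"},
    {"level": "INFO", "message": "health check ok"},
    {"level": "ERROR", "message": "disk full"},
]
for rec in reduce_logs(records):
    print(rec)
```

Five records shrink to two, and the count preserves the signal that health checks kept running without paying to store each one.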
Conclusion: Telemetry Is Key
If you're working with sampled traces and aggregated metrics, AI can't provide the full picture. The real opportunity comes from having comprehensive, unified telemetry data that enables correlation across your entire technology stack.
Rakesh Gupta
Head of Product Management, Observe
AI can accelerate incident response by surfacing anomalies, correlating patterns, and even suggesting root causes. But for AI to be meaningful, it needs structured, rich telemetry, not black-box outputs. This is where OpenTelemetry shines. By standardizing the way metrics, logs, and traces are collected and annotated, it provides high-quality input for AI systems to reason over.
Brian Douglas
Head of Ecosystem, Cloud Native Computing Foundation (CNCF)
Go to: APM and Observability - Cutting Through the Confusion - Part 11, presenting predictions about the future of APM and Observability.