
2026 Observability Predictions - Part 9

In APMdigest's 2026 Observability Predictions Series, industry experts — from analysts and consultants to the top vendors — offer predictions on how Observability and related technologies will evolve and impact business in 2026. Part 9 covers Observability of AI.

AI OBSERVABILITY

In 2026, visibility will become critical for AI systems. As AI becomes a bigger piece of software architecture, the biggest worry won't just be cost and performance; it'll be trust. All those new systems look great on paper, but it only takes one high-profile goof-up before a lot of people become much more interested in what their AI systems are doing, why they behave the way they do, and how those decisions affect systems, customers, and costs. Observability tools will need to rise to this challenge, because users expect, and need, solutions that work natively with AI.
Nic Benders
Chief Technical Strategist, New Relic

In 2026, we will see increased pressure for organizations of all sizes to truly adopt and leverage AI-based technologies to realize the much-promised ROI in terms of business productivity and agility. However, AI-based insights and automation (i.e., agents) depend on data that accurately describes the IT infrastructure and the business services hosted within it. Observability, therefore, moves from a useful monitoring discipline to a mission-critical capability that is fundamentally required to unlock AI-driven transformation in the modern enterprise.
Mike Nappi
Chief Product and Engineering Officer, ScienceLogic

AI has made observability essential. As teams move from experimenting to running AI in production, they're realizing how little visibility they have. You can't secure or optimize what you can't see, and observability is the bridge between human judgment and machine action. In 2026, the companies that thrive will pair ambition with discipline. Resilience is the new speed. The future of software isn't human or AI; it's human plus AI, connected by observability.
Christine Yen
CEO and Co-Founder, Honeycomb

AI EUEM

Just as enterprise employee productivity extended from desktop to mobile, and from the office to work-from-anywhere, it will now extend from applications to chatbots and agentic interfaces. That shift will require End User Experience Management solutions that monitor AI interfaces to deliver the comprehensive visibility and resilience enterprises require. To stay ahead, IT decision-makers need to be proactive: embed secure, enterprise-grade AI solutions into workflows, establish robust processes to audit AI usage, and educate teams on responsible practices. Endpoint management will be about governing AI-powered interactions at every user touchpoint to maintain security without stifling productivity.
Mitch Berk
Senior Director of Product Management, Omnissa

AI DEX

In 2026, digital employee experience (DEX) will be defined by "invisible AI," as copilots and agents embed themselves into workflows to summarize content, draft responses, and reduce cognitive load so employees can focus on higher-value work. However, this same shift introduces a new layer of risk as workers increasingly deploy their own shadow AI agents or use AI-powered tools without proper guidance, often exposing sensitive data to external models without realizing it. The future of DEX will be just as much about enabling workforce productivity as it is about ensuring every AI agent and AI-enabled workflow is transparent, accountable, and aligned with enterprise policy.
Mitch Berk
Senior Director of Product Management, Omnissa

DEVOPS FOR MACHINES

DevOps for Machines, Not Just Humans: DevOps is evolving beyond its traditional focus on deploying applications. DevOps for machines means governing the real-time interaction between AI agents and enterprise data, with the same rigor once reserved for production apps. Modern teams will now treat data and AI pipelines as mission-critical workloads, ensuring that AI agents have real-time, governed access to enterprise data while maintaining reliability, security, and observability at scale. DevOps for machines is about managing the data-to-action lifecycle, not model training pipelines. Humans remain responsible for defining access, policy, and safety nets. For example, tomorrow's DevOps teams will monitor not only application uptime, but also AI decision health to ensure agents operate within defined parameters. This evolution requires a new mindset: one where DevOps teams are responsible for orchestrating an ecosystem in which machines, not just humans, can operate safely, efficiently, and autonomously. 
Justin Borgman
CEO and Cofounder, Starburst

AI RELIABILITY METRIC

The AI incident will become a distinct category: Organizations will start to treat AI system failures as their own incident classification, separate from traditional infrastructure or application issues. We'll see the emergence of specialized runbooks for AI model drift, hallucination events, and security risks like prompt injection attacks. These incidents will require response teams that are even more cross-functional than usual, drawing on every part of a business and forcing a rethink of on-call rotations and the availability of subject matter experts: ML engineers, data scientists, and even parts of the business that may not be used to incident response. Companies will start measuring "AI reliability" as a distinct metric alongside traditional SLOs.
Kat Gaines
Senior Manager, Developer Relations, PagerDuty

MODEL OBSERVABILITY SLO

As AI becomes just another part of the production stack, the way we think about reliability will evolve. I think we may start to see the first true "model observability SLOs," tracking things like prediction freshness and hallucination rate.
Matt Ryer
Principal Software Engineer, Grafana Labs
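To make the idea of a model observability SLO more concrete, here is a minimal Python sketch that evaluates the two signals mentioned above over one window of served predictions. It assumes each prediction is logged with the timestamp of its newest input data (`data_as_of`) and a hallucination verdict from some evaluator or reviewer (`hallucinated`); both fields, the `ModelSLO` thresholds, and the sample values are hypothetical and illustrative, not recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PredictionRecord:
    served_at: datetime    # when the prediction was returned to the caller
    data_as_of: datetime   # hypothetical field: timestamp of the newest input data used
    hallucinated: bool     # hypothetical field: verdict from an evaluator or human reviewer


@dataclass
class ModelSLO:
    max_hallucination_rate: float  # e.g. 0.02 allows at most 2% flagged outputs
    max_staleness: timedelta       # a prediction counts as "fresh" if its data is at most this old
    freshness_target: float        # minimum fraction of predictions that must be fresh


def evaluate_slo(records: list[PredictionRecord], slo: ModelSLO) -> dict:
    """Compute hallucination rate and freshness ratio over one evaluation window."""
    if not records:
        raise ValueError("no predictions in the evaluation window")
    total = len(records)
    # Summing booleans counts the True values.
    hallucination_rate = sum(r.hallucinated for r in records) / total
    fresh = sum((r.served_at - r.data_as_of) <= slo.max_staleness for r in records)
    freshness_ratio = fresh / total
    return {
        "hallucination_rate": hallucination_rate,
        "freshness_ratio": freshness_ratio,
        "hallucination_ok": hallucination_rate <= slo.max_hallucination_rate,
        "freshness_ok": freshness_ratio >= slo.freshness_target,
    }


# Usage: four predictions served at the same moment, with made-up values.
now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
records = [
    PredictionRecord(now, now - timedelta(minutes=10), False),
    PredictionRecord(now, now - timedelta(minutes=30), False),
    PredictionRecord(now, now - timedelta(hours=3), False),   # stale input data
    PredictionRecord(now, now - timedelta(minutes=5), True),  # flagged as a hallucination
]
slo = ModelSLO(max_hallucination_rate=0.02,
               max_staleness=timedelta(hours=1),
               freshness_target=0.99)
report = evaluate_slo(records, slo)
print(report)
```

In this made-up window both objectives are breached: one of four outputs was flagged (25%), and only three of four were served from data less than an hour old. In practice the hard part is producing a trustworthy hallucination verdict, which this sketch simply takes as given.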

AUTOMATED GUARDRAILS

AI will become the biggest driver of hidden system drift because modern architectures already generate more structural change than teams can manually review. Many outages now start with small updates that no one noticed, and AI will accelerate that pattern. As AI systems write code, modify schemas, and optimize configurations, the volume of change will rise faster than human oversight can scale. Engineering teams will respond by introducing automated guardrails that validate every AI action at build time, before it reaches production.
Ryan McCurdy
VP, Liquibase

AI DATA OBSERVABILITY

Observability extends to AI itself: You can't optimize what you can't see, and in 2026, that includes AI models. We're already seeing this shift: organizations are bringing their AI pipelines into the same "single pane of glass" they use for applications, infrastructure, and business metrics. But as teams adopt this new generation of telemetry, they'll quickly realize that observing AI isn't actually about the model; it's about the data feeding it. Understanding the relationships between data sources, transformations, and outputs will become as critical as latency and error rates were in the last generation of observability.
Matt Ryer
Principal Software Engineer, Grafana Labs

AI DRIVES COMPLEXITY

The Explosion of Apps and Agents Will Transform IT Management: Today, the average IT department manages around a hundred applications. But in 2026 that number will grow dramatically. Creating apps and AI-powered agents will become so fast and easy that IT teams could soon find themselves managing thousands of them — some running only for hours or days. This explosion will make IT environments far more complex and increase security, compliance, and data management risks. To stay ahead, organizations will need automation and intelligent tools that simplify how applications and agents are delivered, secured, and governed across any platform or cloud. The future of cybersecurity and IT management will depend on this balance between rapid innovation and strong control.
Prashant Ketkar
CTO, Parallels

Observability is all about inferring the state of applications, your classic "we don't know what we don't know" scenario. When it comes to AI, not only is the technology largely black-box in nature, but it's making ecosystems increasingly large and complex through additional system, tool, and API integrations and interconnectivity. The discipline of observability will play a central role in building a complete understanding of enterprises' evolving systems to ensure both availability and security.
Bryan Cole
Director of Customer Engineering, Tricentis
