Observe Introduces AI SRE and o11y.ai Agents

Observe announced the availability of two new AI agents, AI SRE and o11y.ai, built on its open data lake architecture and knowledge graph. 

The new agents aim to improve engineering productivity through intelligent incident investigation and remediation, as well as faster delivery of production-ready code.

Key Highlights

  • Early customers report incident triage up to 10x faster
  • Mean time to resolution (MTTR) reduced from hours to minutes
  • Observability costs reduced by up to 60%

"As AI code generation accelerates software delivery, the bottleneck has shifted to running and maintaining systems reliably at scale," said Jeremy Burton, CEO of Observe Inc. "AI SRE and o11y.ai directly address these pain points by making systems observable, reliable, and affordable from day one."

The AI SRE agent autonomously applies context, pinpoints root causes, and suggests fixes, so teams can troubleshoot faster at scale.

AI SRE automates incident investigation with a contextual understanding of logs, metrics, and traces in real time. It reduces operational toil, minimizes on-call load, and improves the accuracy of root cause identification. Built on Observe's low-cost, scalable data lake architecture, it lets enterprises retain data longer while reducing observability spend by up to 60%. Governance and compliance are built in, with role-based access controls and support for SOC 2 Type II, ISO 27001, and GDPR.

AI SRE supports enterprise customization and extensibility through a Model Context Protocol (MCP) Server that integrates natively with Claude Code, OpenAI Codex, Augment Code, Windsurf, n8n, and other AI tools. The MCP Server uses Observe's knowledge graph to help agents quickly gather more context from the massive volume of observability data in the data lake, which improves accuracy. Teams can integrate proprietary data, add custom context, automate complex workflows, and build custom AI agents tailored to their enterprise environments. Engineers save hours by asking questions in natural language from their code editor instead of learning and switching between multiple tools, query languages, and dashboards.
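The announcement does not document the Observe MCP Server's tools, so the following is only a minimal sketch of what an MCP client sends over the wire. It assumes the public MCP specification, in which clients and servers exchange JSON-RPC 2.0 messages, discover tools with `tools/list`, and invoke them with `tools/call`. The tool name `search_logs` and its `query` argument are hypothetical placeholders, not Observe's actual interface, and a real client must first complete the protocol's `initialize` handshake, which is omitted here.

```python
import json
from itertools import count

# Monotonically increasing request IDs; JSON-RPC 2.0 requests carry an id
# so the client can match each response to its request.
_ids = count(1)


def make_tools_list_request() -> dict:
    """Build an MCP request asking the server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def make_tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP request that invokes one tool with JSON arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode_for_stdio(message: dict) -> str:
    """Serialize a message for MCP's stdio transport: one JSON object per line.

    json.dumps without indentation emits no raw newlines; any newline inside
    a string value is escaped as \\n, so the message stays on a single line.
    """
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    request = make_tools_call_request(
        "search_logs",  # hypothetical tool name, not Observe's real API
        {"query": "checkout errors in the last 15 minutes"},
    )
    print(encode_for_stdio(request), end="")
```

In practice, MCP-aware tools such as those listed above build and send these messages for you; the sketch only shows the request shape that turns a natural-language question into a structured tool call.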

Customer outcomes with AI SRE and the MCP Server:

  • Incident resolution time cut from hours to minutes
  • Reduced operational toil and on-call burden
  • Immediate observability ROI for engineering teams

o11y.ai is an observability agent that lets developers generate code instrumentation, debug, and ask questions about their application.

Built for developers, o11y.ai makes observability as natural as coding. The agent adds OpenTelemetry instrumentation from day one, giving engineers instant access to the logs, metrics, and traces they need. Developers can ask questions about usage, errors, and performance, as well as debug and validate fixes using context from their telemetry and code.

Customer outcomes with o11y.ai:

  • Shorter feedback loops
  • Faster root cause analysis
  • Higher engineering velocity
