Skip to main content

Datadog Introduces New Capabilities to Monitor Agentic AI

Datadog announced new agentic AI monitoring and experimentation capabilities to give organizations end-to-end visibility, rigorous testing capabilities, and centralized governance of both in-house and third-party AI agents. 

The new capabilities include AI Agent Monitoring, LLM Experiments and AI Agents Console.

Datadog is bringing observability best practices to the AI stack. Part of Datadog’s LLM Observability product, these new capabilities allow companies to monitor agentic systems, run structured LLM experiments, and evaluate usage patterns and the impact of both custom and third-party agents. This enables teams to deploy quickly and safely, accelerate iteration and improvements to their LLM applications, and prove impact.

“A recent study found only 25 percent of AI initiatives are currently delivering on their promised ROI—a troubling stat given the sheer volume of AI projects companies are pursuing globally,” said Yrieix Garnier, VP of Product at Datadog. “Today’s launches aim to help improve that number by providing accountability for companies pushing huge budgets toward AI projects. The addition of AI Agent Monitoring, LLM Experiments and AI Agents Console to our LLM Observability suite gives our customers the tools to understand, optimize and scale their AI investments.”

Now generally available, Datadog’s AI Agent Monitoring instantly maps each agent’s decision path–inputs, tool invocations, calls to other agents and outputs–in an interactive graph. Engineers can drill down into latency spikes, incorrect tool calls or unexpected behaviors like infinite agent loops, and correlate them with quality, security and cost metrics. This simplifies the debugging of complex, distributed and non-deterministic agent systems, resulting in optimized performance.

In preview, Datadog launched LLM Experiments to test and validate the impact of prompt changes, model swaps or application changes on the performance of LLM applications. The tool works by running and comparing experiments against datasets created from real production traces (input/output pairs) or uploaded by customers. This allows users to quantify improvements in response accuracy, throughput and cost—and guard against regressions.

Datadog unveiled AI Agents Console in preview, which allows organizations to establish and maintain visibility into in-house and third-party agent behavior, measure agent usage, impact and ROI, and proactively check for security and compliance risks.

The Latest

Enterprises today operate in a real-time environment where uninterrupted access to trusted data has become a baseline expectation for users, applications and automated systems. Traditional DataOps models, built on manual effort and human triage, cannot keep pace with this always active demand. AI agents are emerging as the operational backbone, ensuring consistent data availability, reinforcing trustworthiness and enabling a level of scale that manual processes cannot achieve ...

For decades, trust in the digital workplace rested on familiar signals. We trusted faces on video calls, voices on the phone, and emails that appeared to come from people we knew. These cues felt human and intuitive. They anchored how decisions were made, approvals were granted, and access was authorized. AI-powered deepfakes have quietly broken that model ...

Cloud migration was supposed to be a one-way door. For most enterprises, it turns out it isn't. Cloud data repatriation is a real and growing trend. A new survey ... finds that 89% of organizations plan to expand their on-premises infrastructure footprint over the next two years — and 75% have already moved at least some workloads back from public cloud in the past 24 months. The findings point to a broad rethinking of where data belongs ...

Over the past few years, large language models (LLMs) have revolutionized the software industry. Given their ability to excel at multi-step reasoning, LLMs have helped enterprises streamline workflows and adapt to the unknown. However, employing such models comes with sky-high costs, latency issues, and limited flexibility. In the realm of IT operations, it is generally wiser to employ smaller, domain-specific models instead ...

For years, DevOps teams operated under a simple assumption: collect enough telemetry, and you can find and fix any problem. That assumption is breaking down. Modern enterprises now operate across microservices, hybrid cloud environments, APIs, Kubernetes, and highly automated delivery pipelines. Releases happen continuously, dependencies shift constantly, and failures spread faster than teams can diagnose them ...

New Relic surveyed IT and engineering leaders from the media and entertainment (M&E) sector to understand what's working — and where challenges persist with their observability practices. The findings reveal how M&E organizations are navigating rising platform complexity, audience expectations, and AI-driven change. Below are five takeaways that stand out ...

Let me start with something I've seen play out more times than I can count. A team hits a wall with the cloud. Costs creep up, then spike. Performance starts to feel inconsistent. Someone in finance asks a simple question like "why did this double?" and nobody has a clean answer ... Maybe this isn't the right place for everything. That realization feels like a breakthrough, like you've identified the problem. In reality, you've just identified the starting line ...

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...

When most people think about cybersecurity, they picture firewalls, encryption, and access controls — technical tools designed to protect systems and data. But beneath the technology lies a deeper set of principles about trust, decision-making, and resilience ... The best leaders don't eliminate risk. They manage it intelligently. And in many ways, cybersecurity offers a surprisingly useful playbook for doing exactly that ...

Datadog Introduces New Capabilities to Monitor Agentic AI

Datadog announced new agentic AI monitoring and experimentation capabilities to give organizations end-to-end visibility, rigorous testing capabilities, and centralized governance of both in-house and third-party AI agents. 

The new capabilities include AI Agent Monitoring, LLM Experiments and AI Agents Console.

Datadog is bringing observability best practices to the AI stack. Part of Datadog’s LLM Observability product, these new capabilities allow companies to monitor agentic systems, run structured LLM experiments, and evaluate usage patterns and the impact of both custom and third-party agents. This enables teams to deploy quickly and safely, accelerate iteration and improvements to their LLM applications, and prove impact.

“A recent study found only 25 percent of AI initiatives are currently delivering on their promised ROI—a troubling stat given the sheer volume of AI projects companies are pursuing globally,” said Yrieix Garnier, VP of Product at Datadog. “Today’s launches aim to help improve that number by providing accountability for companies pushing huge budgets toward AI projects. The addition of AI Agent Monitoring, LLM Experiments and AI Agents Console to our LLM Observability suite gives our customers the tools to understand, optimize and scale their AI investments.”

Now generally available, Datadog’s AI Agent Monitoring instantly maps each agent’s decision path–inputs, tool invocations, calls to other agents and outputs–in an interactive graph. Engineers can drill down into latency spikes, incorrect tool calls or unexpected behaviors like infinite agent loops, and correlate them with quality, security and cost metrics. This simplifies the debugging of complex, distributed and non-deterministic agent systems, resulting in optimized performance.

In preview, Datadog launched LLM Experiments to test and validate the impact of prompt changes, model swaps or application changes on the performance of LLM applications. The tool works by running and comparing experiments against datasets created from real production traces (input/output pairs) or uploaded by customers. This allows users to quantify improvements in response accuracy, throughput and cost—and guard against regressions.

Datadog unveiled AI Agents Console in preview, which allows organizations to establish and maintain visibility into in-house and third-party agent behavior, measure agent usage, impact and ROI, and proactively check for security and compliance risks.

The Latest

Enterprises today operate in a real-time environment where uninterrupted access to trusted data has become a baseline expectation for users, applications and automated systems. Traditional DataOps models, built on manual effort and human triage, cannot keep pace with this always active demand. AI agents are emerging as the operational backbone, ensuring consistent data availability, reinforcing trustworthiness and enabling a level of scale that manual processes cannot achieve ...

For decades, trust in the digital workplace rested on familiar signals. We trusted faces on video calls, voices on the phone, and emails that appeared to come from people we knew. These cues felt human and intuitive. They anchored how decisions were made, approvals were granted, and access was authorized. AI-powered deepfakes have quietly broken that model ...

Cloud migration was supposed to be a one-way door. For most enterprises, it turns out it isn't. Cloud data repatriation is a real and growing trend. A new survey ... finds that 89% of organizations plan to expand their on-premises infrastructure footprint over the next two years — and 75% have already moved at least some workloads back from public cloud in the past 24 months. The findings point to a broad rethinking of where data belongs ...

Over the past few years, large language models (LLMs) have revolutionized the software industry. Given their ability to excel at multi-step reasoning, LLMs have helped enterprises streamline workflows and adapt to the unknown. However, employing such models comes with sky-high costs, latency issues, and limited flexibility. In the realm of IT operations, it is generally wiser to employ smaller, domain-specific models instead ...

For years, DevOps teams operated under a simple assumption: collect enough telemetry, and you can find and fix any problem. That assumption is breaking down. Modern enterprises now operate across microservices, hybrid cloud environments, APIs, Kubernetes, and highly automated delivery pipelines. Releases happen continuously, dependencies shift constantly, and failures spread faster than teams can diagnose them ...

New Relic surveyed IT and engineering leaders from the media and entertainment (M&E) sector to understand what's working — and where challenges persist with their observability practices. The findings reveal how M&E organizations are navigating rising platform complexity, audience expectations, and AI-driven change. Below are five takeaways that stand out ...

Let me start with something I've seen play out more times than I can count. A team hits a wall with the cloud. Costs creep up, then spike. Performance starts to feel inconsistent. Someone in finance asks a simple question like "why did this double?" and nobody has a clean answer ... Maybe this isn't the right place for everything. That realization feels like a breakthrough, like you've identified the problem. In reality, you've just identified the starting line ...

In MEAN TIME TO INSIGHT Episode 24, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at EMA discusses network observability tool sprawl ... 

In cloud-native systems, scaling is often as simple as moving a slider. For on-premise databases, the stakes are different. Over-provisioning hardware is expensive. Under-provisioning leads to performance bottlenecks that are difficult to fix once the equipment is in the rack ...

When most people think about cybersecurity, they picture firewalls, encryption, and access controls — technical tools designed to protect systems and data. But beneath the technology lies a deeper set of principles about trust, decision-making, and resilience ... The best leaders don't eliminate risk. They manage it intelligently. And in many ways, cybersecurity offers a surprisingly useful playbook for doing exactly that ...