
Datadog launched Bits AI SRE, an AI agent aware of telemetry, architecture, and organizational context that investigates alerts and surfaces actionable root causes in minutes. It gives engineers the information they need to resolve incidents faster and with confidence, save engineering hours, and reduce end-user and business impact.
Bits AI SRE is part of Datadog’s Bits AI, a suite of AI capabilities that works autonomously across critical monitoring, development, and security workflows to help teams resolve application issues in real time.
Powered by the full breadth and depth of the Datadog platform’s data, Bits AI SRE draws on an understanding of an organization’s systems to investigate and resolve alerts quickly. When an alert fires, Bits AI SRE rapidly analyzes runbooks, telemetry, and more to separate signal from noise and surface candidate root causes. It validates its own findings, reaches a final conclusion, and delivers that conclusion directly to third-party collaboration tools, all before on-call responders even log in.
What used to take hours of manual troubleshooting can now be done autonomously in minutes by Bits AI SRE, a step toward a future where engineers can focus less on managing incidents and more on building resilient systems.
Designed for enterprise scale, Bits AI SRE supports HIPAA-regulated workloads, includes role-based access controls (RBAC), and features enterprise contracts with trusted AI partners—ensuring organizations adopt AI with confidence and control.
“This launch represents a pivotal expansion of Datadog’s AI strategy as our first generally available AI agent, and signals a new phase of intelligent, automated reliability,” said Yanbing Li, Chief Product Officer at Datadog. “Bits AI SRE allows companies to mitigate issues faster, reduce customer impact, and adopt AI safely. It has already been tested against more than 2,000 customer environments, including both global enterprises and fast-growing start-ups with a diverse range of production environments. Tens of thousands of investigations have run to date, from routine alerts to high-severity incidents, with organizations already reporting positive outcomes. This reflects the tangible and immediate value, tied directly to operational and business outcomes, that we are delivering.”
Bits AI SRE is the first of three AI agents to become generally available to all Datadog users.