
Datadog announced Datadog On-Call, an on-call experience with observability-enriched paging and seamless incident management workflows.
Datadog On-Call instantly coordinates teams with relevant context for faster issue resolution, better incident control and improved collaboration.
By unifying observability and paging in one platform, Datadog On-Call eliminates the inefficiencies of multiple disjointed tools, allowing engineers to focus on resolving incidents quickly and effectively without the added stress of switching contexts or missing critical information.
“Being on-call is one of the most challenging aspects of an engineer’s job, where redundant service configurations between various tools can lead to brittle, error-prone setups. The general overhead of maintaining on-call schedules and the ambiguity around service and team ownership make it a grueling ordeal, especially during critical times,” said Michael Whetten, VP of Product at Datadog. “Datadog On-Call addresses these pain points with a team-centric design that clarifies ownership, reduces redundancy and minimizes errors. This approach ensures that every team member knows their role and responsibilities, leading to quicker and more effective incident response.”
Datadog On-Call helps DevOps, SRE, Security and IT Operations teams:
- Act Quickly and Stay Informed: Paging with integrated observability and seamless incident management ensures critical insights and data are readily available within a single platform, eliminating the need for context switching.
- Connect with the Tools They Use Every Day: On-Call integrates with a rich ecosystem of third-party monitoring, alerting and service management tools so teams don’t have to learn new workflows or spend resources on training.
- Ensure Clear Service and Team Ownership: Break down knowledge silos and avoid confusion by associating teams with their respective services to simplify configuration, address ownership gaps and ensure the right responders are paged during an alert. Instantly trace upstream and downstream services affected by an outage or issue.
- Implement Intuitive Scheduling and Notifications: Automate scheduling and escalation policies to ensure continuous coverage and timely responses, reducing the burden on individual team members and enhancing overall team coordination.
- Measure On-Call Performance: Rich and customizable analytics measure on-call performance to help ensure system reliability, improve mean-time-to-resolution and optimize the well-being of on-call teams.
Datadog On-Call is now in beta.