PagerDuty Adds Automated Incident Response
November 16, 2021

PagerDuty announced new capabilities to further support the digital-first orientation of businesses as they seek to meet heightened expectations for customer experiences.

The new solutions inject control and logic at the event layer instantly to drive real-time behaviors and workflows.

Uniquely integrating this new event management into operations automation reduces manual processes and toil, automates work, and drives best practices across distributed teams.

Along with these new features, PagerDuty also announced the general availability of Change Events in Mobile, Rundeck Actions, Round Robin Scheduling, and Probable Incident Origin, bringing a strong suite of capabilities all aimed at enhancing automated incident response to drive customer engagement and efficient operations across the modern enterprise.

“Successful business operations in today’s world are fully digitized. Mission-critical work is urgent, unplanned, and involves distributed teams that need to assemble and collaborate effectively when minutes of delay can mean millions in lost revenue,” said Sean Scott, PagerDuty’s CPO. “PagerDuty’s Operations Cloud connects teams, departments, and dependencies, empowering companies to master key services and manage time-critical work that impacts customer experience.”

Digital services are in a constant state of change with a complex web of dependencies, which is further complicated as 72% of tech leaders report their organizations are actively accelerating their digital transformation strategies. Connecting and correlating data and signals for real-time information means ingesting signals from numerous sources, both structured and unstructured, and quickly turning the data into insights that guide actions.

New PagerDuty capabilities include:

- New Event Orchestration: Reduce manual processes and toil to gain operational efficiency

Event Intelligence is well known for its noise reduction capabilities, with features like intelligent alert grouping. PagerDuty now delivers the ability for teams to minimize transient noise with machine learning. Event Orchestration cuts down on businesses’ manual event processing with a powerful decision engine. Teams can now create custom logic to enrich, modify, and control routing based on event conditions at scale. Event Orchestration combines nested event rules for precise, targeted automation, including diagnostics and remediation, to reduce toil and gain operational efficiency.
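To make the idea of nested event rules concrete, here is a minimal Python sketch of a rule tree that enriches an event and picks a route. It is an illustration only, not PagerDuty's API: the `Rule` class, the `evaluate` function, the field names (`source`, `severity`, `team`, `priority`), and the route names are all hypothetical, and the "first matching rule at each level wins" behavior is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Event = dict  # a plain dict standing in for an incoming event payload


@dataclass
class Rule:
    """One hypothetical rule: a condition, optional enrichment and route, nested children."""
    condition: Callable[[Event], bool]
    enrich: dict = field(default_factory=dict)
    route_to: Optional[str] = None
    children: list = field(default_factory=list)


def evaluate(rules, event, default_route="catch-all"):
    """Descend through the rule tree; at each level the first matching rule wins."""
    enriched = dict(event)  # copy so the caller's event is not mutated
    route = default_route
    level = rules
    while level:
        match = next((r for r in level if r.condition(enriched)), None)
        if match is None:
            break  # nothing matched at this level; keep what we have so far
        enriched.update(match.enrich)
        if match.route_to is not None:
            route = match.route_to
        level = match.children  # continue into the nested rules
    return enriched, route


# Hypothetical rule set: database events go to one team, and critical
# database events are escalated by a nested rule.
rules = [
    Rule(
        condition=lambda e: e.get("source") == "database",
        enrich={"team": "data-platform"},
        route_to="db-oncall",
        children=[
            Rule(
                condition=lambda e: e.get("severity") == "critical",
                enrich={"priority": "P1"},
                route_to="db-escalation",
            ),
        ],
    ),
    Rule(condition=lambda e: e.get("severity") == "info", enrich={"suppress": True}),
]

event, route = evaluate(rules, {"source": "database", "severity": "critical"})
print(route, event)
```

Nesting keeps each condition small: the inner rule only has to ask about severity because the outer rule has already established that the event came from the database.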

“Customers are increasingly looking for event orchestration capabilities that balance human-led and machine-led work in real time,” said Stephen Elliot, group vice president, I&O, cloud operations and DevOps. “As CIOs and CEOs are continuing to find ways to increase operational efficiency, now is the time for people to get ahead of unplanned downtime events and find ways to automate their incident response processes.”

- New Rundeck Cloud: Rundeck is now available as a fully managed cloud service

With Rundeck, users focus on building and running automated workflows. Rundeck Cloud manages the infrastructure for users by providing high availability, security, and elastic scalability. It also manages all patches and updates, so users always have the latest features available. In the near future, Rundeck Actions will pair with Rundeck Cloud to quickly create sophisticated automated diagnostics and remediation for your production systems.

- New Service Standards: Enables account owners to configure and enforce best practice standards at scale for all their managed services.

Clearly defined and well-configured services are central to achieving team autonomy and efficient incident response. Many organizations are pivoting towards a Service Ownership model where developers and site reliability engineers take responsibility for supporting the code they deliver at every stage of the service lifecycle: they build it, ship it, and own it in production. PagerDuty’s Service Standards empower organizations to easily define, share, and track the criteria for service configuration according to their unique needs. Individual teams receive clear guidelines for setting up and managing services within PagerDuty.
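As a rough illustration of defining and tracking configuration criteria across services, the Python sketch below audits a list of services against a small set of standards. Everything in it is hypothetical: the `Service` fields, the three standards, and the `audit` function are assumptions chosen for the example and do not reflect how PagerDuty's Service Standards are actually implemented or configured.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical service record with a few configurable attributes."""
    name: str
    description: str = ""
    escalation_policy: str = ""
    owner_team: str = ""


# Hypothetical standards: a human-readable label mapped to a check function.
STANDARDS = {
    "has a description": lambda s: bool(s.description.strip()),
    "has an escalation policy": lambda s: bool(s.escalation_policy),
    "has an owning team": lambda s: bool(s.owner_team),
}


def audit(services):
    """Return, for each service name, the list of standards it does not meet."""
    return {
        s.name: [label for label, check in STANDARDS.items() if not check(s)]
        for s in services
    }


services = [
    Service("checkout", "Handles payments", "payments-escalation", "payments"),
    Service("search", description="", escalation_policy="search-escalation"),
]

report = audit(services)
for name, failures in report.items():
    print(name, "OK" if not failures else failures)
```

Keeping the standards in one shared table means every team is measured against the same criteria, and adding a new standard is a one-line change.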

- New Change Events & Change Correlation for Mobile: Help responders solve incidents faster

Deliver machine-learning-powered change correlation directly to on-call responders, now on mobile devices. With the latest context available at a glance, responders can identify changes that may be related to an incident, triage incidents quickly, and reduce time-to-resolution while on the go.

Solutions Now Generally Available:

- Rundeck Actions + Automated Diagnostics Package: Empowers responders to immediately remove critical minutes from incident response

PagerDuty Rundeck Actions help users run automated diagnostics and remediate incidents directly within PagerDuty. Teams improve productivity by automating repeated diagnostic and remediation steps, replacing the toil of manual tasks.
