
3 Tips for Flexible, Adaptive Incident Management

Emily Arnott
Blameless

Incidents should be your best friend. It sounds like a controversial statement. It sounds like a lot of unnecessary work. The truth is, for companies engaged in delivering any online or digital experience, taking this point of view is absolutely E-S-S-E-N-T-I-A-L. Apart from the cost of an outage in production, the unplanned work created by incidents will begin to hamper feature velocity if you don't address it in the right way. And there's no faster way to damage your customer relationships than recurring product outages.

Whether we like it or not, responding effectively to unexpected incidents is central to modern IT Operations. Having an integrated, evolving approach to managing incidents can unlock the agility and velocity of a DevOps team and can improve the overall quality of the software they're developing. A rigid, dogmatic approach can leave that same team mired in tech debt and struggling to stay above water.

The key is in viewing incidents as an opportunity to learn something new about your product and your process. If delivering a reliable product that customers will love is your goal, then how you build and operate the product is just as important as what you build. Having the right structure and process can help your engineering team stay aligned at scale. Good incident management practices can be a mechanism for interrogating the effectiveness of that structure. That's true for companies embracing ITIL, DevOps or SRE.

Developing a strong incident response process is key to minimizing downtime and learning from each incident. This takes time, practice and the right tooling. So to help you get started, we've got 3 tips for creating a more flexible, adaptive framework for incident management.

1. Where You Manage Incidents Matters

There is no shortage of software solutions that claim to support incident management. That should be no surprise: managing incidents involves a complex set of tasks that include monitoring, alerting, and paging. However, to be really effective at managing incidents, teams need a command center of sorts to organize the people responsible for achieving resolution. There is no better place to locate that command center than in the team's preferred chat tool. Chat tools offer unparalleled flexibility to recruit and coordinate the right experts. This is where targeted incident management solutions begin to separate themselves from more generic IT solutions like ITIL software.

"Incident Management solutions help DevOps or SRE teams create consistent incident workflows that map to their unique needs. Those workflows can then be easily activated within their chat system and can have wide cascading effects across multiple other systems once they're activated," says Kurt Andersen, SRE architect at Blameless.

2. Never forget "Communication is key"

"The worst case scenario for many SRE leaders is a large Sev0 incident with multiple customers impacted. CEO, VPs, and CS are all reporting customer issues and asking for status updates, while it looks like there are no engineers building or executing a plan to restore service. Then the scenario repeats the next day," says Aaron Bento, Principal SRE for Arkose Labs.

When an engineering incident is underway, ensuring stakeholder communication is the most important responsibility of an incident commander, next to resolving the incident itself. They can handle the communications themselves or delegate to a communications lead. This may sound simple but it's anything but. Large organizations are likely to have a diverse set of stakeholders who need to be informed, not the least important of which are their customers.

"Having too many cooks in the kitchen can cripple your incident response. That's why it's so important to communicate effectively to the right stakeholders throughout the incident," says Vincent Rivellino, Head of Reliability and Developer Platforms at Mission Lane.

"Also, if customers are impacted, there can be a serious hit to your company's reputation. We lean into IM even for incidents where we're not breaking technology SLAs. We often need swift incident resolution followed by coordinated execution of customer remediation. For us that often involves non-technical stakeholders who are communicating with our customers. At the end of the day, the most important thing is our customers know we have their back."

Whether you are managing internal stakeholder communications or communicating with customers, it helps to have clearly defined expectations for update cadences, along with automated reminders to follow up. These are capabilities that modern incident management tools like Blameless provide and that more generic alternatives don't.
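To make the idea of an update cadence concrete, here is a minimal sketch in Python. It is not how Blameless or any other tool implements reminders. The severity names, the cadence values, and the function names are all assumptions chosen for illustration; your team would agree on its own.

```python
from datetime import datetime, timedelta

# Hypothetical cadences: how often stakeholders expect an update, by severity.
# These values are illustrative assumptions, not a recommendation.
UPDATE_CADENCE = {
    "sev0": timedelta(minutes=15),
    "sev1": timedelta(minutes=30),
    "sev2": timedelta(hours=2),
}


def next_update_due(severity, last_update):
    """Return the time by which the next stakeholder update should go out."""
    return last_update + UPDATE_CADENCE[severity]


def reminder_needed(severity, last_update, now):
    """Return True when the agreed cadence has elapsed without a new update."""
    return now >= next_update_due(severity, last_update)


last_update = datetime(2024, 1, 1, 12, 0)
print(reminder_needed("sev0", last_update, datetime(2024, 1, 1, 12, 10)))  # False
print(reminder_needed("sev0", last_update, datetime(2024, 1, 1, 12, 20)))  # True
```

In practice a check like this would run on a timer and nudge the incident commander or communications lead in the incident channel, so that nobody has to watch the clock during a Sev0.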

3. Treat incidents as opportunities

"The benefit of a more mature incident management process is identifying where the hot spots are in your product and where you as an engineering leader need to invest your team's engineering hours or budget," says Elisa Binette, Director of Engineering and Site Reliability at VMware.

If your team is interested in driving development velocity, it's not enough to try to eliminate toil from the incident response process. You need to go a step further and begin to leverage incidents proactively to identify points of weakness in your product and engineering process. This means running clear, effective retrospectives, tagging and capturing all the relevant incident data available, and surfacing that data back to the right stakeholders. Over time, this can help reduce the load on your entire team by making your process more efficient, making your product more robust, and reducing the number of repeat incidents that your team has to manage.
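As a rough illustration of what tagging incident data can buy you, the sketch below counts how often each tag shows up across past incidents and surfaces the ones that repeat. The incident records, the tag names, and the `hot_spots` function are hypothetical; real data would come from whatever tool your team uses to record incidents.

```python
from collections import Counter

# Hypothetical incident records, each tagged during its retrospective.
incidents = [
    {"id": "INC-1", "tags": ["database", "failover"]},
    {"id": "INC-2", "tags": ["deploy"]},
    {"id": "INC-3", "tags": ["database"]},
    {"id": "INC-4", "tags": ["database", "deploy"]},
]


def hot_spots(incidents, min_count=2):
    """Return (tag, count) pairs for tags seen in at least min_count incidents."""
    counts = Counter(tag for incident in incidents for tag in incident["tags"])
    return [(tag, n) for tag, n in counts.most_common() if n >= min_count]


print(hot_spots(incidents))  # [('database', 3), ('deploy', 2)]
```

A list like this is only a starting point for a conversation, but it gives stakeholders something specific to look at when deciding where engineering hours should go.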

"If you look at incidents as an opportunity to learn about what's weak or broken in your product, and commit the right resources to addressing those weaknesses, you can quickly begin to reduce the number of repeat incidents your team encounters," says Aaron Bento, Principal SRE for Arkose Labs. "Repeat incidents can be a killer for morale because they're a sign that we're not identifying the source of our problem. Taking a more proactive approach to incident management can really make a big difference."

To maximize the value of the incident management process, your team needs opportunities to experiment, learn and iterate. With the right tooling and the right approach, you'll soon be turning disruptive incidents into valuable insights.

Emily Arnott is Community Relations Manager at Blameless

