Remote Work and Digital Transformation Exacerbate Challenges of Managing the Modern Stack

Ed Sawma
Transposit

An independent survey of over 500 IT Operations, DevOps, and Site Reliability Engineering (SRE) professionals, commissioned by Transposit for its inaugural State of DevOps Automation Report, uncovered a growing need for process automation as digital transformation initiatives converge with the remote/hybrid work policies brought on by the pandemic.

More than half of respondents reported that the most common challenge while taking action to resolve an incident was a lack of automation. This influx of stressors means ITOps and software engineering teams — including DevOps and SREs — face increasing complexity in their work, leading to significantly more strain and application downtime unless preventive measures are taken.

Service Incidents and Remediation in a Pandemic-Influenced World

The vast majority of organizations surveyed have adopted remote/hybrid work policies and expanded digital transformation initiatives since the start of the pandemic. At the same time, many have been hampered by longer incident resolution, inefficient processes, and a lack of automation.

The acceleration in digital transformation has resulted in an uptick in service incidents, putting a heavier burden on DevOps, SRE, and IT teams. The survey found that 9 out of 10 organizations experienced an increase in service incidents that have affected their customers since the start of the pandemic, with nearly 60% of respondents observing an increase of 20% or more. Most (93%) said incidents were taking longer to resolve while working remotely, and nearly 70% saw an increase in the cost of downtime since the pandemic began.

The survey results indicate these findings stem from a number of variables. First, most organizations still rely on manual, repetitive DevOps processes that cause unnecessary toil.

They're also investing precious resources in building custom in-house tools, which burdens all parts of the software stack, when those resources could instead go toward product innovation or customer service initiatives.

Still, organizations are motivated to get the right tools, processes, and reliable automation in place to keep pace with innovation and decrease mean time to resolution (MTTR). The majority of respondents believed that systematically mining insights from human data (such as archived Slack communications, postmortem interviews, and group feedback) could both improve future incident response and fuel operational excellence.
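
For readers who want the metric spelled out, MTTR is commonly computed as the average time between an incident being detected and being resolved. The sketch below is illustrative only: the function name, the incident timestamps, and the choice of "detected" as the starting point are assumptions, not figures or definitions from the survey.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average of (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident records: (detected, resolved)
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 9, 45)),     # 45 minutes
    (datetime(2021, 3, 4, 14, 0), datetime(2021, 3, 4, 16, 0)),    # 120 minutes
    (datetime(2021, 3, 9, 22, 30), datetime(2021, 3, 9, 23, 45)),  # 75 minutes
]

mttr = mean_time_to_resolution(incidents)
print(mttr)  # 1:20:00
```

Teams differ on whether the clock starts at detection, at the first alert, or at the first customer report, so the starting point should be stated whenever MTTR figures are compared.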

The Growing Popularity of Site Reliability Engineering

SREs are essential to any organization for solving infrastructure and operational problems — and they're going mainstream. In fact, an overwhelming 94% of respondents increased focus on SRE practices in their organization in the past 12 months and 86% of organizations are planning to hire SREs in the next 12 months. While these numbers are high, they're not surprising when considering how engineering and operations teams are being stretched to the limit. Investments in automation are a natural reaction to these circumstances.

Even if organizations do not have formal SRE roles, ITOps teams are adopting SRE practices. Almost all (98%) of respondents with the "VP/Director/Manager IT Operations" role increased focus on SRE practices in their organization in the past 12 months, while 62.4% of IT Operations respondents plan to expand SRE efforts in 2021.

SREs are critical contributors to incident resolution and help teams work with complex distributed systems at scale. However, nearly 80% of respondents said individuals responsible for reliability engineering are experiencing challenges while trying to solve incidents as they are occurring.

Automation Drivers and Barriers

A key takeaway from the study is that automation is a highly valuable tool for engineering operations. Although the benefits of automation are known, nearly half of respondents reported that their engineering operations are only 26-50% automated. Just over half (51.9%) cited inadequate documentation of institutional knowledge and existing processes as a barrier, followed by a lack of clarity about what to automate (47.3%) and gaps in knowledge sharing (43.8%).

Although organizations are still sinking resources, time, and money into manual tasks during incident response, they're aware something needs to change. This is evidenced by the 40% of organizations that have one or more full-time engineers working on custom in-house tools or bots for automating incident response.

Most commercially available automation solutions use the "automate everything" approach and do not incorporate human-in-the-loop automation, which helps explain this finding. And humans aren't going anywhere: the research revealed that 9 out of 10 respondents believe automation should let humans use their judgment at critical decision points to be more reliable and effective.
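
As a rough illustration of what human-in-the-loop automation can look like, the sketch below runs a runbook in which low-risk steps execute automatically and higher-risk steps wait for a person's decision. Everything here is hypothetical: the `Step` class, the `run_runbook` function, the step names, and the approval callback are invented for this example and do not describe any particular product's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], str]      # performs the step and returns a short result
    needs_approval: bool = False   # True marks a critical decision point


def run_runbook(steps: List[Step], approve: Callable[[str], bool]) -> List[str]:
    """Run steps in order, asking `approve` before any step flagged as critical."""
    log = []
    for step in steps:
        if step.needs_approval and not approve(step.name):
            log.append(f"{step.name}: skipped (not approved)")
            continue
        log.append(f"{step.name}: {step.action()}")
    return log


# Hypothetical runbook: diagnostics run unattended, remediation needs sign-off.
steps = [
    Step("collect diagnostics", lambda: "logs attached"),
    Step("restart service", lambda: "restarted", needs_approval=True),
    Step("fail over database", lambda: "failed over", needs_approval=True),
]


def on_call_decision(step_name: str) -> bool:
    # Stand-in for a real prompt (chat button, CLI confirmation, and so on).
    return step_name == "restart service"


result = run_runbook(steps, on_call_decision)
for line in result:
    print(line)
```

In this sketch the approval callback is the only place human judgment enters, which keeps the automated and manual parts of the process easy to audit after an incident.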

One simple yet effective beachhead for moving automation forward is documentation. Combining automated process documentation that keeps humans in the loop with actionable data on how to operate systems during and between incidents can improve MTTR, enhance service reliability, streamline operations, and lower the cost of downtime.

Ed Sawma is VP of Marketing at Transposit

