Remote Work and Digital Transformation Exacerbate Challenges of Managing the Modern Stack
April 15, 2021

Ed Sawma
Transposit

An independent survey of more than 500 IT Operations, DevOps, and Site Reliability Engineering (SRE) professionals found a growing need for process automation. Transposit commissioned the survey for its inaugural State of DevOps Automation Report. The need stems from two converging forces: digital transformation initiatives, and the remote/hybrid work policies brought on by the pandemic.

More than half of respondents named a lack of automation as the most common challenge when taking action to resolve an incident. These compounding stressors mean that ITOps and software engineering teams, including DevOps engineers and SREs, face growing complexity in their work. Unless preventive measures are taken, the result will be significantly more strain and more application downtime.

Service Incidents and Remediation in a Pandemic-Influenced World

Since the start of the pandemic, the vast majority of organizations surveyed have adopted remote/hybrid work policies and expanded their digital transformation initiatives. At the same time, many have been hampered by longer incident resolution times, inefficient processes, and a lack of automation.

The acceleration in digital transformation has brought an uptick in service incidents, putting a heavier burden on DevOps, SRE, and IT teams. According to the survey, 9 out of 10 organizations have seen more service incidents affecting their customers since the start of the pandemic. Nearly 60% of respondents reported an increase of 20% or more. Most respondents (93%) said incidents were taking longer to resolve while working remotely, and nearly 70% reported that the cost of downtime had risen since the pandemic began.

The survey results indicate that these findings stem from several factors. First, most organizations still rely on manual, repetitive DevOps processes that create unnecessary toil.

Second, they are spending scarce resources on building custom in-house tools, which burdens every part of the software stack. Those resources could instead go toward product innovation or customer service initiatives.

Still, organizations are motivated to put the right tools, processes, and reliable automation in place so they can keep pace with innovation and reduce mean time to resolution (MTTR). Most respondents believed that systematically mining insights from human data could both improve future incident response and drive operational excellence. Examples of such data include archived Slack conversations, postmortem interviews, and group feedback.

The Growing Popularity of Site Reliability Engineering

SREs are essential to solving infrastructure and operational problems in any organization, and the role is going mainstream. An overwhelming 94% of respondents said their organization had increased its focus on SRE practices in the past 12 months. In addition, 86% of organizations plan to hire SREs in the next 12 months. These numbers are high, but they are not surprising given how far engineering and operations teams are being stretched. Investment in automation is a natural response to these circumstances.

ITOps teams are adopting SRE practices even where organizations have no formal SRE roles. Almost all (98%) respondents in the "VP/Director/Manager IT Operations" role said their organization had increased its focus on SRE practices in the past 12 months. Among IT Operations respondents, 62.4% plan to expand SRE efforts in 2021.

SREs are critical contributors to incident resolution and help teams operate complex distributed systems at scale. However, nearly 80% of respondents said that the people responsible for reliability engineering struggle to resolve incidents while those incidents are still unfolding.

Automation Drivers and Barriers

A key takeaway from the study is that automation is a highly valuable tool for engineering operations. Yet despite its well-known benefits, nearly half of respondents reported that their engineering operations are only 26-50% automated. Respondents cited several barriers to further automation:

- inadequate documentation of institutional knowledge and existing processes (51.9%)
- lack of clarity about what to automate (47.3%)
- gaps in knowledge sharing (43.8%)

Organizations still spend resources, time, and money on manual tasks during incident response, but they know something needs to change. For example, 40% of organizations have one or more full-time engineers building custom in-house tools or bots to automate incident response.

The reliance on in-house tooling is partly explained by the market. Most commercially available automation solutions take an "automate everything" approach and do not support human-in-the-loop automation. Yet humans aren't going anywhere. Nine out of 10 respondents believe that automation is more reliable and effective when it lets humans apply their judgment at critical decision points.

Documentation is one simple yet effective beachhead for advancing automation. Two things work best together: automated process documentation that keeps humans in the loop, and actionable data on how to operate systems both during and between incidents. Combined, they can reduce MTTR, improve service reliability, streamline operations, and lower the cost of downtime.

Ed Sawma is VP of Marketing at Transposit