Remote Work and Digital Transformation Exacerbate Challenges of Managing the Modern Stack
April 15, 2021

Ed Sawma
Transposit

An independent survey of over 500 IT Operations, DevOps, and Site Reliability Engineering (SRE) professionals, commissioned by Transposit for its inaugural State of DevOps Automation Report, uncovered a growing need for process automation as digital transformation initiatives converge with the remote/hybrid work policies brought on by the pandemic.

More than half of respondents reported that the most common challenge while taking action to resolve an incident was a lack of automation. Combined with remote work and accelerated digital transformation, this means ITOps and software engineering teams, including DevOps and SREs, face increasing complexity in their work, leading to significantly more strain and application downtime unless preventive measures are taken.

Service Incidents and Remediation in a Pandemic-Influenced World

The vast majority of organizations surveyed have adopted remote/hybrid work policies and stepped up digital transformation initiatives since the start of the pandemic. At the same time, many have been hampered by longer incident resolution, inefficient processes, and a lack of automation.

The acceleration in digital transformation has resulted in an uptick in service incidents, putting a heavier burden on DevOps, SRE, and IT teams. The survey found that 9 out of 10 organizations experienced an increase in service incidents affecting their customers since the start of the pandemic, with nearly 60% of respondents observing an increase of 20% or more. Most (93%) said incidents were taking longer to resolve while working remotely, and nearly 70% saw an increase in the cost of downtime since the pandemic began.

The survey results indicate these findings stem from several factors. First, most organizations still rely on manual, repetitive DevOps processes that cause unnecessary toil.

They're also investing precious resources in building custom in-house tools, which burdens all parts of the software stack, when those resources could instead go toward product innovation or customer service initiatives.

Still, organizations are motivated to get the right tools, processes, and reliable automation in place to keep pace with innovation and decrease mean time to resolution (MTTR). The majority of respondents believed that systematically mining insights from human data (such as archived Slack communications, postmortem interviews, and group feedback) could both improve future incident response and fuel operational excellence.
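For readers who want to track the MTTR figure mentioned above, here is a minimal sketch of how it might be computed. It assumes each incident is recorded as a pair of opened and resolved timestamps; the function name `mean_time_to_resolution` and the record layout are illustrative assumptions, not something taken from the survey or from any Transposit product.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical record layout: (opened_at, resolved_at) for each incident.
Incident = Tuple[datetime, datetime]


def mean_time_to_resolution(incidents: List[Incident]) -> timedelta:
    """Average the time between opening and resolving each incident."""
    if not incidents:
        raise ValueError("at least one resolved incident is required")
    # Sum the individual resolution times, starting from a zero timedelta.
    total = sum((resolved - opened for opened, resolved in incidents), timedelta())
    return total / len(incidents)


if __name__ == "__main__":
    sample = [
        (datetime(2021, 4, 1, 9, 0), datetime(2021, 4, 1, 9, 30)),
        (datetime(2021, 4, 1, 12, 0), datetime(2021, 4, 1, 13, 30)),
    ]
    print(mean_time_to_resolution(sample))
```

Tracking this number before and after an automation change gives a simple, if coarse, indication of whether the change helped.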

The Growing Popularity of Site Reliability Engineering

SREs are essential to any organization for solving infrastructure and operational problems — and they're going mainstream. In fact, an overwhelming 94% of respondents increased focus on SRE practices in their organization in the past 12 months and 86% of organizations are planning to hire SREs in the next 12 months. While these numbers are high, they're not surprising when considering how engineering and operations teams are being stretched to the limit. Investments in automation are a natural reaction to these circumstances.

Even if organizations do not have formal SRE roles, ITOps teams are adopting SRE practices. Almost all (98%) of respondents with the "VP/Director/Manager IT Operations" role increased focus on SRE practices in their organization in the past 12 months, while 62.4% of IT Operations respondents plan to expand SRE efforts in 2021.

SREs are critical contributors to incident resolution and help teams work with complex distributed systems at scale. However, nearly 80% of respondents said individuals responsible for reliability engineering are experiencing challenges while trying to solve incidents as they are occurring.

Automation Drivers and Barriers

A key takeaway from the study is that automation is a highly valuable tool for engineering operations. Although the benefits of automation are known, nearly half of respondents reported that their engineering operations are only 26-50% automated. Just over half (51.9%) cited inadequate documentation of institutional knowledge and existing processes as a barrier, followed by lack of clarity about what to automate (47.3%) and gaps in knowledge sharing (43.8%).

Although organizations are still draining resources, time, and money on manual tasks when responding to incidents, they're aware something needs to change. This is evidenced by the 40% of organizations that have one or more full-time engineers working on custom in-house tools or bots for automating incident response.

Most commercially available automation solutions use the "automate everything" approach and do not incorporate human-in-the-loop automation, which helps explain this finding. And humans aren't going anywhere: the research revealed that 9 out of 10 respondents believe automation should let humans use their judgment at critical decision points to be more reliable and effective.
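To make the idea of human-in-the-loop automation concrete, here is a minimal sketch of a runbook runner that executes routine steps automatically but pauses for a person's decision at critical points. Everything in it (the `Step` class, `run_runbook`, and the `approve` callback) is a hypothetical illustration under those assumptions, not the design of any commercial tool or of Transposit's product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step; all names here are illustrative assumptions."""
    name: str
    action: Callable[[], str]      # performs the step and returns a short result
    needs_approval: bool = False   # True marks a critical decision point


def run_runbook(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order, asking a human before any step that needs approval."""
    log = []
    for step in steps:
        if step.needs_approval and not approve(step):
            # The human declined, so the step is recorded but not executed.
            log.append(f"{step.name}: skipped (not approved)")
            continue
        log.append(f"{step.name}: {step.action()}")
    return log


def ask_on_console(step: Step) -> bool:
    """Simple approval prompt; a chat message or ticket could serve the same role."""
    return input(f"Run '{step.name}'? [y/N] ").strip().lower() == "y"


if __name__ == "__main__":
    runbook = [
        Step("collect diagnostics", lambda: "logs captured"),
        Step("restart service", lambda: "service restarted", needs_approval=True),
    ]
    for line in run_runbook(runbook, ask_on_console):
        print(line)
```

Passing the approval step in as a callback keeps the judgment call separate from the automation, so the same runbook can be driven from a console prompt, a chat bot, or a test. The returned log also doubles as a record of what was done and what was declined during the incident.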

One simple yet effective starting point for moving automation forward is documentation. Combining automated process documentation that keeps humans in the loop with actionable data on how to operate systems during and between incidents can improve MTTR, enhance service reliability, streamline operations, and lower the cost of downtime.

Ed Sawma is VP of Marketing at Transposit