As Digital Transformation Prevails, Automation Remains a Top Priority for DevOps, ITOps and SRE Teams
June 27, 2022

Jessica Abelson

Hybrid work adoption and the accelerated pace of digital transformation are driving an increasing need for automation and site reliability engineering (SRE) practices, according to new research.

A new survey gathered responses from 1,046 engineering, IT Operations, DevOps and site reliability engineering professionals in the United States, all holding VP, Director, Manager or individual contributor roles at organizations with more than 300 employees. Almost half of respondents (48.2%) said automation is a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management.

The second annual State of DevOps Automation Report, commissioned by Transposit, also revealed that close to 60% of organizations are losing up to half a million dollars per hour to downtime, a critical issue that can be mitigated with better automation and collaboration.

Organizations Still Lack Full Integration of Incident Response Tools

With 90.2% of organizations reporting an increased focus on digital transformation over the past year, paired with the persistence of hybrid and remote work, almost three-quarters (73.4%) of operations teams have expanded their tech stack. However, when asked how well integrated the various tools used during incident response are, only one quarter (24.7%) said all of their tools are integrated through one tool or platform. This means the vast majority (75.3%) don’t have full integration, leaving teams at risk of slow issue detection and analysis and a decrease in overall quality of service reliability and customer experience.

Broader deployment of automation has led developers to recognize that it is key to reducing downtime and speeding up incident resolution. This was seen by 3 in 4 organizations that implemented a continuous incident response workflow for service management after adopting a hybrid workforce model.

Manual Processes Are Outdated and Lead to Higher Cost of Downtime and Service Incident Volume

The survey also found that more than a third (39.7%) of organizations had an increased cost of downtime during the last year (March 2021 to now). In fact, 58.2% reported that downtime (i.e., application outages, service degradation) cost their organization up to $499,999 per hour on average. Of those who reported an increase in the amount of time it takes to resolve incidents, 45.2% said it was due to a lack of unified communication with teammates (people are collaborating using disparate tools).

"Organizations need to deliver innovation faster and more efficiently than ever before. However, too many SRE, ITOps and DevOps teams are wasting time on disconnected, manual processes and playing a reactive game of whack-a-mole as they try to keep applications running," said Divanny Lamas, CEO of Transposit.

Operations teams face challenges when trying to resolve incidents, including difficulty reaching people with specialized knowledge, inadequate support from collaboration methods and tools, and a lack of automation. When asked if they had observed any change in the frequency of service incidents affecting their customers over the last year (March 2021 to now), 62.9% of respondents reported an increase. Among those respondents, the top reasons cited were digital transformation (60.7%), the rollout of new products or product updates (55.1%), collaboration methods and tools that did not adequately support their remote team (49.3%) and organizational change, including team member churn, an influx of new team members and M&A activity (45.4%).

The Key to Faster Resolution of Incidents and Less Downtime: SRE Practices Combined with Automation

The rising demand for site reliability engineering is clear, as 75.6% of respondents said there has been an increased focus on SRE practices in their organization in the past 12 months, and of those, 35.1% plan to expand SRE efforts in 2022. Additionally, 65.1% of respondents plan to hire site reliability engineers in the next 12 months.

The need for automation tools is evident among SREs as organizations increase their focus on site reliability practices: 42.3% of SREs said the current level of automation is not meeting their organization’s needs and they are actively pursuing a new solution to close this gap.

SREs are still dealing with cumbersome and tedious processes, despite the increased demand for SRE practices. Over half of SREs (56.5%) reported they still manually enter data into an ITSM system or other system of record to keep track of actions taken by humans during the resolution of an incident.

To scale, organizations need to implement automation technology to rid teams of these time-consuming manual processes. This is underlined by the fact that a full 100% of the respondents with a VP/Director/Manager SRE title who cited a decrease or no change in service incidents said it was because their organization implemented automation technology to help reduce the number of service incidents. Respondents also said better documentation, process and availability of data during incidents would have the most impact on MTTR, downtime and quality of service reliability.

As seen in the survey, organizations' approaches to automation differ. A majority (63%) responded that their approach to automation was incremental automation, in which they begin by codifying processes and work up to more advanced, fully automated scenarios. When asked whether automation should let humans use their judgment at critical decision points to be more reliable and effective, 80.4% of respondents said yes. Automation that keeps humans in the loop at key decision points increases flexibility and stability while automating repetitive tasks.
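To make the idea concrete, here is a minimal Python sketch of incremental, human-in-the-loop automation: a process is first codified as a list of steps, and only the steps flagged as critical decision points pause for a person. The names `Step`, `run_runbook` and `ask_on_console`, and the sample runbook itself, are hypothetical illustrations rather than anything drawn from the survey or from a particular product.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of a codified runbook (hypothetical structure)."""
    name: str
    action: Callable[[], str]
    needs_approval: bool = False  # True marks a critical decision point


def run_runbook(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order, pausing for a human at critical decision points.

    Returns a log of what happened so actions need not be recorded by hand.
    """
    log: List[str] = []
    for step in steps:
        if step.needs_approval and not approve(step):
            log.append(f"{step.name}: stopped, approval declined")
            break  # a declined approval halts the rest of the runbook
        log.append(f"{step.name}: {step.action()}")
    return log


def ask_on_console(step: Step) -> bool:
    """Ask the on-call engineer to confirm a step interactively."""
    return input(f"Run '{step.name}'? [y/N] ").strip().lower() == "y"


# Illustrative runbook: only the restart requires human judgment.
runbook = [
    Step("collect diagnostics", lambda: "logs gathered"),
    Step("restart service", lambda: "service restarted", needs_approval=True),
    Step("post status update", lambda: "update posted"),
]
```

Calling `run_runbook(runbook, ask_on_console)` would run the diagnostics step unattended, then wait for a person before the restart. A team taking the incremental approach might begin with most steps flagged for approval and remove flags as confidence in each step grows.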

The top three tasks respondents would like automated are service requests (52.6%), change requests (42.9%) and user provisioning (39.8%). Organizations see the need to double down on automation: the top three ways they plan to improve their incident management process are to implement new automation tools or applications (48.2%), new communications/collaboration tools or applications (41.5%) and new integration tools or applications (40.6%).

The survey makes it clear that ITOps, DevOps and SRE professionals should consider enhancing service reliability through human-in-the-loop automation, SRE practices and better collaboration methods. Teams enabled with these tools and process advancements are better empowered to spend their time and efforts on delivering innovation and competitive advantages, and ultimately creating more business value.

Jessica Abelson is Director of Product Marketing at Transposit