As Digital Transformation Prevails, Automation Remains a Top Priority for DevOps, ITOps and SRE Teams

Jessica Abelson
Transposit

Hybrid work adoption and the accelerated pace of digital transformation are driving an increasing need for automation and site reliability engineering (SRE) practices, according to new research.

In a new survey of 1,046 engineering, IT operations, DevOps and site reliability engineering professionals, almost half of respondents (48.2%) said automation is a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management. Respondents were based in the United States, held VP, Director, Manager or individual contributor roles, and worked at organizations with over 300 employees.

The second annual State of DevOps Automation Report, commissioned by Transposit, also revealed that close to 60% of organizations are losing up to half a million dollars per hour to downtime, a critical issue that can be mitigated with better automation and collaboration.

Organizations Still Lack Full Integration of Incident Response Tools

With 90.2% of organizations reporting an increased focus on digital transformation over the past year, paired with the persistence of hybrid and remote work, almost three-quarters (73.4%) of operations teams have expanded their tech stack. However, when asked how well integrated the various tools used during incident response are, only one quarter (24.7%) said all of their tools are integrated through one tool or platform. This means the vast majority (75.3%) don’t have full integration, leaving teams at risk of slow issue detection and analysis and a decrease in overall quality of service reliability and customer experience.

Broader deployment of automation has led developers to recognize that it’s key to reducing downtime and speeding up incident resolution. This was seen by 3 in 4 organizations that implemented a continuous workflow for incident response in service management after adopting a hybrid workforce model.

Manual Processes Are Outdated and Lead to Higher Cost of Downtime and Service Incident Volume

The survey also found that more than a third (39.7%) of organizations had an increased cost of downtime during the last year (March 2021 to now). In fact, 58.2% reported that downtime (such as application outages or service degradation) cost their organization up to $499,999 per hour on average. Of those who reported an increase in the amount of time it takes to resolve incidents, 45.2% said it was due to a lack of unified communication with teammates, with people collaborating across disparate tools.


"Organizations need to deliver innovation faster and more efficiently than ever before. However, too many SRE, ITOps and DevOps teams are wasting time on disconnected, manual processes and playing a reactive game of whack-a-mole as they try to keep applications running," said Divanny Lamas, CEO of Transposit.

Operations teams are experiencing challenges while trying to resolve incidents, including difficulties reaching people with specialized knowledge, inadequate support from collaboration methods and tools, and lack of automation. When asked if they have observed any change in the frequency of service incidents that have affected their customers over the course of the last year (March 2021 to now), 62.9% of respondents reported an increase. Among those respondents, the top reasons cited for the increase were digital transformation (60.7%), the rollout of new products or product updates (55.1%), collaboration methods and tools that did not adequately support their remote team (49.3%), and organizational change including team member churn, influx of new team members, and M&A activity (45.4%).

The Key to Faster Resolution of Incidents and Less Downtime: SRE Practices Combined with Automation

The rising demand for site reliability engineering is clear, as 75.6% of respondents said there has been an increased focus on SRE practices in their organization in the past 12 months, and of those, 35.1% plan to expand SRE efforts in 2022. Additionally, 65.1% of respondents plan to hire site reliability engineers in the next 12 months.

The need for automation tools to complement organizations’ increased focus on site reliability practices is evident among SREs: 42.3% of SREs said the current level of automation is not meeting their organization’s needs and that they are actively pursuing a new solution to address this shortfall.

SREs are still dealing with cumbersome and tedious processes, despite the increased demand for SRE practices. Over half of SREs (56.5%) reported they still manually enter data into an ITSM system or other system of record to keep track of actions that were taken by humans during the resolution of an incident.

To scale, organizations need to implement automation technology to rid teams of these time-consuming manual processes. This is underlined by the fact that a full 100% of the respondents with a VP/Director/Manager SRE title who cited a decrease or no change in service incidents said it was because their organization implemented automation technology to help reduce the number of service incidents. Respondents also said better documentation, process and availability of data during incidents would have the most impact on MTTR, downtime and quality of service reliability.

As seen in the survey, organizations' approaches to automation differ. A majority (63%) responded that their approach to automation was incremental automation, in which they begin by codifying processes and work up to more advanced, fully automated scenarios. When asked whether automation should let humans use their judgment at critical decision points to be more reliable and effective, 80.4% of respondents said yes. Automation that keeps humans in the loop at key decision points increases flexibility and stability while automating repetitive tasks.
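As an illustration of the incremental, human-in-the-loop approach described above, here is a minimal Python sketch. It is not taken from the report or from any Transposit product; the names `Step`, `run_runbook` and `restart_service_runbook`, and the example steps themselves, are hypothetical. The idea is that a process is first codified as a list of steps, routine steps run automatically, and steps flagged as critical wait for a human decision.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One codified runbook step (hypothetical structure for illustration)."""
    name: str
    action: Callable[[], str]      # automated work; returns a short result message
    needs_approval: bool = False   # True marks a critical decision point


def run_runbook(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order, asking `approve` before any step that needs approval.

    Returns a log of what happened, so nobody has to record actions by hand.
    """
    log: List[str] = []
    for step in steps:
        if step.needs_approval and not approve(step):
            # The human declined: record the decision and move on.
            log.append(f"{step.name}: skipped (not approved)")
            continue
        log.append(f"{step.name}: {step.action()}")
    return log


def restart_service_runbook() -> List[Step]:
    """Example runbook; the actions are placeholders, not real integrations."""
    return [
        Step("collect diagnostics", lambda: "diagnostics attached to incident"),
        Step("restart service", lambda: "service restarted", needs_approval=True),
        Step("post status update", lambda: "status posted"),
    ]
```

Calling `run_runbook(restart_service_runbook(), approve=lambda step: False)` runs the two routine steps and logs the restart as skipped. In practice `approve` would be wired to a chat prompt or a ticket approval, and a team working incrementally could flip `needs_approval` to `False` for a step once it trusts that step to run unattended.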

The top three tasks respondents would like automated are service requests (52.6%), change requests (42.9%) and user provisioning (39.8%). Organizations are seeing the need to double down on automation: the top three ways organizations plan to improve their incident management process are to implement new automation tools or applications (48.2%), implement new communications/collaboration tools or applications (41.5%) and implement new integration tools or applications (40.6%).

The survey makes it clear that ITOps, DevOps and SRE professionals should consider enhancing service reliability through human-in-the-loop automation, SRE practices and better collaboration methods. Teams enabled with these tools and process advancements are better empowered to spend their time and efforts on delivering innovation and competitive advantages, and ultimately creating more business value.

Jessica Abelson is Director of Product Marketing at Transposit
