
As Digital Transformation Prevails, Automation Remains a Top Priority for DevOps, ITOps and SRE Teams

Jessica Abelson
Transposit

Hybrid work adoption and the accelerated pace of digital transformation are driving an increasing need for automation and site reliability engineering (SRE) practices, according to new research.

In a new survey of 1,046 engineering, IT Operations, DevOps and site reliability engineering professionals in the United States, with roles ranging from VP, Director and Manager to individual contributor at organizations with more than 300 employees, almost half of respondents (48.2%) said automation is a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management.

The second annual State of DevOps Automation Report, commissioned by Transposit, also revealed that close to sixty percent of organizations are losing up to half a million dollars per hour to downtime, a critical issue that can be mitigated with better automation and collaboration.

Organizations Still Lack Full Integration of Incident Response Tools

With 90.2% of organizations reporting an increased focus on digital transformation over the past year, paired with the persistence of hybrid and remote work, almost three-quarters (73.4%) of operations teams have expanded their tech stack. However, when asked how well the tools they use during incident response are integrated, only one quarter (24.7%) said all of their tools are integrated through a single tool or platform. This means the vast majority (75.3%) lack full integration, leaving teams at risk of slow issue detection and analysis, and of declining service reliability and customer experience.

Broader deployment of automation has led developers to recognize that it is key to reducing downtime and speeding up resolution. This is reflected in the 3 in 4 organizations that implemented a continuous incident response workflow for service management after adopting a hybrid workforce model.

Manual Processes Are Outdated and Lead to Higher Cost of Downtime and Service Incident Volume

The survey also found that more than a third (39.7%) of organizations had an increased cost of downtime during the last year (March 2021 to now). In fact, 58.2% reported that downtime (i.e., application outages, service degradation) cost their organization up to $499,999 per hour on average. Of those who reported an increase in the amount of time it takes to resolve incidents, 45.2% said it was due to a lack of unified communication with teammates (people are collaborating using disparate tools).


"Organizations need to deliver innovation faster and more efficiently than ever before. However, too many SRE, ITOps and DevOps teams are wasting time on disconnected, manual processes and playing a reactive game of whack-a-mole as they try to keep applications running," said Divanny Lamas, CEO of Transposit.

Operations teams face several challenges when trying to resolve incidents, including difficulty reaching people with specialized knowledge, inadequate collaboration methods and tools, and a lack of automation. When asked whether they had observed any change in the frequency of service incidents affecting their customers over the last year (March 2021 to now), 62.9% of respondents reported an increase. Of those who saw an increase in service incidents, the top reasons cited were digital transformation (60.7%), the rollout of new products or product updates (55.1%), collaboration methods and tools that did not adequately support their remote team (49.3%), and organizational change, including team member churn, an influx of new team members, and M&A activity (45.4%).

The Key to Faster Resolution of Incidents and Less Downtime: SRE Practices Combined with Automation

The rising demand for site reliability engineering is clear, as 75.6% of respondents said there has been an increased focus on SRE practices in their organization in the past 12 months, and of those, 35.1% plan to expand SRE efforts in 2022. Additionally, 65.1% of respondents plan to hire site reliability engineers in the next 12 months.

SREs themselves see a clear need for automation tools to complement their organizations' increased focus on site reliability practices: 42.3% of SREs said the current level of automation is not meeting their organization's needs and that they are actively pursuing a new solution to close this gap.

Despite the increased demand for SRE practices, SREs are still dealing with cumbersome and tedious processes. Over half of SREs (56.5%) reported that they still manually enter data into an ITSM system or other system of record to keep track of the actions humans took while resolving an incident.

To scale, organizations need to implement automation technology to rid teams of these time-consuming manual processes. This is underlined by the fact that a full 100% of the respondents with a VP/Director/Manager SRE title who cited a decrease or no change in service incidents said it was because their organization implemented automation technology to help reduce the number of service incidents. Respondents also said better documentation, process and availability of data during incidents would have the most impact on MTTR, downtime and quality of service reliability.

As seen in the survey, organizations' approaches to automation differ. A majority (63%) responded that their approach to automation was incremental automation, in which they begin by codifying processes and work up to more advanced, fully automated scenarios. When asked whether automation should let humans use their judgment at critical decision points to be more reliable and effective, 80.4% of respondents said yes. Automation that keeps humans in the loop at key decision points increases flexibility and stability while automating repetitive tasks.

The top three tasks respondents would like automated are service requests (52.6%), change requests (42.9%) and user provisioning (39.8%). Organizations also see the need to double down on automation: the top three ways they plan to improve their incident management process are to implement new automation tools or applications (48.2%), new communications/collaboration tools or applications (41.5%) and new integration tools or applications (40.6%).

The survey makes it clear that ITOps, DevOps and SRE professionals should consider enhancing service reliability through human-in-the-loop automation, SRE practices and better collaboration methods. Teams enabled with these tools and process advancements are better empowered to spend their time and efforts on delivering innovation and competitive advantages, and ultimately creating more business value.

Jessica Abelson is Director of Product Marketing at Transposit.
