Site Reliability Engineering (SRE) is the Force Multiplier of Digital Experiences

Colin Fallwell
Sumo Logic

The pandemic spurred a wave of new digital services as companies worked to stay competitive during their digital transformations. This trend, in turn, led companies to adopt site reliability engineering (SRE) to keep up with customer demand for digital experiences.

DevOps Institute recently published the Global SRE Pulse 2022, which highlights the growing adoption of SRE as a central operating model for delivering digital services and applications.

Although over 62% of respondents say their organizations use SRE today, the survey shows that organizations are at different stages of SRE adoption. Only 1% of respondents report that they tried SRE and that it did not work for their company.

SRE is now an essential engineering practice for enterprises seeking to accelerate their digital transformation into digital-first brands. So how can companies empower SREs and adopt the model across their entire IT organizations to improve digital experiences and, ultimately, the business? It starts with addressing the workforce gap and then breaking down team silos.

Closing the Skills Gap

The biggest challenge when adopting SRE is finding people with the right skills to make it work properly: 85% of respondents cite the lack of staff with the necessary skills as their biggest challenge.

Leaders can address skill gaps by training talent and promoting within the organization. It's important to not only look at the technical skills but also at a candidate's ability to see and advocate for the relationship between engineering and business.

It's also essential to implement automation solutions that reduce the manual work of resolving priority alerts. It's not just a matter of implementing technology, though. Teams must also update processes to ensure the technology is used by everyone, including those who resist AIOps and automation.

The survey found that some teams are implementing intelligent automation everywhere to ensure the reliability and continuous operation of systems. Specifically, 29% of respondents said they are currently leveraging observability tools and techniques.

One way to advance automation is through chaos engineering, intentionally destroying and rebuilding environments to improve both hygiene and confidence. However, 43% of survey respondents said they're not applying chaos engineering at all, so there is significant opportunity for those willing to learn the skills.

SRE Best Practices Can Unify Teams

Siloed teams are another common challenge for organizations. Communication overhead and cross-team dependencies delay responses and innovation. SREs can bridge the gap between IT and developers if leaders first implement these SRE best practices across teams.

Track and manage toil. Toil is work that is manual, repetitive, automatable, tactical, or devoid of enduring value, and it scales linearly as a service grows. In the survey, 66% of respondents said they measure toil in some or several teams, and 11% indicated they track toil everywhere. By measuring toil, SREs can proactively reduce its effects across teams to improve reliability.
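As a rough illustration of what measuring toil can look like, here is a minimal Python sketch that computes the share of logged hours classified as toil for each team. Everything in it is an assumption made for illustration, not something taken from the survey: the `WorkEntry` record, the `toil_share_by_team` helper, the team names, and the sample hours are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WorkEntry:
    team: str      # hypothetical team name
    hours: float   # hours logged against the task
    is_toil: bool  # True if the task was manual, repetitive, and automatable


def toil_share_by_team(entries):
    """Return {team: fraction of that team's logged hours classified as toil}."""
    total_hours = {}
    toil_hours = {}
    for entry in entries:
        total_hours[entry.team] = total_hours.get(entry.team, 0.0) + entry.hours
        if entry.is_toil:
            toil_hours[entry.team] = toil_hours.get(entry.team, 0.0) + entry.hours
    # Skip teams with no logged hours to avoid dividing by zero.
    return {
        team: toil_hours.get(team, 0.0) / hours
        for team, hours in total_hours.items()
        if hours > 0
    }


# Hypothetical sample data: two teams, one sprint of logged work.
entries = [
    WorkEntry("payments", 6.0, True),
    WorkEntry("payments", 14.0, False),
    WorkEntry("search", 3.0, True),
    WorkEntry("search", 9.0, False),
]

shares = toil_share_by_team(entries)
for team, share in sorted(shares.items()):
    print(f"{team}: {share:.0%} of logged hours were toil")
```

Tracking a simple ratio like this over time, team by team, is one way to see where automation would pay off first. How work gets labeled as toil is a judgment call each organization has to make for itself.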

Provide ongoing support. Organizations also report implementing SRE best practices, including these across all teams:

- Adopting observability and monitoring tools (29%)
- Supporting essential job certifications (27%)
- Practicing a no-blame philosophy (36%)

The two practices adopted most widely, to at least some extent, were practicing a no-blame philosophy (92%) and holding retrospectives or post-mortems (95%). The philosophy of learning from failure is what drives SRE success in many organizations.

Looking into the Future of SRE

Overall, the level of maturity revealed by the Global SRE Pulse survey indicates that many organizations are invested in improving SRE and making it part of their processes and cultures.

With 37% of organizations reporting that they have centralized SRE teams, it appears the practices and topologies are evolving. But the foundation for SRE is on solid ground, and business leaders can expect SRE to remain a fixture in the industry. Beyond that, SRE has the opportunity to be a unifying force between IT and business departments. By partnering with business and development teams, SREs can influence and improve business outcomes.

Colin Fallwell is Field CTO of Sumo Logic
