Site Reliability Engineering (SRE) is the Force Multiplier of Digital Experiences

Colin Fallwell
Sumo Logic

The pandemic spurred a wave of new digital services as companies worked to stay competitive amid digital transformation. That trend, in turn, pushed companies to adopt site reliability engineering (SRE) to keep up with customer demand for digital experiences.

DevOps Institute recently published the Global SRE Pulse 2022, which highlights the growing adoption of SRE as a central operating model for delivering digital services and applications.


Although more than 62% of respondents say their organizations use SRE today, the survey shows that organizations are at different stages of SRE adoption. Only 1% of respondents report that they tried SRE and that it did not work for their company.

SRE is now an essential engineering practice for enterprises seeking to accelerate their transformation into digital-first brands. So how can companies empower SREs and adopt the model across their entire IT organizations to improve digital experiences and, ultimately, the business? It starts with addressing the workforce gap and then breaking down team silos.

Closing the Skills Gap

The biggest challenge in adopting SRE is finding people with the right skills to make it work properly: 85% of respondents cite the lack of staff with the necessary skills as their biggest challenge.

Leaders can address skill gaps by training talent and promoting from within the organization. It's important to look not only at technical skills but also at a candidate's ability to see and advocate for the relationship between engineering and business.

It's also essential to implement automation solutions that reduce the manual work of resolving priority alerts. It's not just a matter of implementing technology, though. Teams must also update processes to ensure the technology is used by everyone, including those who resist AIOps and automation.

The survey found that some teams are implementing intelligent automation everywhere to ensure the reliability and continuous operation of systems. Specifically, 29% of respondents said they are currently leveraging observability tools and techniques.

One method of advancing automation is chaos engineering, which includes intentionally destroying and rebuilding environments to improve both hygiene and confidence. However, 43% of survey respondents said they're not applying chaos engineering at all, so there is significant opportunity for those willing to learn the skills.

SRE Best Practices Can Unify Teams

Siloed teams are another common challenge for organizations. Communication gaps and cross-team dependencies delay responses and innovation. SREs can bridge the gap between IT and developers if leaders first implement these SRE best practices across teams.

Track and manage toil. Toil is work that is manual, repetitive, automatable, tactical, or devoid of enduring value, and it scales linearly as a service grows. In the survey, 66% of respondents said they measure toil in some or several teams, and 11% indicated they track toil everywhere. By measuring toil, SREs can proactively reduce its effects across teams to improve reliability.
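
As a rough illustration of what measuring toil could look like, the sketch below computes the share of logged hours that each team spends on toil and flags teams above a budget. Everything in it is an assumption made for illustration: the `WorkEntry` record, the team names, the sample hours, and the 50% budget are hypothetical and do not come from the survey.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WorkEntry:
    team: str       # team that logged the work (hypothetical names below)
    hours: float    # time spent on the task
    is_toil: bool   # True if the task was manual, repetitive, and automatable


def toil_fraction_by_team(entries):
    """Return {team: share of that team's logged hours that were toil}."""
    total = defaultdict(float)
    toil = defaultdict(float)
    for entry in entries:
        total[entry.team] += entry.hours
        if entry.is_toil:
            toil[entry.team] += entry.hours
    return {team: toil[team] / total[team] for team in total if total[team] > 0}


def teams_over_budget(fractions, budget=0.5):
    """List teams whose toil share exceeds the budget (0.5 is an assumed default)."""
    return sorted(team for team, frac in fractions.items() if frac > budget)


# Hypothetical sample data, not taken from the survey.
entries = [
    WorkEntry("payments", 6.0, True),
    WorkEntry("payments", 2.0, False),
    WorkEntry("search", 1.0, True),
    WorkEntry("search", 3.0, False),
]
fractions = toil_fraction_by_team(entries)
print(fractions)
print(teams_over_budget(fractions))
```

In practice the entries would come from whatever ticketing or time-tracking data a team already has; the point is only that a per-team toil ratio gives SREs a number to track and reduce over time.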

Provide ongoing support. Organizations also report implementing several SRE best practices across all teams, including:

- Adopting observability and monitoring tools (29%)
- Supporting essential job certifications (27%)
- Practicing a no-blame philosophy (36%)

The two practices most widely adopted to at least some extent were a no-blame philosophy (92%) and retrospectives or post-mortems (95%). The philosophy of learning from failure is what drives SRE success in many organizations.

Looking into the Future of SRE

Overall, the level of maturity revealed by the Global SRE Pulse survey indicates that many organizations are invested in improving SRE and making it part of their processes and cultures.

With 37% of organizations reporting that they have centralized SRE teams, it appears the practices and topologies are evolving. But the foundation for SRE is solid, and business leaders can expect SRE to remain a fixture in the industry. Beyond that, SRE has the opportunity to be a unifying force between IT and business departments. By partnering with business and development teams, SRE will be able to influence and improve business outcomes.

Colin Fallwell is Field CTO of Sumo Logic
