
Site Reliability Engineering (SRE) is the Force Multiplier of Digital Experiences

Colin Fallwell
Sumo Logic

The pandemic spurred a wave of digital services as companies raced to stay competitive through digital transformation. This trend, in turn, led companies to adopt site reliability engineering (SRE) to keep up with customer demand for digital experiences.

DevOps Institute recently published the Global SRE Pulse 2022, which highlights the growing adoption of SRE as a central operating model for delivering digital services and applications.

Although more than 62% of respondents say their organizations use SRE today, the survey shows that organizations are at different stages of SRE adoption. Only 1% of respondents report that they tried SRE and found that it did not work for their company.

SRE is now an essential engineering practice for enterprises seeking to accelerate their transformation into digital-first brands. So how can companies empower SREs and adopt the model across their entire IT organizations to improve digital experiences and, ultimately, the business? It starts with closing the skills gap and then breaking down team silos.

Closing the Skills Gap

The biggest challenge when adopting SRE is finding people with the skills to make it work properly: 85% of respondents cited a lack of staff with the necessary skills as their top challenge.

Leaders can address skills gaps by training talent and promoting from within the organization. It's important to look not only at technical skills but also at a candidate's ability to recognize and advocate for the relationship between engineering and the business.

It's also essential to implement automation that reduces the manual work of resolving priority alerts. However, implementing technology is not enough on its own. Teams must also update their processes so that everyone uses the technology, including those who resist AIOps and automation.
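As a rough illustration of automating the response to priority alerts, the Python sketch below tries an automated runbook step for each alert and escalates to a human only when no runbook exists or the runbook fails. The alert names, the `RUNBOOKS` table, and the remediation functions are hypothetical placeholders, not part of any particular AIOps product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    name: str      # hypothetical alert name, e.g. "DiskAlmostFull"
    priority: int  # 1 = highest priority
    host: str


def clear_temp_files(alert: Alert) -> bool:
    # Placeholder: a real runbook step would act on alert.host and
    # return False if the remediation did not succeed.
    return True


def restart_service(alert: Alert) -> bool:
    # Placeholder for an automated service restart on alert.host.
    return True


# Hypothetical mapping from alert names to automated runbook steps.
RUNBOOKS: Dict[str, Callable[[Alert], bool]] = {
    "DiskAlmostFull": clear_temp_files,
    "ServiceUnresponsive": restart_service,
}


def handle_alerts(alerts: List[Alert]) -> List[Alert]:
    """Try automated remediation first; return the alerts that still need a human."""
    escalated: List[Alert] = []
    # Handle the highest-priority alerts first.
    for alert in sorted(alerts, key=lambda a: a.priority):
        runbook = RUNBOOKS.get(alert.name)
        if runbook is not None and runbook(alert):
            continue  # resolved automatically, no page needed
        escalated.append(alert)  # no runbook, or the runbook failed
    return escalated


pending = handle_alerts([
    Alert("DiskAlmostFull", priority=1, host="web-1"),
    Alert("CertExpiringSoon", priority=2, host="lb-1"),
])
print([a.name for a in pending])  # ['CertExpiringSoon']
```

Only the alert without a runbook reaches a person. The process side the paragraph mentions still matters: someone has to write, review, and maintain those runbooks, and the escalation path has to be trusted by the whole team.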

The survey found that some teams are implementing intelligent automation everywhere to ensure the reliability and continuous operation of their systems. In addition, 29% of respondents said they currently use observability tools and techniques.

One way to advance automation is chaos engineering: intentionally destroying and rebuilding environments to improve both hygiene and confidence. However, 43% of survey respondents said they do not apply chaos engineering at all, which leaves significant opportunity for those willing to learn the skills.
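The basic loop of a chaos experiment (state a steady-state hypothesis, inject a failure, let the system rebuild, then check that the hypothesis held) can be sketched as a toy Python simulation. The `Cluster` model, its reconcile step, and the "at least one replica is serving" hypothesis are illustrative assumptions; real experiments run against actual infrastructure, with purpose-built tooling and a deliberately limited blast radius.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Toy model of a service: running replicas plus a desired replica count."""
    desired: int
    replicas: List[str] = field(default_factory=list)
    next_id: int = 0

    def reconcile(self) -> None:
        # "Rebuild" step: recreate replicas until the desired count is met,
        # similar in spirit to an orchestrator's control loop.
        while len(self.replicas) < self.desired:
            self.replicas.append(f"replica-{self.next_id}")
            self.next_id += 1

    def healthy(self) -> bool:
        # Assumed steady-state hypothesis: at least one replica is serving.
        return len(self.replicas) >= 1


def chaos_experiment(cluster: Cluster, kills: int, rng: random.Random) -> bool:
    """Destroy random replicas one at a time, rebuild, and check the hypothesis held."""
    cluster.reconcile()
    if not cluster.healthy():
        raise ValueError("steady state must hold before injecting failures")
    for _ in range(kills):
        victim = rng.choice(cluster.replicas)
        cluster.replicas.remove(victim)  # inject the failure
        if not cluster.healthy():        # hypothesis violated: the service went down
            return False
        cluster.reconcile()              # let the system recover
    return len(cluster.replicas) == cluster.desired


rng = random.Random(42)
print(chaos_experiment(Cluster(desired=3), kills=5, rng=rng))  # True
print(chaos_experiment(Cluster(desired=1), kills=1, rng=rng))  # False
```

The second run surfaces exactly the kind of weakness these experiments are meant to find: a service with a single replica cannot survive the loss of one instance, which is far cheaper to learn during a planned test than during an outage.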

SRE Best Practices Can Unify Teams

Siloed teams are another common challenge for organizations, as communication overhead and dependencies between teams delay responses and innovation. SREs can bridge the gap between IT and developers if leaders first implement the following SRE best practices across teams.

Track and manage toil. Toil is work that is manual, repetitive, automatable, tactical, or devoid of enduring value, and it scales linearly as a service grows. In the survey, 66% of respondents said they measure toil in some or several teams, and 11% indicated they track toil everywhere. By measuring toil, SREs can proactively reduce its effects across teams to improve reliability.
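Measuring toil can start as simply as tagging logged work as toil or not and computing each team's share. The sketch below assumes a hypothetical `WorkEntry` record and borrows the 50% ceiling that Google's SRE book describes as its goal for toil; both the data shape and the threshold are assumptions to adapt to your own teams.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Assumed threshold: Google's SRE book describes a goal of keeping toil
# below 50% of each SRE's time. Teams may choose a stricter target.
TOIL_LIMIT = 0.5


@dataclass
class WorkEntry:
    team: str
    hours: float
    is_toil: bool  # manual, repetitive, automatable work with no enduring value


def toil_ratio_by_team(entries: List[WorkEntry]) -> Dict[str, float]:
    """Return the fraction of logged hours that were toil, per team."""
    total: Dict[str, float] = defaultdict(float)
    toil: Dict[str, float] = defaultdict(float)
    for entry in entries:
        total[entry.team] += entry.hours
        if entry.is_toil:
            toil[entry.team] += entry.hours
    return {team: toil[team] / hours for team, hours in total.items() if hours > 0}


def teams_over_limit(entries: List[WorkEntry], limit: float = TOIL_LIMIT) -> List[str]:
    """List teams whose toil share exceeds the limit, in alphabetical order."""
    ratios = toil_ratio_by_team(entries)
    return sorted(team for team, ratio in ratios.items() if ratio > limit)


log = [
    WorkEntry("payments", 30, True),
    WorkEntry("payments", 10, False),
    WorkEntry("search", 8, True),
    WorkEntry("search", 32, False),
]
print(toil_ratio_by_team(log))  # {'payments': 0.75, 'search': 0.2}
print(teams_over_limit(log))    # ['payments']
```

Tracking the ratio over time, rather than as a one-off number, is what lets SREs show whether automation work is actually reducing toil.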

Provide ongoing support. Organizations also report implementing these SRE best practices across all teams:

- Adopting observability and monitoring tools (29%)
- Supporting essential job certifications (27%)
- Practicing a no-blame philosophy (36%)

The two most widely adopted practices, at least to some extent, were retrospectives or post-mortems (95%) and a no-blame philosophy (92%). The philosophy of learning from failure drives SRE success in many organizations.

Looking into the Future of SRE

Overall, the level of maturity revealed by the Global SRE Pulse survey indicates that many organizations are invested in improving SRE and making it part of their processes and cultures.

With 37% of organizations reporting centralized SRE teams, SRE practices and team topologies appear to be evolving. Still, the foundation of SRE is solid, and business leaders can expect SRE to remain a fixture in the industry. Beyond that, SRE has the opportunity to be a unifying force between IT and business departments. By partnering with business and development teams, SREs can influence and improve business outcomes.

Colin Fallwell is Field CTO of Sumo Logic

