The pandemic spurred a wave of digital services because they allowed companies to stay competitive amid digital transformation. This trend, in turn, led companies to adopt site reliability engineering (SRE) to keep up with customer demand for digital experiences.
DevOps Institute recently published the Global SRE Pulse 2022, which highlights the growing adoption of SRE as a central operating model for delivering digital services and applications.
Although more than 62% of respondents say their organizations use SRE today, the survey shows that organizations are at different stages of SRE adoption. Only 1% of respondents report that they tried SRE but that it did not work for their company.
SRE is now an essential engineering practice for enterprises seeking to accelerate their digital transformation into digital-first brands. So how can companies empower SREs and adopt the model across their entire IT organizations to improve digital experiences and ultimately the business? It starts with addressing the workforce gap and then breaking down team silos.
Closing the Skills Gap
The biggest challenge when adopting SRE is finding people with the right skills to make SRE work properly: 85% of respondents cite the lack of staff with the necessary skills as their biggest challenge.
Leaders can address skill gaps by training talent and promoting from within the organization. It's important to look not only at technical skills but also at a candidate's ability to see and advocate for the relationship between engineering and business.
It's also essential to implement automation solutions to reduce the manual work of solving priority alerts. It's not just a matter of implementing technology though. Teams must also update processes to ensure the technology is used by everyone, including those who resist AIOps and automation.
The survey found that some teams are implementing intelligent automation everywhere to ensure the reliability and continuous operation of systems. Specifically, 29% of respondents said they are currently leveraging observability tools and techniques.
One method of advancing automation is chaos engineering, which includes intentionally destroying and rebuilding environments to improve both hygiene and confidence. However, 43% of survey respondents said they're not applying chaos engineering at all, so there is significant opportunity for those willing to learn the skills.
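The destroy-and-rebuild idea can be illustrated with a small simulation. The following is a minimal sketch only, assuming a purely in-memory stand-in for an environment; the `Environment` class, `run_experiment` function, and instance names are hypothetical and do not come from the survey or from any real chaos engineering tool.

```python
import random


class Environment:
    """Hypothetical in-memory stand-in for a pool of service instances."""

    def __init__(self, desired):
        self.desired = desired
        self.instances = {f"instance-{i}" for i in range(desired)}
        self._next_id = desired

    def healthy(self):
        # Steady state: the pool holds exactly the desired number of instances.
        return len(self.instances) == self.desired

    def destroy_random_instance(self, rng):
        # Inject the failure: remove one instance chosen at random.
        victim = rng.choice(sorted(self.instances))
        self.instances.remove(victim)
        return victim

    def rebuild(self):
        # Recovery: add fresh instances until the desired count is restored.
        while len(self.instances) < self.desired:
            self.instances.add(f"instance-{self._next_id}")
            self._next_id += 1


def run_experiment(env, rng):
    """Check steady state, destroy one instance, rebuild, and report."""
    if not env.healthy():
        raise RuntimeError("steady state not met; aborting experiment")
    victim = env.destroy_random_instance(rng)
    degraded = not env.healthy()
    env.rebuild()
    return {"victim": victim, "degraded": degraded, "recovered": env.healthy()}


env = Environment(desired=3)
result = run_experiment(env, random.Random(42))
print(result)
```

In a real system the destroy and rebuild steps would call an orchestration or cloud API instead of mutating a set, but the shape of the experiment (verify steady state, inject failure, confirm recovery) stays the same.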
SRE Best Practices Can Unify Teams
Siloed teams are another common challenge for organizations. Communication gaps and cross-team dependencies delay responses and innovation. SREs can bridge the gap between IT and developers if leaders first implement these SRE best practices across teams:
■ Track and manage toil. Toil is work that is manual, repetitive, automatable, tactical, or devoid of enduring value, and it scales linearly as a service grows. In the survey, 66% of respondents said they measure toil in some or several teams, and 11% indicated they track toil everywhere. By measuring toil, SREs can proactively reduce its effects across teams to improve reliability.
■ Provide ongoing support. Organizations also report implementing the following SRE best practices across all teams:
- Adopting observability and monitoring tools (29%)
- Supporting essential job certifications (27%)
- Practicing a no-blame philosophy (36%)
The two practices most widely adopted to at least some extent were practicing no blame (92%) and holding retrospectives or post-mortems (95%). The philosophy of learning from failure is what drives SRE success in many organizations.
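To make the toil-tracking practice above concrete, here is a minimal sketch of how a team might measure it, assuming work is logged as items flagged as toil or not. The `WorkItem` type, the team names, the hours, and the 50% budget are all illustrative assumptions, not figures or methods from the survey.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    team: str
    hours: float
    is_toil: bool  # manual, repetitive, automatable work with no enduring value


def toil_ratio_by_team(items):
    """Return {team: fraction of logged hours spent on toil}."""
    totals = {}
    toil = {}
    for item in items:
        totals[item.team] = totals.get(item.team, 0.0) + item.hours
        if item.is_toil:
            toil[item.team] = toil.get(item.team, 0.0) + item.hours
    return {
        team: toil.get(team, 0.0) / hours
        for team, hours in totals.items()
        if hours > 0
    }


def teams_over_budget(ratios, budget=0.5):
    """List teams whose toil fraction exceeds the (assumed) budget."""
    return sorted(team for team, ratio in ratios.items() if ratio > budget)


# Hypothetical week of logged work for two teams.
log = [
    WorkItem("payments", 30, True),
    WorkItem("payments", 10, False),
    WorkItem("search", 5, True),
    WorkItem("search", 15, False),
]
ratios = toil_ratio_by_team(log)
print(ratios)
print(teams_over_budget(ratios))
```

Tracking a ratio per team, instead of a single organization-wide number, shows where automation effort would reduce toil the most.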
Looking into the Future of SRE
Overall, the level of maturity revealed by the Global SRE Pulse survey indicates that many organizations are invested in improving SRE and making it part of their processes and cultures.
With 37% of organizations reporting that they have centralized SRE teams, it appears the practices and topologies are evolving. But the foundation for SRE is on solid ground and business leaders can expect SRE to remain a fixture in the industry. Beyond that, SRE also has the opportunity to be a unifying force between IT and business departments. By partnering with business and development teams, SRE will have the ability to influence and improve business outcomes.