SREs who fail to deliver customer value risk getting stuck in a rut of operational toil. Conversely, businesses that fail to recognize the bi-modal nature and importance of SRE activities risk losing talented employees and their competitive edge.
Produced in partnership with DevOps Institute and VMware Tanzu, Catchpoint's 2021 SRE Report is the company's fourth annual report on site reliability engineering. Its analysis is based on responses to the SRE Survey from almost 300 site reliability engineers (SREs) worldwide. This blog focuses on one specific finding: the boundaries of observability strategies need to expand to include digital experience metrics and business KPIs.
SRE: Two Jobs in One
SREs face a double challenge because the SRE role is essentially two jobs in one. They must increase the efficiency of their operational activities while also further developing, and mitigating the risk of, their dev/transformational activities.
SREs and business leaders should always seek out more opportunities for common ground, and one of the most transformative ways to do this is by expanding how monitoring and observability are thought about and used inside your organization.
Key Report Findings on Monitoring and Observability
Three results from the SRE Survey were particularly illuminating in relation to how SREs are currently using monitoring and observability data.
1. What drives the use of monitoring data?
The top driver for monitoring data cited by SREs was the desire to "augment troubleshooting and root cause analysis" (66% said this was a major driver), followed by "ensure service level objectives are met" (51% said this was a major driver). While it came as no surprise that fixing problems was the primary driver, it was surprising that only 31% of respondents said that the main driver of monitoring data for them was the goal to "provide analytics to our business teams."
2. What data sources are fed into your observability frameworks?
Scoring highest was "application monitoring", followed by "infrastructure monitoring", then "network monitoring". "Front end user experience monitoring" came in fourth.
SREs need to focus more on monitoring that captures user experience from the outside-in perspective, since it gives comprehensive insight into the customer's end-to-end journey. Ultimately, monitoring customer experience across the entire delivery chain will bring SRE work into closer alignment with business goals.
3. What critical factors drive successful SRE implementations?
The answers here revealed that the priorities of SREs continue to skew toward daily operational activities (60% of respondents said their dominant focus was on "how quickly we resolve incidents") as opposed to business-driven outcomes (only 33% said a major driver for SRE was "how quickly our business can expand to new markets", for instance).
The Value of Understanding Customer Experience for SREs
Nonetheless, we are seeing a pivot among SREs toward prioritizing customer experience, the point where SRE and business meet. Asked what drives their use of monitoring data, 49% of respondents said a major driver was "to enhance the customer experience," while 51% cited "ensure service level objectives are met." Since meeting SLOs is indirectly related to enhancing the customer experience, these two responses can be viewed together.
Ensuring good customer experience directly aligns with helping achieve overall business goals. SRE and engineering teams need to ensure they are a mission-critical part of that conversation.
One of the contributors to the SRE Report's Spotlights (responses to the findings from a range of industry practitioners) was Tamara Miner, Engineering Manager at Improbable Games, who shared how they promote good practice so that engineers can see the impact of their work on the customer.
"In addition to understanding how business value ties back to SRE SLIs," Tamara shared, "it is important to close the feedback loop by communicating actual customer sentiment and felt value back to the engineers working on improvements in these areas."
How Can SREs Consistently Deliver Customer Value?
The 2021 SRE Report concludes by offering an actionable path for SREs to consistently realize their value and bridge the conversational gap with business leaders.
Using a value chain path around differently focused capabilities increases the ability to reproduce results over time. SREs and businesses can then focus on scaling their activities and avoid "blind luck" situations in which they're not fully sure how their value was realized.
"To make SRE successful, you need to move beyond SRE practice and tie success back to the business. If you can get business leaders onboard with the tech, then you'll be successful."
Scott Rogers, VMware Tanzu Observability
SREs by definition are not bound by traditional limits, since their job continually requires them to transcend boundaries. SRE teams must expand their observability strategies to include user experience and business KPIs. This ultimately returns to the driving force behind site reliability engineering as a practice: the passionate desire to solve complex problems. Whatever your business's biggest problem is, SREs should attach themselves to that problem and help realize customer value.