Site Reliability Engineers

SRE

January 29, 2024

As decentralized and complex systems shape the landscape, site reliability engineering (SRE) practices are evolving to meet the challenges posed by this paradigm shift. The recent SRE Report 2024, a comprehensive survey-based exploration conducted by Catchpoint, provides insights into the dynamic nature of SRE practices and the key considerations influencing the reliability landscape ...

January 02, 2024

No one ever said Site Reliability Engineers (SREs) have it easy. SREs have to deal with ever-increasing amounts of data that is increasingly complex to discover and analyze. Heaps of metrics, logs, traces, and profiling data are also siloed, leading to a fragmented and opaque monitoring toolset to navigate operational efficiency and problem resolution ...

June 29, 2023

Catchpoint's 2024 SRE Survey returns for its sixth consecutive year. Take the survey to participate in the industry's original, genuine and independent global study on all things SRE ...

March 06, 2023

Starting with Site Reliability Engineering (SRE) can be intimidating, but the benefits are more than worth it. Let's go over what it is and all the benefits it can bring to your organization ...

January 11, 2023

As demand for digital services increases and distributed systems become more complex, organizations must collect and process a growing amount of observability data (logs, metrics, and traces). Site reliability engineers (SREs), developers, and security engineers use observability data to learn how their applications and environments are performing so they can successfully respond to issues and mitigate risk ...

December 08, 2022

Industry experts offer thoughtful, insightful, and often controversial predictions on how APM, AIOps, Observability, OpenTelemetry and related technologies will evolve and impact business in 2023. Part 4 covers monitoring, site reliability engineering and ITSM ...

December 01, 2022

You could argue that, until the pandemic, and the resulting shift to hybrid working, delivering flawless customer experiences and improving employee productivity were mutually exclusive activities. Evidence from Catchpoint's recently published Site Reliability Engineering (SRE) industry report suggests this is changing ...

November 07, 2022

Incidents should be your best friend. It sounds like a controversial statement. It sounds like a lot of unnecessary work. The truth is, for companies engaged in delivering any online or digital experience, taking this point of view is absolutely E-S-S-E-N-T-I-A-L ...

November 02, 2022

SRE is now an essential engineering practice for enterprises seeking to accelerate their digital transformation into digital-first brands. So how can companies empower SREs and adopt the model across their entire IT organizations to improve digital experiences and ultimately the business? It starts with addressing the workforce gap and then breaking down team silos ...

September 22, 2022

As we shift further into a digital-first world, where having a reliable online experience becomes more essential, Site Reliability Engineers remain in demand among organizations of all sizes ... This diverse set of skills and values can be difficult to interview for. In this blog, we'll get you started with some example questions and processes to find your ideal SRE ...

September 07, 2022

For organizations to be successful with SRE, they must also transform the culture and human side within their organization. This cultural shift and new way of thinking must happen across IT and the business. The Global SRE Pulse 2022 report offers a deep look into the state and trends that are shaping SRE now and looking forward. With more than 460 survey responses from SRE professionals at organizations of all sizes, we've identified four top takeaways from the Global SRE Pulse report ...

July 20, 2022

In the last two years, site reliability engineering, more popularly known as SRE, has progressed and matured as both an engineering practice and function. There have been significant changes, not only in tool usage but also in people and process, that begin with a culture or mind-set shift. Cloud-native, microservices-driven architecture has both complicated the discipline and enabled us to live an all-digital existence with continuous updates and new capabilities ...

June 27, 2022

Hybrid work adoption and the accelerated pace of digital transformation are driving an increasing need for automation and site reliability engineering (SRE) practices, according to new research. In a new survey, almost half of respondents (48.2%) said automation is a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management ...

May 26, 2022

Site reliability engineers are development-focused IT professionals who work on developing and implementing solutions that solve reliability, availability, and scale problems. DevOps engineers, on the other hand, are ops-focused workers who solve development pipeline problems. While there is a divide between the two professions, both sets of engineers cross the gap regularly, delivering their expertise and opinions to the other side ...

May 25, 2022

Site reliability engineering (SRE) is fast becoming an essential aspect of modern IT operations, particularly in highly scaled, big data environments. As businesses and industries shift to the digital and embrace new IT infrastructures and technologies to remain operational and competitive, the need for a new approach for IT teams to find and manage the balance between launching new systems and features and ensuring these are intuitive, reliable, and friendly for end users has intensified as well ...

April 06, 2022

Years from now, the development community could look back and view this period as the beginning of a golden era, thanks in part to the embrace by business managers of site reliability engineering (SRE) ...

March 17, 2022

Modern IT and security organizations often need to manage petabytes of observability (logs, metrics, traces) data in real time. The adoption of cloud, modern application architectures, Kubernetes, and edge is behind this massive growth in observability data volumes. And for some organizations, log data volumes are approaching the exabyte range ...

February 16, 2022

Site Reliability Engineering (SRE) practice was established by Google nearly 20 years ago and was popularized with Google's monumental SRE Book. Everyone's been attempting to follow that iconic path ever since ...

September 08, 2021

DevOps, SRE and other operations teams use observability solutions with AIOps to ingest and normalize data to get visibility into tech stacks from a centralized system, reduce noise and understand the data's context for quicker mean time to recovery (MTTR). With AI using these processes to produce actionable insights, teams are free to spend more time innovating and providing superior service assurance. Let's explore AI's role in ingestion and normalization, and then dive into correlation and deduplication too ...

July 14, 2021

SREs that fail to deliver customer value run the risk of being stuck in an operational toil rut. Conversely, businesses failing to recognize the bi-modal nature and importance of SRE activities run the risk of losing talented employees and their competitive edge ...

April 15, 2021

A growing need for process automation as a result of the confluence of digital transformation initiatives with the remote/hybrid work policies brought on by the pandemic was uncovered by an independent survey of over 500 IT Operations, DevOps, and Site Reliability Engineering (SRE) professionals commissioned by Transposit for its inaugural State of DevOps Automation Report ...

March 29, 2021

Developers are getting better at building software, but we're not getting better at fixing it. The problem is that fixing bugs and errors is still a very manual process ... That's because traditional observability tools will tell you if your infrastructure is having problems, but don't provide the context a developer needs to fix the code or how to prioritize them based on business requirements. Also, traditional observability tools produce far too much noise and too many false positives, leading to alert fatigue ...

December 08, 2020

In the era of observability, systems across your organization accumulate vast amounts of data about themselves, too much for IT teams to manage at the pace at which containerized and cloud IT changes. And as data sources increase, silos emerge in the form of various telemetry and monitoring tools meant to aggregate that telemetry. These systems don't talk to each other, causing alerts to run amok. For SREs, the mental aerobics of correlating these alerts into insights constitutes toil: tedious, manual work spotting, deciphering and resolving events ...

November 05, 2020

During the COVID-19 pandemic, top-tier enterprises were 2.6 times as likely to have grown revenue, 2.5 times as likely to have reached profit goals and 2.1 times as likely to have high employee satisfaction numbers, according to 2020 CIO Survey Report: Adjusting to Remote Work and the New Normal, a new Catchpoint survey ...

September 23, 2020

The post-pandemic environment has resulted in a major shift in where SREs will be located, with nearly 50% of SREs believing they will be working remotely post-COVID-19, compared with only 19% prior to the pandemic, according to the 2020 SRE Survey Report from Catchpoint and the DevOps Institute ...
