Taming the Cloud Data Surge with Open Source and Observability
June 21, 2022

Dotan Horovits

Organizations are moving to microservices and cloud native architectures at an increasing pace. The primary incentive for these transformation projects is typically to increase the agility and velocity of software release and product innovation.

These dynamic systems, however, are far more complex to manage and monitor, and they generate far higher data volumes. According to a recent survey conducted by Forrester among infrastructure and cloud monitoring application decision makers, 88% said that they expect their data volume in the cloud to increase over the next two years, with 50% expecting it to grow significantly.

Source: Scaling Cloud Environments Demand Efficient Observability Practices, Forrester, 2022

It’s not just about the quantity but the quality. Over half of the respondents in Forrester’s survey indicated poor data quality is a main challenge for their systems monitoring.

What is this monitoring data anyway?

The common baseline data is the "three pillars of observability", namely logs, metrics and traces. Logs and metrics have been with us in IT systems for many decades, but have experienced a surge with microservice architecture. Many flows that used to be internal within a monolith are now externalized interactions between microservices, producing corresponding logs and metrics for each such interaction and endpoint. The cardinality of the time-series metrics data is also exploding with the newly-introduced dimensions: just think about needing to slice and dice the performance of a workload per endpoint, per node, per pod, and per deployment version, to name just a few.
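To make the cardinality point concrete, here is a minimal sketch of the arithmetic. The label names and the counts per label are hypothetical, chosen only for illustration; the product is an upper bound, since in practice not every combination occurs (a pod, for instance, runs on a single node).

```python
from math import prod

# Hypothetical number of distinct values per label for ONE metric,
# e.g. request latency. These counts are illustrative assumptions.
label_values = {
    "endpoint": 20,
    "node": 50,
    "pod": 200,
    "version": 3,
}

def max_series(dimensions: dict[str, int]) -> int:
    """Upper bound on time series for one metric:
    the product of the distinct-value counts of its labels."""
    return prod(dimensions.values())

total = max_series(label_values)
print(f"Up to {total:,} time series for a single metric")
```

Adding one more label multiplies the bound rather than adding to it, which is why each new dimension is felt so sharply in storage and query cost.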

On top of that, distributed tracing, which used to be a niche tool, is becoming a mandatory component for understanding the flow of distributed requests and transactions through the system. In the recent DevOps Pulse survey issued by Logz.io, over 75% of respondents reported plans to deploy tracing in the next 1-3 years. This is not only an impressive percentage in its own right, but also a sharp increase from the previous DevOps Pulse survey, in which only 65% reported such plans.

To make matters interesting, bear in mind that there are other signals beyond the traditional "three pillars," such as events and continuous profiling, which introduce additional types of data into the mix.

This data challenge isn’t merely a technical matter, but rather indicative of the nature of observability. As an industry we’ve been highly focused on the signal types (logs, metrics, traces), each with its own quirks, and have built siloed solutions for each signal type. Now it’s time to shift the focus and look at observability as a data analytics problem. Let’s start with the very definition of observability: rather than using the one borrowed from control theory, I favor the following definition:

"Observability is the capability to allow a human to ask and answer questions about the system."

Treating observability as a data analytics problem inevitably leads to better ad-hoc query capabilities, better data enrichment and correlation, and, most importantly, to breaking down the silos and fusing together all the data types and visualizations.
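As a rough illustration of what correlating signals can look like, the sketch below joins log records with trace spans on a shared trace ID and then answers one ad-hoc question. The record shapes, field names (`trace_id`, `duration_ms`) and helper functions are assumptions made for this example, not the schema of any particular tool.

```python
from collections import defaultdict

# Hypothetical log and span records sharing a trace_id field.
logs = [
    {"trace_id": "a1", "service": "checkout", "level": "ERROR", "message": "payment declined"},
    {"trace_id": "b2", "service": "cart", "level": "INFO", "message": "item added"},
]
spans = [
    {"trace_id": "a1", "service": "checkout", "duration_ms": 420},
    {"trace_id": "a1", "service": "payment", "duration_ms": 390},
    {"trace_id": "b2", "service": "cart", "duration_ms": 12},
]

def correlate(logs, spans):
    """Enrich each log record with the spans of the same trace."""
    spans_by_trace = defaultdict(list)
    for span in spans:
        spans_by_trace[span["trace_id"]].append(span)
    return [{**log, "spans": spans_by_trace.get(log["trace_id"], [])} for log in logs]

def slow_error_traces(logs, spans, threshold_ms):
    """Ad-hoc question: which error logs belong to a trace
    containing at least one span slower than threshold_ms?"""
    return [
        record["trace_id"]
        for record in correlate(logs, spans)
        if record["level"] == "ERROR"
        and any(s["duration_ms"] > threshold_ms for s in record["spans"])
    ]

print(slow_error_traces(logs, spans, threshold_ms=300))
```

The point of the sketch is the join itself: once logs and traces share a key, a question that spans both signals becomes a single query instead of a manual hop between two tools.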

The open source community has been a key enabler for this evolution in observability. In the DevOps Pulse survey, around 40% reported that at least half of their tools are open source. This brings forth a unique opportunity for open source to enable better observability. It’s not just about the tools but, perhaps more importantly, about open standards. Cloud native systems have many moving pieces and telemetry data sources across polyglot microservices as well as multiple third party frameworks and services. This creates a significant challenge on the integration side. Almost half of the respondents in the DevOps Pulse survey indicated turning to open source observability for ease of integration. This is the place where open source shines.

Important projects under the Cloud Native Computing Foundation (CNCF), such as OpenMetrics and OpenTelemetry, offer a standard way to instrument applications to emit telemetry data, a standard format for exposing and transmitting the data, and a standard means of collecting it. Traditional logs have been text based and unstructured, essentially developers writing "notes to self" or notes for their teammates to decipher. The new formats, by contrast, are geared towards scalable machine analytics. This means well-structured data, with strong typing and machine-readable formats such as JSON and Protobuf.
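The difference between a "note to self" and a machine-readable record can be sketched in a few lines. The log line, field names and helper functions below are hypothetical and are not taken from OpenTelemetry or any other specification; they only contrast regex scraping of free text with reading a typed field from JSON.

```python
import json
import re

# A free-text log line, as a developer might have written it (hypothetical).
unstructured = "ERROR 2022-06-21 12:00:03 checkout failed for user 42 after 350ms"

# The same event as a structured record with typed fields (hypothetical schema).
structured = json.dumps({
    "timestamp": "2022-06-21T12:00:03Z",
    "level": "ERROR",
    "service": "checkout",
    "message": "checkout failed",
    "user_id": 42,
    "duration_ms": 350,
})

def duration_from_text(line):
    """Fragile: depends on the exact wording of the message."""
    match = re.search(r"after (\d+)ms", line)
    return int(match.group(1)) if match else None

def duration_from_json(line):
    """Robust: reads a named, typed field."""
    return json.loads(line)["duration_ms"]

print(duration_from_text(unstructured), duration_from_json(structured))
```

If a developer rewords the free-text message, the regex silently stops matching, whereas the structured field keeps its name and type regardless of how the human-readable message changes.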

More than three in four decision makers are increasing their use of cloud-native architectures such as multi-cloud workloads, serverless workloads, and workloads using containers. As adoption grows, data volumes will increase, and so will the noise within that data. It’s time to converge the industry around leading open standards and adopt data analytics practices for mastering that data, so that we can effectively monitor these systems.

Dotan Horovits is Principal Developer Advocate at Logz.io