Observability Into Your FinOps: Taking Distributed Tracing Beyond Monitoring
October 18, 2021

Dotan Horovits
Logz.io

Share this

Distributed tracing has been growing in popularity as a primary tool for investigating performance issues in microservices systems. Our recent DevOps Pulse survey shows a 38% increase year-over-year in organizations' tracing use. Furthermore, 64% of those respondents who are not yet using tracing indicated plans to adopt it in the next two years.

However, many organizations have yet to realize just how much potential distributed tracing holds. The fact is, once your application is instrumented, it opens up a whole new world of observability into numerous processes in areas including developer experience, business, and FinOps.


Many articles discuss developer use cases. In this blog, I'd like to venture off and explore the less commonly discussed use cases and the related implications.

Context Propagation: The Secret Sauce Behind Tracing

At the heart of distributed tracing lies the notion of “trace context” and its propagation through the system. This notion is formalized in the W3C Trace Context specification, and takes a central role in OpenTelemetry context propagation, in OpenTracing and other industry standards. Let's go over the main concepts:

Trace context is the data required to move trace information across service boundaries. It is a set of globally unique identifiers that represents the unique request, within which each span exists (spans are the individual operations that comprise the full execution flow of that request).

One great aspect of trace context is that it is not bound to a predefined set of data. This means essentially that you can capture any extra user-defined properties that you'd like to monitor from your application (with the right instrumentation), to provide observability of many types. This user-defined data, sometimes called Baggage, could be the URL of an HTTP request, the SQL statement of a database query, or it could be almost anything really.

Context propagation is the process through which the context is bundled and transferred through your distributed application across threads, components, processes, and services. This is typically accomplished via HTTP headers, following the W3C specification. Your instrumentation libraries (a.k.a. tracers) or auto-instrumentation agents typically take care of the context propagation behind the scenes.

The beauty is that once you've got the plumbing in place to propagate context through your application, it opens up a whole world of additional context that you can collect to support more sophisticated observability. To flesh this out, let's review some interesting use cases from the business and FinOps domain.

Distributed Tracing for Finops and Compliance

Companies living in today's cloud-native world increasingly use shared resources and infrastructure to run their businesses. These resources could include compute, storage, network, or many others. One of the related challenges for these organizations is tracking related resource utilization and attributing it back to the respective business unit or product line. Resource attribution is key for effective FinOps, as it determines the cost structure of a business unit.

Furthermore, in many of today's SaaS business models, operating multi-tenant systems requires the ability to attribute resource costs to tenants. Furthermore, SaaS businesses typically employ rate limiting for each tenant to avoid impacting the service availability levels of other tenants running on the shared resources. Rate-limiting multi-tenant storage, for instance, is said to save cloud vendors hundreds of millions of dollars per year.

Unfortunately, while backend components are aware of low level resource information such as CPU and memory utilization, they typically lack the high-level context about the business or tenant that triggered the request. Yet, by enlisting distributed tracing, the unique identifier (ID) of that business unit, product, or tenant can be propagated down to the backend and infrastructure. Then it's just a matter of aggregating resource utilization figures by that ID to get the per-product (or other business entity) utilization.

Resource attribution can also help with internal capacity planning processes. Understanding how much of a resource was consumed by a given product or business line can help plan any required expansion of the involved infrastructure, aligning it with related business growth targets.

Data privacy compliance is another common issue that organizations face, especially in light of GDPR and CCPA. The frequent problem, as before, is that low level storage is often unaware of user context. Distributed tracing can propagate the user ID from the frontend tier downstream to the backend and data storage tiers so that data access can be verified against it to enforce data privacy policies.

From Common Infrastructure to Common Practice

As more organizations are instrumenting their applications for monitoring purposes, context propagation is becoming a common infrastructure.

The next step in this evolution is moving from use as a common infrastructure to adoption as a common practice. This movement can be influenced not only by the dev and DevOps teams, but also by stakeholders with oversight of business and FinOps. This, in turn, will create more champions for tracing within the organization, in general, which will accelerate adoption and instrumentation efforts throughout additional parts of the involved systems, and with a more diverse set of data.

Once this practice becomes more common, we may reach the point where incentives beyond today's monitoring practices could drive organizations to venture into distributed tracing — incentives that bear direct impact on the company's top or bottom line.

Dotan Horovits is a Product Evangelist at Logz.io
Share this

The Latest

May 19, 2022

Recently, a regional healthcare organization wanted to retire its legacy monitoring tools and adopt AIOps. The organization asked Windward Consulting to implement an AIOps strategy that would help streamline its outdated and unwieldy IT system management. Our team's AIOps implementation process helped this client and can help others in the industry too. Here's what my team did ...

May 18, 2022

You've likely heard it before: every business is a digital business. However, some businesses and sectors digitize more quickly than others. Healthcare has traditionally been on the slower side of digital transformation and technology adoption, but that's changing. As healthcare organizations roll out innovations at increasing velocity, they must build a long-term strategy for how they will maintain the uptime of their critical apps and services. And there's only one tool that can ensure this continuous availability in our modern IT ecosystems. AIOps can help IT Operations teams ensure the uptime of critical apps and services ...

May 17, 2022

Between 2012 to 2015 all of the hyperscalers attempted to use the legacy APM solutions to improve their own visibility. To no avail. The problem was that none of the previous generations of APM solutions could match the scaling demand, nor could they provide interoperability due to their proprietary and exclusive agentry ...

May 16, 2022

The DevOps journey begins by understanding a team's DevOps flow and identifying precisely what tasks deliver the best return on engineers' time when automated. The rest of this blog will help DevOps team managers by outlining what jobs can — and should be automated ...

May 12, 2022

A survey from Snow Software polled more than 500 IT leaders to determine the current state of cloud infrastructure. Nearly half of the IT leaders who responded agreed that cloud was critical to operations during the pandemic with the majority deploying a hybrid cloud strategy consisting of both public and private clouds. Unsurprisingly, over the last 12 months, the majority of respondents had increased overall cloud spend — a substantial increase over the 2020 findings ...

May 11, 2022

As we all know, the drastic changes in the world have caused the workforce to take a hybrid approach over the last two years. A lot of that time, being fully remote. With the back and forth between home and office, employees need ways to stay productive and access useful information necessary to complete their daily work. The ability to obtain a holistic view of data relevant to the user and get answers to topics, no matter the worker's location, is crucial for a successful and efficient hybrid working environment ...

May 10, 2022

For the past decade, Application Performance Management has been a capability provided by a very small and exclusive set of vendors. These vendors provided a bolt-on solution that provided monitoring capabilities without requiring developers to take ownership of instrumentation and monitoring. You may think of this as a benefit, but in reality, it was not ...

May 05, 2022

Mohan Kompella, VP of Product Marketing at BigPanda, answers the question: How can AIOps contribute to greater achievement in global enterprises? ...

May 04, 2022

Increasingly, more and more software is being delivered as software as a service (SaaS). Gartner forecasts the SaaS market to continue to expand to $145B in 2022. Consumers and businesses not only have become accustomed to, but also expect SaaS-based solutions, even more so in the post-COVID world. This new frontier allows features to be rolled out at an unparalleled velocity paving the way for continuous innovation and sustainable competitive advantages ...