Observability Into Your FinOps: Taking Distributed Tracing Beyond Monitoring
October 18, 2021

Dotan Horovits
Logz.io

Distributed tracing has been growing in popularity as a primary tool for investigating performance issues in microservices systems. Our recent DevOps Pulse survey shows a 38% increase year-over-year in organizations' tracing use. Furthermore, 64% of those respondents who are not yet using tracing indicated plans to adopt it in the next two years.

However, many organizations have yet to realize just how much potential distributed tracing holds. Once your application is instrumented, tracing opens up a whole new world of observability into numerous processes, in areas including developer experience, business, and FinOps.

Many articles discuss developer use cases. In this blog post, I'd like to venture off and explore the less commonly discussed use cases and their implications.

Context Propagation: The Secret Sauce Behind Tracing

At the heart of distributed tracing lies the notion of “trace context” and its propagation through the system. This notion is formalized in the W3C Trace Context specification, and plays a central role in OpenTelemetry context propagation, in OpenTracing, and in other industry standards. Let's go over the main concepts:

Trace context is the data required to move trace information across service boundaries. It is a set of globally unique identifiers that represent a single request, within which each span exists (spans are the individual operations that make up the full execution flow of that request).

One great aspect of trace context is that it is not bound to a predefined set of data. This means that, with the right instrumentation, you can capture any extra user-defined properties that you'd like to monitor from your application, to provide many types of observability. This user-defined data, sometimes called Baggage when it is propagated along with the context, could be the URL of an HTTP request, the SQL statement of a database query, or almost anything else.
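To make the idea of user-defined data concrete, here is a minimal sketch of how such properties could be serialized for transport. It assumes the comma-separated key=value layout of the W3C Baggage `baggage` HTTP header with percent-encoded values. The function names and the `tenant.id` and `business.unit` keys are hypothetical, chosen only for illustration, and a real tracer library would normally do this for you.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries: dict) -> str:
    """Serialize key/value pairs into a baggage-style header value.

    Values are percent-encoded so that commas, spaces, and equals signs
    inside a value cannot break the list structure.
    """
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in entries.items())


def decode_baggage(header: str) -> dict:
    """Parse a baggage-style header value back into a dict.

    Malformed members are skipped, and optional ';property' metadata
    after a value is dropped in this simplified sketch.
    """
    entries = {}
    for member in header.split(","):
        key_value = member.strip().split(";", 1)[0]
        if "=" not in key_value:
            continue
        key, _, value = key_value.partition("=")
        entries[key.strip()] = unquote(value.strip())
    return entries


# Hypothetical business context attached at the edge of the system.
header_value = encode_baggage({"tenant.id": "acme corp", "business.unit": "payments"})
restored = decode_baggage(header_value)
```

Any downstream service that receives the header can call `decode_baggage` to recover the same business context that was attached at the edge.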

Context propagation is the process through which the context is bundled and transferred through your distributed application across threads, components, processes, and services. Between services, this is typically accomplished via HTTP headers, following the W3C specification (which defines the traceparent and tracestate headers). Your instrumentation libraries (a.k.a. tracers) or auto-instrumentation agents typically take care of the context propagation behind the scenes.
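What the tracer does behind the scenes can be sketched in a few lines. The example below assumes version 00 of the W3C `traceparent` header, which has the form version, trace ID (32 hex characters), parent span ID (16 hex characters), and flags (2 hex characters), joined by hyphens. The function names are hypothetical, and this is not the API of OpenTelemetry or any other library. It only illustrates how a service continues an incoming trace under a new span ID.

```python
import re
import secrets

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span identifiers."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str):
    """Return the header fields as a dict, or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero identifiers are invalid according to the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def outgoing_traceparent(incoming_headers: dict) -> str:
    """Build the header for a downstream call.

    Keeps the incoming trace ID so all spans join the same trace, and
    generates a fresh span ID for the current operation. Assumes header
    names in the dict are already lowercased.
    """
    parent = parse_traceparent(incoming_headers.get("traceparent", ""))
    if parent is None:
        return new_traceparent()
    return f"00-{parent['trace_id']}-{secrets.token_hex(8)}-{parent['flags']}"
```

Because the trace ID is preserved on every hop, a tracing backend can later stitch the spans emitted by different services into one end-to-end request.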

The beauty is that once you've got the plumbing in place to propagate context through your application, it opens up a whole world of additional context that you can collect to support more sophisticated observability. To flesh this out, let's review some interesting use cases from the business and FinOps domains.

Distributed Tracing for FinOps and Compliance

Companies living in today's cloud-native world increasingly use shared resources and infrastructure to run their businesses. These resources could include compute, storage, network, and many others. One of the challenges for these organizations is tracking the utilization of shared resources and attributing it back to the respective business unit or product line. Resource attribution is key for effective FinOps, as it determines the cost structure of a business unit.

Furthermore, in many of today's SaaS business models, operating multi-tenant systems requires the ability to attribute resource costs to tenants. SaaS businesses also typically employ rate limiting for each tenant to avoid impacting the service availability levels of other tenants running on the shared resources. Rate-limiting multi-tenant storage, for instance, is said to save cloud vendors hundreds of millions of dollars per year.

Unfortunately, while backend components are aware of low-level resource information such as CPU and memory utilization, they typically lack the high-level context about the business or tenant that triggered the request. By enlisting distributed tracing, however, the unique identifier (ID) of that business unit, product, or tenant can be propagated down to the backend and infrastructure. Then it's just a matter of aggregating resource utilization figures by that ID to get the per-product (or other business entity) utilization.
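The aggregation step described above can be sketched as follows. This is a minimal illustration, assuming each backend span has been exported as a plain dict that carries the propagated tenant ID alongside a resource measurement. The field names `tenant_id` and `cpu_ms`, the sample data, and the `aggregate_by_tenant` function are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_tenant(spans, metric="cpu_ms"):
    """Sum a resource metric per propagated tenant ID.

    Spans with no tenant ID are grouped under 'unattributed' so that
    gaps in instrumentation stay visible instead of silently vanishing.
    """
    totals = defaultdict(float)
    for span in spans:
        tenant = span.get("tenant_id", "unattributed")
        totals[tenant] += span.get(metric, 0.0)
    return dict(totals)


# Hypothetical spans collected from shared backend services.
sample_spans = [
    {"service": "storage", "tenant_id": "tenant-a", "cpu_ms": 120.0},
    {"service": "storage", "tenant_id": "tenant-b", "cpu_ms": 30.0},
    {"service": "database", "tenant_id": "tenant-a", "cpu_ms": 50.0},
    {"service": "database", "cpu_ms": 10.0},  # context was not propagated
]

usage = aggregate_by_tenant(sample_spans)
# usage == {'tenant-a': 170.0, 'tenant-b': 30.0, 'unattributed': 10.0}
```

The size of the unattributed bucket is a useful signal in itself, since it shows how much of the shared infrastructure is still missing propagated context.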

Resource attribution can also help with internal capacity planning processes. Understanding how much of a resource was consumed by a given product or business line can help plan any required expansion of the involved infrastructure, aligning it with related business growth targets.

Data privacy compliance is another common issue that organizations face, especially in light of GDPR and CCPA. The frequent problem, as before, is that low-level storage is often unaware of user context. Distributed tracing can propagate the user ID from the frontend tier downstream to the backend and data storage tiers, so that data access can be verified against it to enforce data privacy policies.
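As a rough sketch of that verification step, the storage tier below refuses to return a record unless the propagated context names the record's owner. Everything here is an assumption made for illustration: the `user.id` context key, the `owner_id` field, the in-memory store, and the simple owner-only policy. Real privacy policies are usually richer than an equality check.

```python
class AccessDenied(Exception):
    """Raised when the propagated user context does not permit the access."""


def read_record(store: dict, record_id: str, context: dict):
    """Return a record's data only if the propagated user ID owns it."""
    user_id = context.get("user.id")
    if user_id is None:
        # No user context reached the storage tier, so deny by default.
        raise AccessDenied("no user context was propagated with the request")
    record = store[record_id]
    if record["owner_id"] != user_id:
        raise AccessDenied(f"user {user_id} may not read record {record_id}")
    return record["data"]


# Hypothetical storage contents and propagated context.
store = {"rec-1": {"owner_id": "user-42", "data": "profile details"}}
allowed = read_record(store, "rec-1", {"user.id": "user-42"})
```

Denying by default when no user ID arrives means a missing instrumentation hop shows up as a failed request rather than as an unchecked data access.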

From Common Infrastructure to Common Practice

As more organizations instrument their applications for monitoring purposes, context propagation is becoming common infrastructure.

The next step in this evolution is moving from common infrastructure to common practice. This move can be driven not only by the dev and DevOps teams, but also by stakeholders who oversee business and FinOps. This, in turn, will create more champions for tracing within the organization, which will accelerate adoption and extend instrumentation to additional parts of the system, with a more diverse set of data.

Once this practice becomes more common, we may reach the point where incentives beyond today's monitoring practices drive organizations to venture into distributed tracing, namely incentives that have a direct impact on the company's top or bottom line.

Dotan Horovits is a Product Evangelist at Logz.io