OpenTelemetry — You have probably heard of it. You may already be using parts of it. In this 8-part blog series, posted over the next two weeks, APMdigest will explore OpenTelemetry, with input from a range of experts on the subject.
Many of these experts consider OpenTelemetry — abbreviated as "OTel" — to be the future of performance management. What makes OpenTelemetry so compelling?
First, it combines three different types of highly valuable performance data — tracing, metrics and logs, with possibly more to come — into one universal data collection system.
Second, it has the added appeal of being open source.
And third, the IT industry is embracing it in earnest.
"OpenTelemetry is poised to become integral to DevOps and IT professionals as it represents a lingua franca for observability data about applications and cloud-native infrastructure," says Austin Parker, Head of Developer Relations at Lightstep by ServiceNow.
Marcin "Perk" Stożek, Software Engineering Manager of Open Source Collection, Sumo Logic, adds, "OpenTelemetry is an exciting project which leads the industry to a place where all telemetry data is consistent and interconnected across multiple signal types. Many vendors see that as an opportunity to not reinvent the wheel but rather join forces for the benefit of the users."
"OpenTelemetry promises to be a game changer for DevOps and ITOps teams, enabling organizations to bring all their observability data together in their tools of choice, no matter the application and systems being monitored," Sajai Krishnan, General Manager, Observability, Elastic, confirms. "As a result, DevOps and IT Ops teams can begin to rationalize and consolidate their observability tool sets without sacrificing visibility."
What is OpenTelemetry?
OpenTelemetry is an open source observability framework for cloud native software. It is a collection of tools, APIs and SDKs that can be used to instrument, generate, collect, and export telemetry data for analysis to better understand software performance and behavior.
OpenTelemetry includes the three pillars of observability: traces, metrics and logs.
■ Distributed tracing is a method of tracking the path of a service request from beginning to end across a distributed system.
■ Metrics are the measurement of activities over a period of time, to gain visibility into system or application performance.
■ Logs are text records of events that occur at specific points in time in a system or application.
Each of these data types provides valuable insight into system and application health, ultimately enabling the user to identify and solve performance and availability issues.
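To make the three pillars concrete, here is an illustrative sketch (plain Python, not the OpenTelemetry SDK) of how a trace span, a metric, and a log record might describe the same request. The field names and values are hypothetical simplifications; the point is that a shared trace ID lets the signals be correlated.

```python
import time
import uuid

trace_id = uuid.uuid4().hex  # shared ID that ties the signals together

start_ns = time.time_ns()
end_ns = start_ns + 12_000_000  # pretend the request took 12 ms

# A trace span: one step in a request's path, with a start and end time.
span = {
    "trace_id": trace_id,
    "name": "GET /checkout",
    "start_ns": start_ns,
    "end_ns": end_ns,
}

# A metric: a measurement of activity, here the request duration.
metric = {
    "name": "http.server.request.duration",
    "unit": "ms",
    "value": (span["end_ns"] - span["start_ns"]) / 1e6,
}

# A log: a text record of an event at a specific point in time.
log = {
    "trace_id": trace_id,  # correlates this log line with the trace
    "timestamp_ns": end_ns,
    "body": "checkout completed",
}

print(metric["value"])  # 12.0
```

In a real deployment the SDK produces and exports these structures for you; the sketch only shows why having all three signal types in one system is useful.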
But the OpenTelemetry Project does not plan on stopping with the three pillars. "OpenTelemetry will continue to expand, creating standards to capture all types of observability data beyond metrics, logs, and traces," says Krishnan from Elastic. "For example, we are beginning to see benchmarks for profiling data, allowing insight into functions within the CPU."
According to Ben Evans, Senior Principal Software Engineer at Red Hat, the project offers a set of standards, formats, client libraries, and associated software components. The standards are explicitly cross-platform and not tied to any particular technology stack.
"OpenTelemetry provides a framework that integrates with open source and commercial products and can collect observability data from apps written in many languages," Evans said.
Morgan McLean, Director of Product Management at Splunk and Co-Founder of OpenTelemetry, adds: "OpenTelemetry has definitions for every type of signal and metadata, along with how they should be used to track various common behaviors (HTTP request latency, SQL database error rates, Kubernetes pod CPU consumption, etc.). The result of this is that the traces, metrics, logs, etc. that are captured from various systems by OpenTelemetry are all consistent and can be processed, analyzed, and correlated with ease."
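The consistency McLean describes comes from OpenTelemetry's semantic conventions: standard attribute names such as `http.request.method` and `http.response.status_code` that every instrumented service uses. A small sketch (the services and values here are hypothetical) shows why shared keys make cross-service analysis easy:

```python
# Two spans from two different services. Because both follow the same
# semantic conventions, the attribute keys match and can be queried
# uniformly, regardless of which service emitted them.
span_from_service_a = {
    "http.request.method": "POST",
    "http.response.status_code": 500,
}
span_from_service_b = {
    "http.request.method": "POST",
    "http.response.status_code": 200,
}

def error_rate(spans):
    """Fraction of spans with a 5xx response, found via the shared key."""
    errors = [s for s in spans if s["http.response.status_code"] >= 500]
    return len(errors) / len(spans)

print(error_rate([span_from_service_a, span_from_service_b]))  # 0.5
```

Without agreed-upon names, each service would report `status`, `statusCode`, or `http_code`, and this kind of fleet-wide query would require per-service translation.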
The main components of OpenTelemetry include:
■ OpenTelemetry Protocol (OTLP) specification describing the encoding, transport, and delivery mechanism of telemetry data between telemetry sources, intermediate nodes such as collectors, and telemetry backends.
■ OpenTelemetry Collector offering a vendor-agnostic implementation for receiving, processing and exporting telemetry data, removing the need to run, operate, and maintain multiple agents/collectors.
■ APIs and SDKs in 11 different languages, enabling users to easily integrate and extend the project.
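To show how these components fit together, here is a minimal sketch of a Collector configuration: it receives OTLP data, batches it, and forwards it to a backend. The backend endpoint is a placeholder, and a real deployment would add processors and exporters to taste.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # apps send OTLP data here

processors:
  batch:                         # group telemetry into batches before export

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```

One Collector pipeline like this can replace a collection of vendor-specific agents, since any OTLP-speaking source can feed it and any OTLP-speaking backend can consume from it.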
One important point to consider, however, is that OpenTelemetry does not include backend storage, analysis and visualization. So users will need to find another tool to provide these capabilities, either through a vendor or by building in-house.
Support from CNCF
The OpenTelemetry project was created through the merger of OpenCensus (originally started by Google) and OpenTracing (originally a CNCF incubating project) in May 2019 and became a Cloud Native Computing Foundation (CNCF) Sandbox project shortly after. OpenTelemetry became a CNCF incubating project in August 2021.
CNCF is the open source, vendor-neutral hub of cloud-native computing, hosting projects like Kubernetes, Prometheus and OpenTelemetry to make cloud-native universal and sustainable.
"CNCF helped guide the community in merging the overlapping OpenCensus and OpenTracing efforts into one joint effort that became the OpenTelemetry project," says Chris Aniszczyk, CTO of CNCF. "One of the benefits of the CNCF is bringing together industry stakeholders in one place and putting their minds together to do what's best for end users when you have overlapping efforts like this."
"CNCF ranks the project as second in importance only to Kubernetes, confirming how much the community and vendors see OpenTelemetry as a value add to their teams and customers," adds Martin Thwaites, Developer Advocate at Honeycomb.
Why was OpenTelemetry started, and what is driving the growth and popularity?
The experts agree that OpenTelemetry was developed to fill a gap in performance management in the age of cloud-native and microservices, and to make observability a practical reality.
"The application landscape has significantly evolved over the last few years by moving from a monolithic to microservices architecture with a proliferation of containers to run these applications in a scalable and fault tolerant way," says Nitin Navare, CTO of LogicMonitor. "In this new landscape, ITOps/DevOps teams need to understand how various applications and infrastructure are related to each other and which parts are impacting the end-to-end user experience."
"If you've ever deployed or operated a modern, microservice-based software application, you have no doubt struggled to understand its performance and behavior, and that's because those 'outputs' are usually meager at best," Ben Sigelman, co-creator of OpenTracing, and Morgan McLean from Splunk explained in a blog. "We can't understand a complex system if it's a black box. And the only way to light up those black boxes is with high-quality telemetry: distributed traces, metrics, logs, and more."
"It's important to understand the problem that OpenTelemetry solves," adds Mike Loukides, VP of Emerging Tech Content at O'Reilly Media. "Up until now, most attempts to build software that was observable or even monitorable were ad hoc. People created their own solutions; a few larger projects relied on commercial third-party tools; but nothing fit together well. One microservice's solution was another microservice's nightmare. What happens if your web platform and your database use different libraries for logging and metrics? OpenTelemetry creates a single standard for generating information, communicating that information to receivers, and for receiving the information. That won't put an end to ad hoc solutions, but it does provide a better way forward."
"Performance management is a crucial part of any modern software project," he continues. "In a talk at one of the first O’Reilly Velocity conferences, researchers from Google and Microsoft showed that users would start clicking away from web pages to which they had added very small amounts of latency — hardly enough to be noticeable, but users still went elsewhere. Fifteen or so years later, we're facing the same problem: applications download even more libraries and frameworks; the libraries are larger, and rather than talking to a single threaded web server, applications are querying a system that may consist of hundreds of microservices, all of which have their own performance issues. Understanding performance in that environment is almost a superhuman task. OpenTelemetry standardizes the tools for tracing what the software is actually doing, making it easier to find bottlenecks and performance problems. It isn't a magic bullet — understanding performance on a distributed system is always going to be difficult. But by giving a standard interface to logs, metrics, and trace data, it makes the problem manageable."
"OpenTelemetry offers visibility into even some of the most complex IT environments that previously would have been difficult to monitor," according to Gregg Ostrowski, Executive CTO at Cisco AppDynamics. "By having this holistic view, it's easier for teams to address any issues within the IT stack that could negatively impact performance and the end user experience."
By providing visibility into system and application performance, OpenTelemetry clearly benefits both IT Operations and developers. However, several experts point out that while OpenTelemetry has value for ITOps, DevOps teams and developers stand to gain the greatest advantage.
"As a contributor to the project since its creation, I believe OpenTelemetry has already attained importance for DevOps teams who are instrumenting code and using cloud native observability tools," explains Jonah Kowall, CTO of Logz.io. "IT Ops teams see value as well, but less so because they do not instrument applications in code."
Parker of Lightstep agrees, saying standardization of telemetry APIs and data will shift observability left towards application developers, rather than being the sole province of operations — but both groups will benefit from standardized dashboards, metrics, logs and traces.
Go to: A Guide to OpenTelemetry - Part 2: When Will OTel Be Ready?