APM and Observability: Cutting Through the Confusion — Part 9

Pete Goldin
APMdigest

The story of the evolution of Observability to encompass APM and other IT performance management capabilities would not be complete without discussing the monumental impact of open source.

Start with: APM and Observability - Cutting Through the Confusion - Part 8

Open source is transforming how organizations approach APM and observability by providing vendor-neutral standards for collecting and exporting telemetry, says Mimi Shalash, Observability Advisor at Splunk, a Cisco Company.

Solutions like OpenTelemetry simplify integration across platforms, reduce vendor lock-in, and improve interoperability in complex environments, Shalash continues. Prometheus enhances this approach with robust metrics and alerting, especially in systems like Kubernetes. Together, these tools enable flexible, cost-effective stacks designed to scale and evolve with modern infrastructure.

“Open source tools like OpenTelemetry and Prometheus are becoming essential building blocks for observability in modern, cloud-native environments,” explains Andreas Grabner, Fellow DevRel and CNCF Ambassador, Dynatrace. “They empower organizations with greater flexibility and standardization in how telemetry data is collected. The broader industry trend is moving toward interoperability and data unification—using open standards for collection while relying on more advanced platforms to contextualize, analyze and act on that data at scale. This hybrid model allows teams to preserve their existing investments in open source while benefiting from automation, AI and enterprise grade observability.”

“The observability space is a prime target for OSS,” Sven Delmas, VP of Research at Mezmo, agrees. “Between dealing with a tech-savvy and curious audience, constant pressure on cost control, and the need for transparency and avoiding vendor lock-in, there has been — and will be — an ever-increasing push to OSS.”

Driving Observability's Evolution

Open source is changing the center of gravity in observability from tools to telemetry, according to Brian Douglas, Head of Ecosystem, Cloud Native Computing Foundation (CNCF). Developers are adopting Prometheus, OpenTelemetry, and Fluent Bit not just because they're free or flexible, but because they represent an open, portable foundation. These tools make it easier to switch vendors, build internal platforms, and innovate on top of shared standards. They're not just part of the observability conversation; they're shaping the future of how observability is defined.

APM is one specific implementation of observability, not its full scope, Douglas continues. It answers questions like, 'Is this app performing within expected parameters?' Observability, in contrast, supports deeper exploration: 'Why did latency spike in a downstream service for certain regions?' Projects like Prometheus and OpenTelemetry enable this broader context by collecting high-dimensional metrics, distributed traces, and logs, which gives teams the raw, interoperable data needed to connect the dots.

Observability supports cross-signal correlation and open-ended investigation, Douglas adds. Rather than focusing solely on applications, it lets teams visualize the full stack, from container runtimes and infrastructure to network topology and business-level SLIs.

  • Prometheus provides robust, flexible metrics, while Cortex scales them across environments.
  • Fluent Bit and Fluentd handle log aggregation and routing across edge and core environments.
  • OpenTelemetry standardizes telemetry collection and enriches it with context, making it easier for tools and teams to interoperate without reinventing the wheel.
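The "enriches it with context" point is concrete: OpenTelemetry propagates trace context between services using the W3C Trace Context standard. As a minimal sketch in plain Python (not the actual OpenTelemetry SDK, and with invented values), this is roughly what a `traceparent` header carries from one service to the next:

```python
import secrets

def make_traceparent() -> str:
    # W3C Trace Context "traceparent" header: version-traceid-spanid-flags.
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled

def parse_traceparent(header: str) -> dict:
    # A downstream service parses the header to join the same trace.
    version, trace_id, span_id, flags = header.split("-")
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": flags == "01",
    }

header = make_traceparent()
ctx = parse_traceparent(header)
```

Because every instrumented hop reads and forwards this header, any backend that understands the standard can stitch the spans back into one distributed trace.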

“What's important is interoperability,” Douglas of CNCF explains. “With standards like OpenTelemetry and protocols like the Prometheus exposition format, teams can adopt a modular approach: instrument once, analyze anywhere. This lets them use best-in-class components rather than be locked into a monolithic solution. Observability isn't a single tool, it's a strategy backed by open, composable tooling.”
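The "instrument once, analyze anywhere" idea rests on how simple these open wire formats are. The Prometheus text exposition format, for example, is plain enough to render by hand; the sketch below (metric name, labels, and value are illustrative, not from any real system) produces the kind of output a Prometheus server scrapes:

```python
def render_counter(name: str, help_text: str, labels: dict, value: float) -> str:
    # Prometheus text exposition format: HELP and TYPE comment lines,
    # then one sample line per label combination.
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} counter\n"
        f"{name}{{{label_str}}} {value}\n"
    )

output = render_counter(
    "http_requests_total", "Total HTTP requests served.",
    {"method": "GET", "code": "200"}, 1027,
)
print(output)
```

Any tool that can emit or parse this text format can participate in the ecosystem, which is exactly the modularity Douglas describes.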

With OpenTelemetry, users can build a composable observability stack where each tool plays to its strengths: one might excel at exploratory debugging, another at automated root cause analysis, and a third at cost-effective long-term storage, Severin Neumann, Head of Community & Developer Relations at Causely, elaborates. This flexibility lets teams get the best outcomes for their specific needs without duplicating instrumentation or locking themselves into a one-size-fits-all solution.

OpenTelemetry: Reshaping APM and Observability

OpenTelemetry, in particular, has already profoundly reshaped the APM market and the broader observability field, explains Juraci Paixão Kröhling, Software Engineer at OllyGarden. “Initially, some established players might have overlooked it, but strong customer demand has made OpenTelemetry support almost table stakes now; it's rare to find a vendor unable to ingest the standard OTLP format.”

OpenTelemetry is an open source standard, framework and suite of tools facilitating the generation, collection, and exporting of telemetry data.

“OpenTelemetry is having a huge impact on the industry with studies showing that nearly half of organizations polled are using OpenTelemetry with another 25-percent-plus looking to adopt in the near term,” says Harald Burose, Director, Product Management, Research & Development – Engineering, OpenText.

Download the EMA Report: Taking Observability to the Next Level - OpenTelemetry’s Emerging Role in IT Performance and Reliability

Kröhling from OllyGarden continues, “I expect vendors will increasingly embrace OpenTelemetry more natively, treating its semantic conventions not just as data points but as first-class citizens for richer understanding. The era of requiring proprietary agents for basic data collection is closing; customers now expect tools not only to handle open formats but to do so meaningfully, respecting the common language defined by standards like OpenTelemetry. This shared foundation allows everyone to cultivate better systems.”

OpenTelemetry provides teams with greater flexibility, standardization, and control over their telemetry data and has become the de facto standard for data ingestion, according to Bahubali Shetti, Senior Director, Product Marketing, Elastic. Whether deployments use standard OTel SDKs, auto-instrumentation, OTel Collectors, or a combination of these, users can avoid vendor lock-in and reduce the need for future retooling.
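A minimal OpenTelemetry Collector pipeline illustrates that flexibility: it accepts OTLP from any SDK or auto-instrumentation agent and forwards it to a backend of choice. The configuration below is a sketch, not a production setup; the endpoint is a placeholder, and which exporters are available depends on the Collector distribution in use:

```yaml
receivers:
  otlp:
    protocols:
      grpc:    # OTLP over gRPC, default port 4317
      http:    # OTLP over HTTP, default port 4318

processors:
  batch:       # batch telemetry to reduce export overhead

exporters:
  otlphttp:
    endpoint: https://observability-backend.example.com  # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

Swapping backends then means changing the exporter section, not re-instrumenting applications, which is precisely how retooling is avoided.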

“OpenTelemetry isn't just shaping the future of observability, it's quickly becoming the standard that modern, scalable systems are built on,” concludes Shalash from Splunk.

Go to: APM and Observability: Cutting Through the Confusion — Part 10, discussing AI's impact on APM and Observability.

Pete Goldin is Editor and Publisher of APMdigest

