
Your Observability Stack Has a Telemetry Pipeline Problem

The tool landscape has never been more fragmented. Controlling how telemetry moves between platforms is the new competitive edge for engineering teams
Mike Kelly
Bindplane

Ask any senior SRE or platform engineer what keeps them up at night, and the answer probably isn't the monitoring tool — it's the data feeding it. The proliferation of APM, observability, and AIOps platforms has created a telemetry sprawl problem that most teams manage reactively rather than architect proactively.

Metrics go to one platform. Traces are routed somewhere else. Logs are duplicated across multiple backends because nobody wants to be caught without them when something breaks. Every redundant stream costs money. Every vendor-specific integration creates lock-in. And as AI-powered observability enters the picture, demanding constant, high-quality data streams, the cost of that neglect is about to rise sharply.

The Telemetry Tax Is Real - and Growing

Most engineering teams don't think about observability costs until a cloud invoice forces the conversation. By then, the damage is done: data is being ingested at full fidelity to premium platforms where only a fraction of it ever gets queried. In a February 2026 AWS Builder Center article, Masroor Ahmed states that "roughly 30% to 32% of total cloud spend is wasted on resources that are either oversized or left running when they aren't needed. This means that for every $1 million a company spends, at least $300,000 is vanishing without providing any business value. The movement of observability data is a significant contributor."

Teams that restructure their telemetry pipelines intelligently, routing high-value signals to premium platforms and high-volume, low-priority data to cheaper long-term stores, have reported cost reductions averaging 18% on cloud infrastructure. What's more, APMdigest itself reported, in an article from Splunk, that 57% of observability leaders have successfully reduced costs with OpenTelemetry by gaining control over what telemetry is collected, how it's routed, and where it goes. That's not a rounding error. It's budget that can fund the next platform evaluation or additional headcount.
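To make the arithmetic behind tiered routing concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it is hypothetical: the per-GB prices, the monthly volumes, and the tier names `premium` and `cold_storage` are illustrative assumptions, not figures from any vendor or from the studies cited above.

```python
# All figures below are hypothetical and exist only to show the arithmetic.
# Prices are in cents per GB so the sums stay in exact integers.
PRICE_CENTS_PER_GB = {"premium": 250, "cold_storage": 3}
VOLUME_GB = {"traces": 400, "metrics": 100, "debug_logs": 1500}


def monthly_cost_cents(volume_gb, tier_for, price_cents_per_gb):
    """Total monthly ingestion cost, given which tier each telemetry class goes to."""
    return sum(
        gb * price_cents_per_gb[tier_for[name]] for name, gb in volume_gb.items()
    )


# Baseline: everything is ingested at full fidelity by the premium platform.
all_premium = {name: "premium" for name in VOLUME_GB}
# Tiered: only the high-volume, low-priority class moves to the cheaper store.
tiered = {**all_premium, "debug_logs": "cold_storage"}

baseline = monthly_cost_cents(VOLUME_GB, all_premium, PRICE_CENTS_PER_GB)
after = monthly_cost_cents(VOLUME_GB, tiered, PRICE_CENTS_PER_GB)

print(f"baseline: ${baseline / 100:,.2f}  tiered: ${after / 100:,.2f}")
```

The size of the saving depends entirely on the inputs. The point is only that a model this small makes the cost of a routing decision visible before it shows up on an invoice.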

OpenTelemetry Unlocked the Door. The Pipeline Is Still Yours to Build

OpenTelemetry is a genuine step forward. Standardizing on the OpenTelemetry Protocol for metrics, traces, and logs means teams aren't trapped by proprietary SDKs and vendor-specific instrumentation. But OpenTelemetry standardized the signal format — it didn't solve the routing, transformation, and governance challenges that come after data leaves your application.

You still need to decide which signals go to which platforms and at what volume, how to transform schemas to match destination backends, and how to filter noise before it reaches expensive ingestion endpoints. These are pipeline architecture decisions, not tool selection decisions. Most teams are making them ad hoc — hardcoding destination configs, adding one-off integrations, and building brittle pipelines that are painful to modify when the vendor landscape shifts. Given how fast it shifts, that's a meaningful operational liability.
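Two of those decisions, filtering noise and reshaping records for a destination, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular collector implements them: records are plain dictionaries instead of real OTLP payloads, and the `BACKEND_FIELDS` mapping and the functions `drop_below` and `rename_fields` are hypothetical names. The one borrowed fact is that OpenTelemetry's log data model assigns severity numbers 9 through 12 to INFO.

```python
from typing import Any, Dict, Iterable, Iterator

# OpenTelemetry's log data model reserves severity numbers 9-12 for INFO,
# so 9 is the lowest INFO value. Lower numbers are TRACE and DEBUG.
INFO = 9

Record = Dict[str, Any]  # simplified stand-in for a log record, not OTLP itself


def drop_below(records: Iterable[Record], min_severity: int) -> Iterator[Record]:
    """Filter step: yield only records at or above min_severity."""
    for record in records:
        if record.get("severity_number", 0) >= min_severity:
            yield record


def rename_fields(record: Record, mapping: Dict[str, str]) -> Record:
    """Transform step: rename keys to the names a destination expects."""
    return {mapping.get(key, key): value for key, value in record.items()}


# Hypothetical field names expected by an imaginary backend.
BACKEND_FIELDS = {"body": "message", "severity_number": "level"}

incoming = [
    {"severity_number": 5, "body": "cache miss"},       # DEBUG: treated as noise
    {"severity_number": 17, "body": "payment failed"},  # ERROR: kept
]

outgoing = [rename_fields(r, BACKEND_FIELDS) for r in drop_below(incoming, INFO)]
print(outgoing)
```

Keeping the filter and the transform as separate steps means the threshold and the field mapping can be changed per destination without touching application code.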

Observability Vendor Lock-In Is the Cost Nobody Budgets For

Lock-in in the observability space doesn't hit you when you sign the contract. It hits you when you try to leave, or when a competing platform offers capabilities your current vendor can't match. Observability vendors make it extremely easy to route everything their way. Their agents and collectors are optimized to funnel data to their ingestion endpoints. When your telemetry pipeline is essentially a direct line from your infrastructure to a single vendor, you're not architecting for flexibility; you're trading optionality for short-term simplicity.

An estimated 69% of enterprises use multiple cloud providers specifically to avoid infrastructure lock-in. Engineering teams should apply the same logic to their observability stacks. Organizations getting this right treat telemetry pipelines as programmable infrastructure — vendor-agnostic and capable of routing different signal types to different destinations based on cost, capability, and business need. When a new AIOps platform arrives with ML-based anomaly detection your current vendor can't match, a flexible pipeline means a simple configuration change. A locked pipeline means a months-long integration project.
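One way to picture "a simple configuration change" is a routing table that lives outside the code that moves the data. The sketch below is a simplified illustration, not a real collector configuration: the destination names (`apm_vendor_a`, `aiops_vendor_b`, and so on) and the helper functions are hypothetical.

```python
import copy

# Hypothetical routing table: signal type -> list of destination names.
ROUTES = {
    "metrics": ["timeseries_backend"],
    "traces": ["apm_vendor_a"],
    "logs": ["siem", "cold_storage"],
}


def destinations_for(signal_type, routes):
    """Return the destinations configured for a signal type (empty if unrouted)."""
    return list(routes.get(signal_type, []))


def with_destination(routes, signal_type, destination):
    """Return a new routing table that also sends signal_type to destination."""
    updated = copy.deepcopy(routes)
    targets = updated.setdefault(signal_type, [])
    if destination not in targets:
        targets.append(destination)
    return updated


# Trying a new anomaly-detection platform becomes an edit to the table,
# not a change to instrumentation or agents.
trial_routes = with_destination(ROUTES, "metrics", "aiops_vendor_b")
print(destinations_for("metrics", trial_routes))
```

Because `with_destination` returns a copy, the original table stays intact and the change is easy to review or roll back.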

AI Observability Will Demand More from Your Pipeline

The observability use case for AI is moving in two directions simultaneously. The first is AI-powered observability: platforms using machine learning for anomaly detection, predictive alerting, and automated remediation. These tools often rely on models that are retrained on windowed snapshots of telemetry. They need continuously refreshed metrics, traces, logs, and continuous profiling data to build reliable baselines and keep them current. If your pipeline is lossy or inconsistently filtered upstream, the ML models downstream will reflect that.

The second is the observability of AI systems themselves. As teams deploy models in production, they're responsible for monitoring inference latency, token throughput, model drift, and GPU utilization — none of which map cleanly onto traditional APM signal types. Gartner forecasts worldwide AI spending will reach $2.52 trillion in 2026, a 44% year-over-year increase, with AI infrastructure accounting for most of that figure. Engineering teams that haven't addressed their telemetry pipeline architecture will find themselves managing a new class of observability complexity on top of an already strained foundation.

What Programmable Telemetry Infrastructure Looks Like

Teams that have solved this problem don't think of it as a monitoring problem — they think of it as infrastructure, governed like any other production system. In practice, that means pipelines that route by signal type and destination fit (security logs to a SIEM, high-resolution metrics to a time-series platform, high-volume debug logs to cold storage), transform schemas in-flight so each backend gets data in the shape it expects, absorb backpressure when a destination is unavailable, and enable low-risk platform evaluation by routing a subset of telemetry to a new tool without a full migration.
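Two of the behaviors described above, absorbing backpressure and sending only a subset of telemetry to a tool under evaluation, can be sketched briefly in Python. This is a minimal illustration under stated assumptions, not a production design: `BufferedDestination` and `in_trial` are hypothetical names, the buffer is in memory only and drops the oldest record when full, and a destination signals unavailability by raising `ConnectionError`.

```python
import zlib
from collections import deque


class BufferedDestination:
    """Wraps a send function with a bounded in-memory queue (hypothetical helper)."""

    def __init__(self, send, max_buffered=1000):
        self._send = send
        # With maxlen set, appending to a full deque discards the oldest item.
        self._queue = deque(maxlen=max_buffered)

    def submit(self, record):
        self._queue.append(record)
        self.flush()

    def flush(self):
        """Deliver buffered records in order; stop at the first failure."""
        while self._queue:
            try:
                self._send(self._queue[0])
            except ConnectionError:
                return  # destination unavailable: keep the records, retry later
            self._queue.popleft()

    @property
    def pending(self):
        return len(self._queue)


def in_trial(key: str, percent: int) -> bool:
    """Deterministically select roughly percent% of keys for the tool under trial."""
    return zlib.crc32(key.encode("utf-8")) % 100 < percent


# Usage: a fake destination that is down at first, then recovers.
delivered = []
state = {"up": False}


def fake_send(record):
    if not state["up"]:
        raise ConnectionError("destination unavailable")
    delivered.append(record)


dest = BufferedDestination(fake_send, max_buffered=2)
for n in (1, 2, 3):
    dest.submit(n)          # buffered; record 1 is dropped once the queue is full
state["up"] = True
dest.flush()                # records 2 and 3 are delivered in order
```

Hashing a stable key such as a service name means the same sources are consistently included in the trial, which keeps the evaluated tool's view coherent. A real pipeline would more likely spill to disk and apply retry backoff rather than drop the oldest records.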

The goal isn't to reduce observability coverage. It's to make routing decisions at the infrastructure level so each platform receives the signals it's designed to act on, at the cost profile that matches the value it delivers.

The Pipeline Is the Strategy

The APM and observability market will keep consolidating, expanding, and fragmenting. New platforms will emerge. Pricing models will shift. AI-native tools will challenge incumbents. Engineering teams that treat telemetry pipelines as fixed infrastructure will be rearchitecting their observability stacks every time the market moves.

Teams that build pipelines as configurable, vendor-agnostic layers will navigate that landscape differently — moving to new tools without starting from scratch, controlling costs without sacrificing coverage, and feeding AI systems the telemetry they need to function. The data is in motion. The question is whether your pipeline is built to keep up.

Mike Kelly is CEO and Co-Founder of Bindplane
