Challenges and Trends in Observability Adoption 2024

Dotan Horovits
Logz.io

Organizations recognize the value of observability, but only 10% of them are actually practicing full observability of their applications and infrastructure. This is among the key findings from the recently completed Logz.io 2024 Observability Pulse Survey and Report.

According to the survey, mean time to recovery (MTTR) is increasing for the third year in a row, taking over an hour for 82% of 2024 respondents (up from 74% in 2023, 64% in 2022, and 47% in 2021). Clearly, whatever organizations are doing is not enough to resolve their production issues or reach their SLOs efficiently.


As previously mentioned, only 10% of organizations that recognize the value of observability are actually practicing it fully, a strikingly low number. Yet we found that 60% of teams increasing their focus on observability report improved and accelerated troubleshooting.

So why aren't more organizations prioritizing a strong observability strategy?

Challenges to Full Observability

One complicating factor is the increasing volume of tools and data. This adds to the complexity of a successful observability plan, but according to the survey, the expertise of the people deploying the plan is the biggest issue. Lack of knowledge on the team ranked as the top challenge, with the tech talent gap affecting 48% of survey respondents.

Not surprisingly, costs are a primary concern, cited by 91% of respondents. As organizations move toward full observability of their systems, data volume is multiplying, especially for those running Kubernetes in production. Monitoring and troubleshooting their Kubernetes clusters was the top challenge for 40% of respondents deploying them.

Organizations are responding to this surge in data, and the expense that comes with it, by adapting their observability practices to keep costs down. Gaining better visibility into monitoring costs (52%) and optimizing the volume of monitoring data (37%) are the tactics most commonly used to reduce observability spend.

Trends in Observability

The survey revealed some noteworthy trends in the tools being used and the approach being taken to reduce MTTR.

Consolidating services appears to be on the rise, and simplifying environments could be a way to improve MTTR. Along these lines, 28% of organizations surveyed are embracing a shared model for observability and security monitoring, a 13% increase over last year.

The big news here, however, is that 87% of respondents said they are using some form of a Platform Engineering model, with a further 10% saying it's in the works. With Platform Engineering, a single group enables observability for all involved teams. Platform Engineering is definitely a trend on the rise industry-wide.

Another trend revealed is the use of data pipeline analytics as a means to address observability costs and complexity, noted by 75% of survey respondents. In terms of tooling, the majority of organizations currently use between 1 and 5 observability tools. OpenTelemetry adoption is increasing, with 76% of respondents using the open source project as a framework for generating and capturing telemetry data from their cloud-native software.

Grafana and Prometheus were the top two open source systems chosen for observability, at 43% and 38% respectively. It's worth noting, though, that in 2024, 21% of respondents said they have consolidated to a single tool, up from 16% last year. This is an interesting trend we're definitely keeping an eye on and are happy to be a part of.

As organizations continue to adopt cloud-native technologies and face growing complexity paired with skyrocketing costs, unified, business-centric observability is becoming a must-have strategy for not only ensuring the smooth operation of their applications and infrastructure, but for meeting service level objectives (SLOs) that impact the bottom line.

Methodology: This is our sixth year running this survey (previously named the DevOps Pulse Survey), in which we engaged with 500 respondents about their observability journey. Developers, DevOps engineers, IT professionals, and executives from around the globe all chimed in to give us a glimpse into their organizations' observability efforts: the goals, the challenges, and the realities.

Dotan Horovits is Principal Developer Advocate at Logz.io.
