Lightstep Announces Root Cause Analysis in Three Clicks

Lightstep announced major updates to its observability solution to help developers optimize root cause analysis and simplify incident response.

With the introduction of log analysis and “Top Changes,” developer teams can zero in on a single line of code and identify the cause of a regression in under a minute.

“Microservices and serverless architectures make it extremely difficult for developers to quickly assess the impact of a regression and isolate the root cause. Whether it’s due to our reliance on tribal knowledge, a lack of context, technology fatigue or red herrings that distract us from looking at the right data, this is a roadblock many developers are all too familiar with. By adding log search and aggregation, and building on our automated intelligence solutions, we’re uniquely positioned to allow any developer working on a deployment to side-step this issue and quickly pinpoint the root cause of a regression in under one minute,” said Katia Bazzi, Senior Software Engineer, Lightstep.

This update builds on Lightstep’s Service Health feature by adding logs to Lightstep’s telemetry data set. As an essential part of the root cause analysis workflow, log search and aggregation help developers trace a regression to a single line of code, using the context of traces to build a full picture of what has changed.

With this update, Lightstep customers can:

- Identify the most frequently occurring logs in an error or latency regression

- Search across logs to narrow down the root cause

- Investigate logs along the critical path to understand the root cause of a latency spike
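As a rough illustration of the first capability above, the sketch below counts which log messages occur most often on erroring spans. It is a minimal sketch under stated assumptions, not Lightstep's implementation: the record layout (`span_error`, `message`), the sample data, and the `top_error_logs` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical log records attached to spans. The field names are assumptions
# made for illustration and do not reflect Lightstep's data model.
logs = [
    {"span_error": True, "message": "connection refused: inventory-db"},
    {"span_error": True, "message": "connection refused: inventory-db"},
    {"span_error": False, "message": "cache hit"},
    {"span_error": True, "message": "timeout calling payments"},
    {"span_error": False, "message": "cache hit"},
]


def top_error_logs(records, n=3):
    """Count log messages seen on erroring spans, most frequent first."""
    counts = Counter(r["message"] for r in records if r["span_error"])
    return counts.most_common(n)


print(top_error_logs(logs))
```

In this toy data set the "connection refused" message dominates the erroring spans, which is the kind of signal an engineer would follow up on first.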

In addition, Lightstep’s automated intelligence algorithms surface the operations that have experienced the greatest changes during a specific time period, whether in real time or during a deployment that occurred hours ago. “Top Changes” identifies which error rates, latency, throughput or other service level indicators (SLIs) have changed the most, enabling teams to streamline investigations and rapidly resolve incidents.
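The general idea of ranking operations by how much an SLI has moved can be sketched as follows. This is an assumption-laden illustration, not the algorithm behind “Top Changes”, which the announcement does not describe: the `top_changes` function, the choice of relative change as the ranking metric, and the p95 latency figures are all hypothetical.

```python
def top_changes(baseline, current, n=3):
    """Rank operations by absolute relative change in an SLI between two windows."""
    changes = []
    for op, before in baseline.items():
        after = current.get(op)
        if after is None or before == 0:
            # Skip operations missing from the current window or with a zero baseline.
            continue
        changes.append((op, (after - before) / before))
    # Largest movement first, regardless of direction.
    return sorted(changes, key=lambda c: abs(c[1]), reverse=True)[:n]


# Hypothetical p95 latency (ms) per operation before and after a deployment.
baseline_p95_ms = {"checkout": 120.0, "search": 80.0, "login": 50.0}
current_p95_ms = {"checkout": 300.0, "search": 84.0, "login": 45.0}

for op, change in top_changes(baseline_p95_ms, current_p95_ms):
    print(f"{op}: {change:+.0%}")
```

Relative change is used here so that operations with very different baselines can be compared on one scale; a production system would likely also account for noise and sample size.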

