Zebrium Finds Root Cause for Apps Deployed with Kubernetes
May 03, 2021

Zebrium announced the addition of plain-language root cause summarization derived from logs and metrics.

Powered by the GPT-3 language model, the feature constructs simple-to-understand summaries that help developers and SREs determine the root cause of an incident, regardless of experience level.

According to Zebrium, its machine learning (ML) solution has already found the root cause of more than 2,000 incidents in applications deployed with Kubernetes. The company says the new approach cuts the time needed to resolve application and infrastructure incidents from hours to minutes and has proven highly accurate at identifying relevant root cause indicators.

In addition, Zebrium has expanded its integrations with observability and incident management tools to include Atlassian Opsgenie and Jira, complementing existing integrations with Slack, PagerDuty and the Elastic Stack.

These features add new capabilities for incident response teams at a time when cloud-native applications are evolving faster, becoming increasingly distributed, and failing in new ways, all of which makes incidents harder to troubleshoot and resolve. Most DevOps teams today have many tools that automatically detect software problems, but finding the root cause is still a slow, manual process of hunting through dashboards and logs to piece together what happened. Zebrium instead uses unsupervised machine learning to analyze logs and metrics and determine the root cause of application failures. It can also proactively detect new (unknown) failure modes that other tools miss.
