
BigPanda announced a major expansion of its platform capabilities to enable IT Ops, network operations center (NOC), and DevOps teams to rapidly investigate and resolve incidents and outages in cloud-native and hybrid-cloud environments.
Leveraging its Open Box Machine Learning and its Open Integration Hub technologies, BigPanda ingests changes from disparate change feeds and tools, and correlates and analyzes these changes against alerts collected from enterprise monitoring tools to rapidly isolate the root cause change that resulted in an incident or outage.
“Today’s IT environments are very fast-moving and constantly changing. Changes in software and infrastructure occur several times a day at most enterprises, which dramatically increases the potential for unexpected incidents and outages. Unfortunately, legacy IT operations tools weren’t designed for environments of rapid change and are slowing down operations teams from discovering and resolving outages in a timely manner,” said Assaf Resnick, CEO and co-founder, BigPanda. “BigPanda’s new offering puts, for the first time, the root-cause change behind an outage at the IT Ops teams’ fingertips, slashing mean-time-to-resolution and improving the performance of critical systems and applications. This is a win for IT operations teams, their enterprises, and most importantly, their customers.”
As enterprises migrate to the cloud, their IT stacks change faster. These fast-moving IT stacks are subject to hundreds or thousands of changes on an ongoing basis and have constantly shifting application and service topologies. Legacy IT operations tools and root cause analysis techniques are ineffective inside these fast-moving IT stacks because they were designed for slower-moving monolithic applications and IT stacks, where the root causes of problems were mostly infrastructure and hardware failures.
When IT Ops, NOC, and DevOps teams try to use legacy tools and techniques to support cloud-native and hybrid-cloud architectures and applications, incidents and outages become more frequent, last longer and have a wider impact footprint. This creates serious consequences for businesses in the form of higher operating costs, degraded performance and availability, SLA violations and penalties, and ultimately, unhappy customers and end-users.
The BigPanda platform expansion includes the following features designed to speed up incident and outage resolution:
- Root Cause Changes: BigPanda’s platform expansion equips IT Ops, NOC, and DevOps teams, for the first time, with the tools to contend with the thousands of routine application and infrastructure changes that can cause incidents and outages. Using out-of-the-box integrations with all major change feeds and tools, the Root Cause Changes feature ingests changes from any source of change data, including change management, change log, configuration management, and others. It then uses machine learning (ML) to correlate and analyze this dataset alongside the dataset of alerts collected from monitoring tools. The ML-driven cross-correlation and analysis surfaces the root cause change that resulted in an incident or outage, enabling IT Ops, NOC, and DevOps teams to rapidly address the change and resolve the incident or outage.
- Real-time Topology Mesh: Another part of the BigPanda platform expansion is the launch of the Real-time Topology Mesh. This new capability makes BigPanda’s platform the first AIOps solution to provide a real-time topology model across the entire IT stack, including the dynamic infrastructure inside fast-moving IT stacks, by piecing together the third critical dataset for IT operations: topology data. Using out-of-the-box integrations, Real-time Topology Mesh ingests topology data from configuration management, cloud and virtualization management, service discovery, APM, and CMDB tools to create a full-stack, always up-to-date topology model. For IT Ops, NOC, and DevOps teams struggling to detect, investigate, and resolve incidents and outages in fast-moving IT environments, Real-time Topology Mesh significantly improves their ability to detect those incidents and outages, visualize them, identify their probable root cause, understand their impact on users and customers, and route them to the right teams for rapid resolution, all in real time.
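The release does not detail the ML behind Root Cause Changes. To illustrate the basic idea of correlating a change feed with alerts, the Python sketch below swaps in a simple heuristic in place of ML: a change counts toward an alert only if it touched the same component and happened within a time window before the alert, and closer changes score higher. The `Change` and `Alert` types, the `rank_suspect_changes` function, the two-hour window, and the example data are all hypothetical; this is a sketch under those assumptions, not BigPanda’s algorithm or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str
    component: str  # host, service, or config item the change touched
    timestamp: datetime


@dataclass
class Alert:
    component: str
    timestamp: datetime


def rank_suspect_changes(changes, alerts, window=timedelta(hours=2)):
    """Rank changes by a heuristic score of how well they explain the alerts.

    A change counts toward an alert only if it touched the same component
    and happened no more than `window` before the alert. Each match adds a
    weight that falls linearly from 1.0 (same instant) to 0.0 (window edge).
    """
    scores = {}
    for change in changes:
        score = 0.0
        for alert in alerts:
            lag = alert.timestamp - change.timestamp
            if change.component == alert.component and timedelta(0) <= lag <= window:
                score += 1.0 - lag / window  # timedelta / timedelta -> float
        if score > 0:
            scores[change.change_id] = score
    # Highest-scoring (most suspicious) change first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def at(hour, minute=0):
    """Hypothetical timestamps on a single day."""
    return datetime(2024, 1, 1, hour, minute)


changes = [
    Change("c1", "db", at(10)),
    Change("c2", "web", at(10, 30)),  # different component from the alerts
    Change("c3", "db", at(7)),        # too old to fall inside the window
]
alerts = [Alert("db", at(10, 30)), Alert("db", at(11))]
print(rank_suspect_changes(changes, alerts))  # → [('c1', 1.25)]
```

A fixed window and exact component match keep the sketch readable; a production system would need to handle noisy timestamps, partial matches, and changes that affect components other than the one they touched.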
“The world of hybrid IT — with a mix of cloud-native and legacy, on-prem workloads — is here for the foreseeable future. Old approaches to problem solving in these complex, dynamic environments don’t work, in part because they typically don’t deliver insight into the relationship between changes and incidents,” said Nancy Gohring, senior analyst with 451 Research. “Correlating alerts, change events and topology can help teams narrow in on the cause of performance problems in modern application and infrastructure environments.”
With the launch of Root Cause Changes and Real-time Topology Mesh, BigPanda is now able to ingest the three critical datasets in IT operations: alerts, changes and topology, across all layers of fast-moving IT stacks, and use ML to correlate and analyze this data in real-time. This helps IT Ops, NOC and DevOps teams rapidly detect, investigate and resolve incidents and outages, minimizing the impact on users and customers.
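Adding topology to the correlation lets a change on one component explain an alert on another component that depends on it. A minimal Python sketch, assuming topology is available as a plain mapping from each component to the components it depends on: a breadth-first walk collects the alerting component and its upstream dependencies, and only changes to those components within the time window remain as candidates. The `upstream_components` and `candidate_changes` functions, the service names, and the window are hypothetical; the release does not say how BigPanda models topology or combines it with change data.

```python
from collections import deque
from datetime import datetime, timedelta


def upstream_components(topology, component):
    """Return `component` plus everything it depends on, directly or transitively.

    `topology` maps each component to the components it depends on.
    """
    seen = {component}
    queue = deque([component])
    while queue:  # breadth-first walk along dependency edges
        current = queue.popleft()
        for dependency in topology.get(current, []):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


def candidate_changes(changes, alert_component, alert_time, topology,
                      window=timedelta(hours=2)):
    """IDs of changes to the alerting component or its upstream dependencies
    made no more than `window` before the alert."""
    scope = upstream_components(topology, alert_component)
    return [
        change_id
        for change_id, component, changed_at in changes
        if component in scope and timedelta(0) <= alert_time - changed_at <= window
    ]


# Hypothetical service map: checkout calls payments and inventory, both use db.
topology = {
    "checkout": ["payments", "inventory"],
    "payments": ["db"],
    "inventory": ["db"],
}
changes = [
    ("c1", "db", datetime(2024, 1, 1, 10, 0)),
    ("c2", "search", datetime(2024, 1, 1, 10, 15)),  # not upstream of checkout
    ("c3", "payments", datetime(2024, 1, 1, 6, 0)),  # outside the window
]
print(candidate_changes(changes, "checkout", datetime(2024, 1, 1, 10, 30), topology))
# → ['c1']
```

Matching changes only against the alerting component would miss `c1` here, because the alert fires on `checkout` while the change landed on `db`; walking the dependency graph is what connects the two.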
Both new additions to the BigPanda platform, Root Cause Changes and Real-time Topology Mesh, are currently available to select customers as part of a beta program and will be generally available at the end of the year.