
What Can AIOps Do For IT Ops? - Part 4

APMdigest asked the top minds in the industry what they think AIOps can do for IT Operations. Part 4 covers root cause analysis and automation.

Start with What Can AIOps Do For IT Ops? - Part 1

Start with What Can AIOps Do For IT Ops? - Part 2

Start with What Can AIOps Do For IT Ops? - Part 3

SINGLE PANE OF GLASS

AIOps provides a much needed real-time "single-pane-of-glass" view into complex IT infrastructures that encompass fragmented and distributed multi-vendor, multi-domain technologies including legacy, virtualization, hybrid cloud, containers, microservices, and others. Although AIOps is a seismic change for IT operations, it's not a radical application of analytics and machine learning. The potential of AIOps is enormous. Enterprises that have deployed AIOps solutions are experiencing transformational benefits in revenue growth, better customer retention, improved customer experience, lower costs, and enhanced performance. The time to move is now.
Maruti Sivakumar V
SVP, Head of Digital & Practices, Blue.cloud

ISOLATING THE ROOT CAUSE

AIOps helps build high-quality incidents that include all the necessary technical and business context, alongside AI/ML-identified probable root cause and root cause changes — and present it all within a single pane of glass.
Mohan Kompella, VP Product Marketing,
Adam Blau, Director of Product Marketing,
Anirban Chatterjee, Director of Product Marketing, BigPanda

AIOps is a buzzword covering 6 different types of products designed to create value for IT Operations professionals. Always pick specific use cases you wish to solve and then understand how machine learning and AI can apply to solve that issue or set of issues. Good examples of this are helping the user isolate the root cause down to a specific component, highlighting outliers in graphs and other views, and correlating likely related data types. Generally, these technologies augment the operator of the software rather than being automation magic. Most often these are features in other Observability tools rather than AIOps platforms. AIOps platforms are fantasy because the semantic meaning of data is not clear. The result is that vendors write rules to analyze the data, so the resulting outcomes only work in specific situations, which makes them useless when a major problem happens across a set of complex systems.
Jonah Kowall
CTO, Logz.io
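
One of the use cases named above, highlighting outliers in a graph, can be sketched in a few lines. The example below is a minimal illustration rather than any vendor's method: it assumes a single series of metric samples and uses the modified z-score based on the median absolute deviation, with a conventional cutoff of 3.5. The function name and sample values are hypothetical.

```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return the indices of points whose modified z-score exceeds the cutoff.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme point does not distort the reference level.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the points are identical; no usable spread estimate.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > cutoff
    ]


# Hypothetical request-latency samples (ms) with one obvious spike.
samples = [12, 11, 13, 12, 95, 12, 11, 13]
outlier_indices = mad_outliers(samples)
print(outlier_indices)
```

A check like this supports the operator by pointing at the suspicious data point; deciding what the spike means still needs a human.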

AUTOMATED ROOT CAUSE ANALYSIS

Response automation is one of the most value-driving features of AIOps software tools. IT operators are able to conduct performance tests to establish a baseline for each metric or KPI and define acceptable thresholds for the ones they want to prioritize. When a KPI breach is detected, AIOps software can perform an automated root cause analysis to automatically determine why a problem occurred and implement a solution if one is available.
Abel Gonzalez
Director of Product Marketing, Sumo Logic
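
The baseline-and-threshold workflow described above can be illustrated with a small sketch. It assumes the performance test produces a list of latency samples and that an acceptable threshold is defined as the baseline mean plus three sample standard deviations; both the rule and the names are illustrative assumptions, not a description of any particular AIOps product.

```python
from statistics import mean, stdev


def build_baseline(samples, k=3.0):
    """Return (baseline_mean, upper_threshold) from performance-test samples.

    The threshold is mean + k * sample standard deviation (an assumed rule).
    """
    mu = mean(samples)
    sigma = stdev(samples)
    return mu, mu + k * sigma


def find_breaches(values, threshold):
    """Return the indices of live observations above the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


# Hypothetical latency samples (ms) collected during a performance test.
baseline_latency_ms = [100, 102, 98, 101, 99, 100, 103, 97]
baseline_mean, threshold = build_baseline(baseline_latency_ms)

# Hypothetical live readings; the third one breaches the threshold.
live_latency_ms = [101, 99, 140, 100]
breaches = find_breaches(live_latency_ms, threshold)
print(baseline_mean, threshold, breaches)
```

In a real deployment, a detected breach would be the trigger for root cause analysis and any automated response; the sketch covers only the detection step.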

Machine learning and AI are not just critical — but foundational — components of a dynamic monitoring platform. Modern applications are constantly in flux, and microservices scale through ephemeral cloud and container infrastructure in response to demand. As these systems become more complex and dynamic, operational tasks consume an increasing share of engineering time. AIOps optimizes and automates IT operations so that engineers can get proactively alerted no matter the size of the workloads, and benefit from an augmented troubleshooting experience by cutting through noise to glean key insights. In some cases, AI can auto-discover the root cause of an issue, saving minutes or hours of stressful investigations. This is the core advantage of effective AIOps — less engineering time wasted on managing complex operations, and more time building new products for customers.
Renaud Boutet
VP of Product, Datadog

BETTER DECISION-MAKING

From a monitoring and observability perspective, a key benefit of AIOps has been the ability to use historical data to increase confidence in decisions that we previously thought were black-and-white. It's relatively simple to have a machine check if a service is up or down, but how do we find the trends that show that whilst the website is up, it's gradually been getting slower over the past few months? Modern tooling allows us to collect enough data and process it fast enough — often in real-time — for the machines to be able to make better-informed decisions, faster. Such decisions could only be made by lengthy human inspection previously. It's a great example of modern tooling working in the background to make sure everything is okay, so we don't have to.
Matt Saunders
Head of DevOps, Adaptavist
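
The "up but gradually slower" case above can be made concrete with a least-squares trend line over historical response times. This is a minimal sketch under stated assumptions: the data points are evenly spaced (for example, one p95 latency value per week), and a slope above 2 ms per week counts as degradation. The data, the function name and the cutoff are all hypothetical.

```python
def slope_per_step(values):
    """Least-squares slope of evenly spaced values, in units per step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical weekly p95 response times (ms) for a site that is always "up".
weekly_p95_ms = [200, 204, 207, 213, 216, 220, 225, 229]
slope = slope_per_step(weekly_p95_ms)

# Assumed policy: flag a sustained increase of more than 2 ms per week.
is_degrading = slope > 2.0
print(round(slope, 2), is_degrading)
```

No single reading here would trip an up/down check, yet the fitted slope shows a steady weekly increase that a person would only spot by inspecting months of data.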

AIOps observability can play a critical role in identifying expected trends using the data from users, systems and processes, and in providing that data back to the decision-makers so they can make the investment call based on the patterns, trends, etc. With growing cloud demand, it is imperative that enterprises start investing in AIOps before it is too late.
Vishnu Vasudevan
Head of Product Engineering and Management, Opsera

SYNCING WITH ITSM

Create automated, bi-directional syncing with your ITSM platform, on-call or other collaboration tools and reduce ticket/notification volumes by up to 95%.
Mohan Kompella, VP Product Marketing,
Adam Blau, Director of Product Marketing,
Anirban Chatterjee, Director of Product Marketing, BigPanda

First-generation AIOps solutions were a step in the right direction for addressing unending IT complexity, but they needed more care and feeding and only solved a limited set of problems for ITOps teams. Looking ahead, new-age AIOps platforms are poised to make AIOps faster, better and cheaper: by automating data preparation and integrations, by having native asset/topology intelligence, and by using expanded AI/ML frameworks like neural networks, NLP, transformer models and graph databases to address many more use cases. This paves a path where everybody in IT benefits, including ITSM, Service Desk, IT Asset/Planning and more.
Tejo Prayaga
Product Management, CloudFabrix

UNDERSTANDING ALGORITHMS

The last several years have seen a dramatic increase in the use of AI across all types of companies and platforms. These complex solutions require more parts of an organization to be knowledgeable about AI, from data pipelines to the workflows that build, qualify and optimize the models. Having a specialized Ops function that understands this end-to-end is going to be critical for maximizing AI's effectiveness in a production environment. Over time, AIOps can build a deeper understanding of the algorithms, then use that knowledge to enhance the infrastructure with automated services around data cleaning, model tuning and scaling that will continue delivering key results for the business. This kind of specialty is beyond what a traditional IT Operations team can do with the breadth that they are normally expected to maintain.
David Luks
VP of Engineering, Smart Applications, Lucidworks

AUTOMATION

AIOps delivers significant value to businesses by automating many of the manual, tedious tasks that distract IT from working on higher level projects, especially when it comes to data prep.
David P. Mariani
CTO and Founder, AtScale

As the cadence of business continues to gain momentum and competition builds, organizations must not only innovate but also identify business problems and inefficiencies and utilize technology to overcome them. AIOps acts as the salve for many enterprise challenges by anchoring a triangulation of machine learning, decision automation and advanced analytics to automate repetitive tasks, freeing IT teams to work on new mission critical and challenging problems — resulting in faster completion of projects and improved business outcomes.
Alan Young
CPO, InRule

REMEDIAL OPTIMIZATION

IT Operations cannot keep up with the requirements of keeping cloud applications functional and running their best. IT Ops needs to utilize the power of AI to keep the many combinations of app parameters and metrics in an optimal state. Moreover, for AIOps to keep operational apps optimized, it needs to be continuous (always on) and autonomous (no human intervention). This way AIOps can perform the remedial optimization work the IT Ops SREs would do, but much faster and with more accuracy.
Peter Nickolov
Co-Founder and VP of Engineering, Opsani

Go to What Can AIOps Do For IT Ops? - Part 5

