
What Can AIOps Do For IT Ops? - Part 6

APMdigest asked the top minds in the industry what they think AIOps can do for IT Operations. Part 6 is the final installment in the series.

Start with What Can AIOps Do For IT Ops? - Part 1

Start with What Can AIOps Do For IT Ops? - Part 2

Start with What Can AIOps Do For IT Ops? - Part 3

Start with What Can AIOps Do For IT Ops? - Part 4

Start with What Can AIOps Do For IT Ops? - Part 5

SCALABILITY

AIOps advantages can be summed up in one word: scalability. A main advantage of AIOps within DevOps teams is the ability to scale a business with new technology without having to scale the operations of new services in kind. AIOps allows DevOps teams to focus on innovating and improving the customer experience, the driving force of profitability, rather than on the constant pressure of monitoring and operating these services. Forward-thinking DevOps teams need to be looking at AIOps and machine learning as mission-critical to delivering higher availability of services.
Sean McDermott
CEO, Windward Consulting Group

Over the years, there's been a change in the ratio of people managing computers to the number of computers. In the 60s and 70s, there were many operators per machine. With the cloud, one admin manages thousands, possibly hundreds of thousands, of computers. The only way that's been managed has been through improvements in tooling. AIOps is the latest improvement in tooling and enables IT staff to work effectively with huge clusters that dynamically change. No human could possibly watch all the log files looking for anomalies and no simple set of Perl or Python scripts could automate that process. The only way to do this is to use AI to analyze the data being thrown off by huge clusters of computing resources, look for anomalies, and if possible, correct problems without requiring human involvement. For example, AI could detect signatures of failing devices, like disk drives, then move the data from the failing drive to a spare and notify a human to swap in a replacement. An AI system coupled with load balancing hardware could also make predictions about what your traffic will be and allocate resources accordingly. This is especially valuable in the cloud, where admins can allocate and release computing power as needed.
Mike Loukides
VP of Emerging Tech Content, O'Reilly Media
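
To make the idea of scanning operational data for anomalies concrete, here is a minimal sketch. It is not an AI model and not any vendor's method: it swaps in a simple trailing-window z-score as a stand-in, and the function name, window size, threshold, and sample error counts are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the trailing window's mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical per-minute error counts parsed from log files.
error_counts = [2, 3, 2, 4, 3, 2, 3, 40, 3, 2]
print(find_anomalies(error_counts))  # prints [7]
```

A real deployment would need far more than this (seasonality, correlated signals, many metrics at once), which is the gap the paragraph above says simple scripts cannot close.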

OPTIMIZING VALUE STREAMS

AIOps allows IT Operations to focus more on optimizing value streams.
Muraleedharan Vijayakumar
Senior Technical Manager, GAVS Technologies

The conversation on domain-agnostic versus domain-specific does not really matter. In the past, domain-agnostic AIOps tools relied heavily on integrations with many different sources to collect data. Domain-centric AIOps tools typically collect most of the required data themselves and can sometimes be more specific to particular domains, such as log management, or to specific application areas, such as ERP. What this means: I believe Artificial Intelligence will and should be used across many domains, and the current task for IT enterprises is to determine where they want to leverage AI capabilities to gain insights and reduce waste and toil. When analyzing the vendors in this space, I found that some tout their AI capabilities specifically for IT operations, while others have added, and continue to add, data analytics and intelligent integrations to support evolving operating models. I think the next normal will require leveraging AI across the value streams to successfully execute and deliver quality digital services and applications to customers.
Eveline Oehrlich
Chief Research Officer, DevOps Institute

ENABLING SMALLER TEAMS TO BE MORE EFFECTIVE

AIOps enables a small traditional IT Ops team to be much more effective and expand its reach. The team can cover a much wider remit, including more systems to deploy, more geographies, and more variants (to support A/B testing).
Gareth Smith
GM of Eggplant, part of Keysight Technologies

DRIFT TRACKING

Drift tracking from inception to current production state has been a goal in the enterprise for decades. AIOps can give operations a view into drift: how the environment has changed over time from what was initially deployed. Understanding drift is critical to reducing tech debt, incidents and problems from client to cloud.
Jeanne Morain
Author, Strategist and Transformation Pioneer, iSpeak Cloud
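
As a rough illustration of what a drift view compares, the sketch below diffs a snapshot of what was initially deployed against the current state. It assumes flat key/value configuration snapshots; the function name, keys, and values are hypothetical and not taken from any particular tool.

```python
def compute_drift(baseline, current):
    """Compare two flat configuration snapshots and report what changed."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])  # (deployed value, running value)
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical snapshots: what was deployed vs. what is running now.
deployed = {"nginx_version": "1.18", "replicas": 3, "tls": "on"}
running = {"nginx_version": "1.20", "replicas": 3, "debug_port": 9000}

drift = compute_drift(deployed, running)
for kind, items in drift.items():
    print(kind, items)
```

Here the report would show one added setting (`debug_port`), one removed setting (`tls`), and one changed value (`nginx_version`), while the unchanged `replicas` entry is left out.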

DE-RISK ROLLOUT OF NEW INITIATIVES

AIOps can be the "extra pair of hands" that helps identify problems and issues before they happen, drawing on complex and varied data sets that would be difficult for a human to comprehend. This helps de-risk the rollout of new initiatives, as issues are quickly identified and, if necessary, remediated or rolled back, all more quickly than a human can react.
Gareth Smith
GM of Eggplant, part of Keysight Technologies

FOCUS ON MORE STRATEGIC INITIATIVES

AIOps can also classify common issues, allowing the Ops team to focus its time and effort on more strategic initiatives for greater efficiencies and benefits.
Gareth Smith
GM of Eggplant, part of Keysight Technologies

LABOR SAVINGS

With AIOps, when you multiply that reduction in fruitless labor cost by the number of applications and infrastructure assets that could generate alerts, and again by the number of times groups in DevOps and IT were handing off issues to each other, significant labor savings are at stake, as well as a higher rate of employee retention.
Jason English
Principal Analyst, Intellyx

COST EFFICIENCY

AIOps can allow IT organizations to operate efficiently and provide more reliable, scalable infrastructure for their users. With the vast amount of data available today, AIOps allows IT organizations to easily understand things like resource constraints and traffic patterns, and to automate and scale infrastructure more efficiently. These are tasks that would take a human a lot of time to automate.
Saro Subbiah
VP of Engineering and Technology for Monitor & Platform, Sysdig
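
The link between traffic patterns and scaling decisions can be sketched in a few lines. This is an illustration only: a plain moving average stands in for whatever forecasting a real AIOps product uses, and the function names, per-replica capacity, and traffic figures are hypothetical.

```python
import math

def forecast_next(requests_per_min, window=3):
    """Naive forecast: average of the most recent `window` observations."""
    recent = requests_per_min[-window:]
    return sum(recent) / len(recent)

def replicas_needed(expected_rpm, capacity_per_replica, min_replicas=1):
    """Smallest replica count that covers the expected load."""
    return max(min_replicas, math.ceil(expected_rpm / capacity_per_replica))

# Hypothetical requests-per-minute samples and per-replica capacity.
traffic = [900, 1100, 1300, 1500, 1700]
expected = forecast_next(traffic)
print(expected)                                             # prints 1500.0
print(replicas_needed(expected, capacity_per_replica=400))  # prints 4
```

A moving average lags a rising trend, so a production system would likely use a trend-aware or seasonal model; the point here is only the shape of the loop from observed traffic to a provisioning decision.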

