Discovering AIOps - Part 4: Advantages
October 16, 2023

Pete Goldin
APMdigest

AIOps has become well-known in recent years because it offers a variety of compelling advantages. Part 4 of this blog series covers the expert picks for the greatest advantages that can be gained from AIOps.

Start with: Discovering AIOps - Part 1

Start with: Discovering AIOps - Part 2: Must-Have Capabilities

Start with: Discovering AIOps - Part 3: The Users

Better IT Decision Making

One overarching advantage of AIOps is support for better IT decision-making about system, network and application performance, an advantage that underpins many of the others on this list.

"A shared goal across IT teams is to gain insights leveraging all available monitoring data," explains Andreas Reiss, Head of Product Management, AIOps and Observability, at Broadcom. "AIOps helps correlate the data and enhances it with meaningful context that informs the enterprise what it doesn't know yet about its environment and possible problem scenarios that might affect the business and end users. These kinds of insights are the goal of AIOps."

AIOps equips organizations with agile insights into their IT ecosystem, according to Payal Kindiger, Senior Director of Product Marketing at Riverbed. By correlating data from various sources, it provides a comprehensive overview of system health, performance trends, and anomalies. This enables quicker and more informed decision-making, supporting agile responses to changing business needs and dynamic IT environments.

Gagan Singh, VP of Product Marketing, Observability, at Elastic, says, "AIOps provides actionable insights by using machine learning algorithms and natural language processing to analyze large volumes of telemetry data (logs, metrics, and traces) from several systems to identify patterns, correlate events, generate predictions about future events, and discover the root cause."

Handling Big Data

Companies today average more than 200 applications, many of which are outside of IT control. As a result, companies are drowning in data, much of which is siloed between departments. AIOps offers the ability to consume large, disparate data sets such as these and help bring order from chaos, says Thomas LaRock, Principal Developer Evangelist at Selector.

"AIOps will monitor one or more big data sources — let's say databases containing software and infrastructure telemetry like metrics, traces, logs, and events — and use machine learning algorithms to find deviations from 'normal' nearly in real time, essentially highlighting these deviations as they are happening or very shortly after," adds Camden Swita, Senior Product Manager at New Relic.

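As a rough illustration of "deviations from normal," the sketch below flags any reading that sits far outside the recent range of a metric. It is a deliberately simple statistical stand-in, not how any vendor quoted here implements detection: the function name `find_deviations`, the window size, the threshold, and the latency figures are all assumptions made for the example.

```python
from statistics import mean, stdev

def find_deviations(values, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing window's
    mean by more than `threshold` standard deviations (a rolling z-score)."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]   # the most recent "normal" readings
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as a deviation.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical response-time samples in milliseconds.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(find_deviations(latency_ms))  # prints [7]
```

Only the 250 ms spike is reported. Production platforms typically use richer models that account for seasonality and many metrics at once, but the principle of comparing new data against a learned baseline is the same.
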
Identifying Anomalies

"AIOps platforms leverage advanced machine learning algorithms to analyze vast volumes of data from various sources. This enables AIOps platforms to identify patterns, anomalies, and correlations within this data. Such patterns and anomalies are usually undetectable by IT admins manually combing through data," says Bharani Kumar Kulasekaran, Product Manager at ManageEngine.

Pinpointing the Root Cause

"Recent data shows that organizations are especially seeing the positive influence of AIOps, with AIOps tools outperforming legacy solutions in a number of ways, such as automatically determining the technical root cause of an issue and better assessing the severity of an issue," says Spiros Xanthos, SVP and General Manager of Observability at Splunk.

Problem isolation is a great example of where AIOps can make a big impact by discovering a root cause, for instance by generating metrics along the graph of relationships between components on a transaction trace, adds Asaf Yigal, CTO of Logz.io.

Predicting Future Issues

AIOps platforms offer forecasting capabilities, which enable them to predict potential future issues based on existing data and trends. This helps organizations take a proactive approach to preventing downtime, explains Kulasekaran from ManageEngine.
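
To make the idea of forecasting from existing trends concrete, here is a minimal sketch that fits a straight line to past readings and estimates when a limit will be reached. A plain least-squares line is an assumption chosen for brevity, not the method used by any platform mentioned here, and the function names and disk-usage numbers are hypothetical.

```python
def fit_trend(samples):
    """Least-squares line through (time, value) samples.
    Returns (slope, intercept). Needs at least two distinct times."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    return slope, mean_v - slope * mean_t

def predicted_crossing_time(samples, limit):
    """Time at which the fitted trend reaches `limit`,
    or None if the metric is flat or shrinking."""
    slope, intercept = fit_trend(samples)
    if slope <= 0:
        return None
    return (limit - intercept) / slope

# Hypothetical daily disk-usage readings: (day, percent used).
disk_usage = [(0, 60.0), (1, 62.0), (2, 64.0), (3, 66.0), (4, 68.0)]
print(predicted_crossing_time(disk_usage, limit=90.0))  # prints 15.0
```

With usage growing two points a day, the trend reaches 90% on day 15, which leaves time to add capacity before anything fails.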

Accelerating MTTR

Yigal from Logz.io explains, "If you can tell the analyst, this is mega important based on previous investigations in your environment or people have seen this thing a million times and marked it as low-priority, you are having a massive impact. It's really hard to drive down MTTR if you don't have the right contextual information to help with the investigation. However, it's even harder to drive down MTTR if your people are occupying their time focused on the wrong activities."

In the event of an issue, AIOps helps shorten both Mean Time to Resolution (MTTR) and Mean Time to Detection (MTTD), according to Ali Siddiqui, Chief Product Officer at BMC.
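
For readers who want to track these two measures, the sketch below shows one way to compute them from incident timestamps. The incident records are invented, and measuring both durations from the moment the issue began is an assumption: some teams start the MTTR clock at detection instead.

```python
from datetime import datetime, timedelta

def mean_duration(durations):
    """Average a list of timedelta values."""
    return sum(durations, timedelta()) / len(durations)

# Hypothetical incident log.
incidents = [
    {"started": datetime(2023, 10, 1, 9, 0),
     "detected": datetime(2023, 10, 1, 9, 20),
     "resolved": datetime(2023, 10, 1, 11, 0)},
    {"started": datetime(2023, 10, 5, 14, 0),
     "detected": datetime(2023, 10, 5, 14, 10),
     "resolved": datetime(2023, 10, 5, 15, 0)},
]

# MTTD: how long issues go unnoticed. MTTR: how long until they are fixed.
mttd = mean_duration([i["detected"] - i["started"] for i in incidents])
mttr = mean_duration([i["resolved"] - i["started"] for i in incidents])

print(mttd)  # prints 0:15:00
print(mttr)  # prints 1:30:00
```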

Faster Ticket Remediation

The use of AIOps capabilities in ITSM can speed up or automate ticket remediation, Yigal from Logz.io advises.

Kindiger from Riverbed adds that intelligent ticketing, where unified observability solutions leverage AIOps capabilities, can offer user-first contextualized insights directly within ITSM tickets. This eliminates the need for escalations to specialized resources while empowering L1 service desk agents to swiftly resolve issues with the required information at their fingertips. Additionally, combined with automation, tickets can be created, prioritized, routed and resolved before issues are raised as incidents.

Minimizing Alert Noise

Traditional monitoring tools rely on static thresholds to generate alerts. This, coupled with the quest to monitor everything, results in a flood of alerts: false positives, FYI notifications, benign status changes, and duplicates, according to Monika Bhave, Product Manager at Digitate. Sifting through this noise, which can amount to thousands of alerts each day across several different platforms, is a massive undertaking. This can cause enterprises to miss major events, leading to potentially catastrophic outages.

Prior to AIOps, the only tools for dealing with the ineffective and noisy alerting that was clouding IT operations processes were blunt ones: remove non-essential alerting, raise thresholds to prevent false positives, and encode complex logic to handle the alerts, explains Charles Burnham, Director, AIOps Engineering at LogicMonitor. The effect of this sledgehammer approach, however, was desensitized and incomplete monitoring that missed outages and hid preventative opportunities, backed by logic that was often too complex to maintain.

"One of the transformative aspects of AIOps is that, for the first time, an enterprise can cut through the noise that can overwhelm IT operations to identify meaningful signals embedded in monitoring data," says Reiss from Broadcom.

"AIOps can automatically categorize alerts based on their severity and importance, ensuring that only critical alerts are prioritized and addressed in a timely manner," adds Singh from Elastic.

Carlos Casanova, Principal Analyst at Forrester Research, explains further, "The AIOps tools can ingest vast amounts of disparate data and perform multivariate analysis at incredible speeds. AIOps quickly identifies volumes of alerts and if it recognizes that they're all traced back to a common point, it can collapse all of them into a drastically reduced number of items for the operator to deal with. This could easily be hundreds of items reduced to one or two. So, the work queues are significantly less and more important, less time is wasted sorting through the haystack for the needle."
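
The collapsing of many alerts into a few can be sketched with a dependency map: each alert is traced upstream, and alerts that lead back to the same failing component are grouped together. The component names, the `DEPENDS_ON` map, and the single-parent topology are all hypothetical simplifications; real tools infer far richer relationships from the data itself.

```python
from collections import defaultdict

# Hypothetical dependency map: component -> the component it depends on.
DEPENDS_ON = {
    "checkout-api": "orders-db",
    "cart-api": "orders-db",
    "orders-db": "storage-array-1",
    "search-api": "search-cluster",
}

def upstream_chain(component, depends_on):
    """Return the component followed by everything it depends on, in order."""
    chain, seen = [component], {component}
    while chain[-1] in depends_on:
        parent = depends_on[chain[-1]]
        if parent in seen:          # guard against circular dependencies
            break
        chain.append(parent)
        seen.add(parent)
    return chain

def collapse_alerts(alerts, depends_on):
    """Group alerts under the most upstream component that is itself alerting."""
    alerting = {a["component"] for a in alerts}
    groups = defaultdict(list)
    for alert in alerts:
        chain = upstream_chain(alert["component"], depends_on)
        root = [c for c in chain if c in alerting][-1]
        groups[root].append(alert)
    return dict(groups)

alerts = [
    {"component": "checkout-api", "message": "request timeouts"},
    {"component": "cart-api", "message": "error rate high"},
    {"component": "orders-db", "message": "slow queries"},
    {"component": "storage-array-1", "message": "disk failure"},
    {"component": "search-api", "message": "latency high"},
]

grouped = collapse_alerts(alerts, DEPENDS_ON)
for root, members in grouped.items():
    print(root, len(members))
```

Five alerts become two work items: four trace back to the failing storage array, while the search alert stands alone because nothing upstream of it is alerting.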

Phillip Carter, Principal Product Manager at Honeycomb, elaborates, "AIOps reduces alert noise in several ways. More broadly speaking, ML models that can tell you what's worth paying attention to in your dumping ground of data will reduce the number of alerts if you were alerting on all of that data. They can also look at historical data on alert resolution and deprioritize alerts that are traditionally false positives."
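
The second approach, using historical resolution data, can be illustrated with a simple rule: demote an alert only when it has enough history and most of that history was false positives. This is a hand-written heuristic standing in for a learned model, and the alert names, the 80% cutoff, and the five-sample minimum are assumptions for the example.

```python
def false_positive_rate(history):
    """history: list of booleans, True where a past alert was a false positive."""
    return sum(history) / len(history) if history else 0.0

def prioritize(alert_names, resolution_history, demote_above=0.8, min_samples=5):
    """Split alerts into (urgent, demoted) using past resolution outcomes.
    Alerts with too little history are never demoted."""
    urgent, demoted = [], []
    for name in alert_names:
        history = resolution_history.get(name, [])
        if len(history) >= min_samples and false_positive_rate(history) > demote_above:
            demoted.append(name)
        else:
            urgent.append(name)
    return urgent, demoted

# Hypothetical outcomes of previous alerts (True = turned out to be a false positive).
resolution_history = {
    "disk-latency-high": [True] * 9 + [False],
    "payment-errors": [False, False, True, False, False],
}

urgent, demoted = prioritize(
    ["disk-latency-high", "payment-errors", "new-cert-expiry"],
    resolution_history,
)
print(urgent)   # prints ['payment-errors', 'new-cert-expiry']
print(demoted)  # prints ['disk-latency-high']
```

Demoted alerts are set aside rather than deleted, so they can still be reviewed if the pattern behind them changes.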

Swita from New Relic adds, "Alerts age, and some age poorly. If that alert condition you wrote last year or even last month was based on some snapshotted 'norm,' who's to say it's still relevant or worthy of generating an alert today? However, a machine learning algorithm can adjust to the changing norm of a system and adjust alert conditions accordingly (assuming that's what you want), which can often reduce alert frequency and false positives."
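
The difference between a snapshotted norm and an adjusting one shows up clearly in a small comparison. In the sketch below, a fixed threshold is set against an exponentially weighted moving average that follows the metric as it drifts. The traffic figures, the smoothing factor, and the 50% tolerance are assumptions, and a moving average is only a simple stand-in for the machine learning described above.

```python
def static_alerts(values, threshold):
    """Indices where the value exceeds a fixed, hand-written threshold."""
    return [i for i, v in enumerate(values) if v > threshold]

def adaptive_alerts(values, alpha=0.5, tolerance=0.5):
    """Indices where the value exceeds a moving baseline by more than
    `tolerance` (as a fraction). The baseline is an exponentially weighted
    moving average, so it follows gradual change in the metric."""
    baseline = values[0]
    flagged = []
    for i, v in enumerate(values[1:], start=1):
        if v > baseline * (1 + tolerance):
            flagged.append(i)
        # Keep learning, so a lasting shift eventually becomes the new norm.
        baseline = alpha * v + (1 - alpha) * baseline
    return flagged

# Hypothetical requests per second: steady organic growth plus one real spike.
traffic = [100, 110, 120, 130, 140, 150, 160, 170, 400, 180]

print(static_alerts(traffic, threshold=150))  # prints [6, 7, 8, 9]
print(adaptive_alerts(traffic))               # prints [8]
```

The threshold written when traffic hovered around 100 fires on every reading once normal growth passes 150, while the adjusting baseline reports only the genuine spike.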

"However, it's worth noting that just because an alert was a false positive before doesn't mean it will continue to be a false positive in the future, especially for services with irregular traffic patterns," cautions Carter from Honeycomb.

Go to: Discovering AIOps - Part 5, covering more advantages of AIOps.

Pete Goldin is Editor and Publisher of APMdigest