Observability: The Next Frontier for AIOps
September 24, 2020

Will Cappelli
Moogsoft


Enterprise ITOM and ITSM teams have welcomed AIOps, believing it can deliver great value as their IT environments become more distributed, hybrid and complex. Not so with DevOps teams.


It's safe to say they've kept AIOps at arm's length, because they don't think it's relevant or useful for what they do. Instead, to manage the software code they develop and deploy, they've focused on observability.

In concrete terms, this means that for your typical DevOps pros, if the app delivered to their production environment is observable, that's all they need. They're skeptical of what, if anything, AIOps can contribute in this scenario.

This blog will explain why AIOps can help DevOps teams manage their environments with unprecedented accuracy and velocity, and outline the benefits of combining AIOps with observability.


AIOps: Room to Grow its Adoption and Functionality

In truth, there isn't one universally effective set of metrics that works for every team to measure the value that AIOps delivers. This is an issue not just for AIOps but for many ITOM and ITSM technologies as well. In fact, many enterprise IT teams who invested in AIOps in recent years are now carefully watching their deployments to assess their value before deciding whether or not to expand on them.

Still, there's a lot of room for AIOps adoption to grow, because there are many enterprises that haven't adopted it at all. That's why many vendors are trying to position themselves as AIOps players, to be part of a growing market. For this reason, the AIOps market has now gotten crowded.

So how can AIOps as a practice innovate and evolve at this point? What innovations can deliver unique capabilities that will set an AIOps offering apart from the pack of existing varieties? Clearly, the way to do this is to tailor, expand and apply AI functionality to observability data. Such a solution would appeal strongly to the DevOps community, and dissolve its historical reluctance and skepticism toward AIOps.

But What is Observability?

However, there's an issue. When you press DevOps pros a little bit and ask them what observability is, you get three very different answers. The first is that observability is nothing more than traditional monitoring applied to a DevOps environment and toolset. This is flat out wrong.

Another meaning you'll hear given to observability is its traditional one: That it's a property of the system being monitored. In other words, observability isn't about the technology doing the monitoring or the observing, but rather it's the self-descriptive data a system generates.

According to this definition, people monitoring these systems can obtain an accurate picture of the changes occurring in them and of their causal relationships. However, it's clear that this view of observability, while related to the first one, is a dead end. It's just a stream of raw data and nothing else.

A third definition is that, compared with traditional monitoring, observability is a fundamentally different way of looking at and getting data from the environment being managed. And it needs to be, because the DevOps world is one of continuous integration, continuous delivery and continuous change — a world that's highly componentized and dynamic.

The way traditional monitoring tools take data from an environment, filter it, and generate events isn't appropriate for DevOps. You need to observe changes that happen so quickly that trying to fit the data into any kind of pre-arranged structure just falls short. You won't be able to see what's going on in the environment.

Instead, DevOps teams need to access the raw data generated by their toolset and environment, and perform analytics directly on it. That raw data is made up of metrics, traces, logs and events. So observability is indeed a revolution, a drastic shift away from all the pre-built filters and the pre-packaged models of traditional monitoring systems.

This third definition is the one that offers the most potential for technological innovation and for delivering value through AIOps, because DevOps teams do need help to make sense of this raw data stream, and act accordingly.

AI analysis and automation applied to observability can deliver this assistance to DevOps teams. Such an approach would take the raw data from the DevOps environment and give DevOps practitioners an understanding of the systems that they're developing and delivering.

With these insights, DevOps teams can more effectively decide on actions to fix problems, or to improve performance.

So what's involved in combining AIOps and observability?

Metrics, traces, logs and events must first be collected and analyzed. Metrics capture a temporal dimension of what's happening, through their time-series data. Traces map a path through a topology, so they provide a spatial dimension: a trace is a chain of execution across different system components, usually microservices. Logs and events provide an unstructured record of what happened.
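To make the "spatial dimension" of a trace concrete, here is a minimal Python sketch of how a chain of execution could be turned into service-to-service relationships. The `Span` fields, the `service_edges` helper and the service names are all hypothetical, chosen for illustration; they are not taken from any particular tracing product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """One step of a trace (hypothetical, simplified shape)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str


def service_edges(spans):
    """Return the set of (caller, callee) service pairs implied by one trace."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for s in spans:
        parent = by_id.get(s.parent_id) if s.parent_id else None
        # Only record hops that cross a service boundary.
        if parent is not None and parent.service != s.service:
            edges.add((parent.service, s.service))
    return edges


# Illustrative trace: gateway -> checkout -> (payments, inventory)
trace = [
    Span("a", None, "gateway"),
    Span("b", "a", "checkout"),
    Span("c", "b", "payments"),
    Span("d", "b", "inventory"),
]
edges = service_edges(trace)
print(sorted(edges))
# [('checkout', 'inventory'), ('checkout', 'payments'), ('gateway', 'checkout')]
```

Aggregating such edges over many traces would give a rough picture of the topology that the traces pass through.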

With AIOps analysis, metrics reveal anomalies, traces show topology-based microservice relationships, and unstructured logs and events provide the foundation for triggering a significant alert.
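As one illustration of how metrics can reveal anomalies, the sketch below flags points in a time series that sit far from the mean of the preceding window. This trailing z-score is just one simple technique, picked here as an assumption; the function name, window size, threshold and sample latency values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates from the trailing-window mean
    by more than `threshold` sample standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to score against; skip.
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative latency samples in milliseconds, with one obvious spike.
latency_ms = [100, 102, 99, 101, 100, 103, 98, 101, 250, 100, 102]
print(find_anomalies(latency_ms))  # [8]
```

A production system would likely need something more robust (seasonality, drifting baselines), but the principle of comparing each point with its recent history is the same.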

Machine learning algorithms would then come into play to indicate an uncommon occurrence, pinpoint unusual metrics, traces, logs and events, and correlate them using temporal, spatial and textual criteria. The next step in the process would be the identification of a probable root cause of the problem, based on the history of previously resolved incidents. Then, ideally, automated remedial actions would be carried out.
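The correlation step can be sketched in a few lines of Python. This is a deliberately naive, rule-based stand-in for the machine learning described above: two alerts are treated as related if they are close in time and either neighbors in the topology or similar in wording. The `Alert` shape, the thresholds, the greedy grouping and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since some reference point
    service: str
    message: str


def token_overlap(a, b):
    """Jaccard similarity between the lowercase word sets of two messages."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def related(a, b, edges, max_gap=120.0, min_overlap=0.3):
    """Temporal test, combined with a spatial or a textual test."""
    close_in_time = abs(a.timestamp - b.timestamp) <= max_gap
    close_in_space = (
        a.service == b.service
        or (a.service, b.service) in edges
        or (b.service, a.service) in edges
    )
    similar_text = token_overlap(a.message, b.message) >= min_overlap
    return close_in_time and (close_in_space or similar_text)


def correlate(alerts, edges):
    """Greedy grouping: each alert joins the first cluster that already
    contains a related alert, otherwise it starts a new cluster."""
    clusters = []
    for alert in sorted(alerts, key=lambda x: x.timestamp):
        for cluster in clusters:
            if any(related(alert, other, edges) for other in cluster):
                cluster.append(alert)
                break
        else:
            clusters.append([alert])
    return clusters


# Hypothetical topology and alerts.
edges = {("checkout", "payments")}
alerts = [
    Alert(0.0, "payments", "timeout connecting to database"),
    Alert(30.0, "checkout", "payment request failed"),
    Alert(45.0, "search", "timeout connecting to database"),
    Alert(4000.0, "search", "index rebuild slow"),
]
clusters = correlate(alerts, edges)
print([[a.service for a in c] for c in clusters])
# [['payments', 'checkout', 'search'], ['search']]
```

Each resulting cluster is a candidate incident; identifying a probable root cause from the history of resolved incidents and triggering remediation would be further steps beyond this sketch.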

Clearly, this combination of AIOps and observability would offer tremendous value to DevOps teams, as it would automate the detection, diagnosis and remediation of problems with the speed and accuracy required in their CI/CD environments. This would represent a breakthrough for AIOps: earning the appreciation of reluctant DevOps teams by giving them deep insights into observability data, and unparalleled visibility into their environments.

Will Cappelli is Field CTO at Moogsoft