How Observability Helps Ingest and Normalize Data for DevOps Engineers
September 08, 2021

Richard Whitehead

Humans naturally love structure. Just take books, for example. We've been ingesting and normalizing data through bookmaking since ancient times. In bookmaking, we transport, or ingest, data (in the form of text and images) from the spoken word or author's imagination to a physical structure. Covers denote the information's beginning and end, and a table of contents and chapters categorize, or normalize, the data.

The same logic applies to modern computer data. Humans prefer information that is easy to understand, and we make sense of unstructured data — whether it's text or time series data — by ingesting and normalizing it.

DevOps, SRE and other operations teams use observability solutions with AIOps to ingest and normalize data. Doing so gives them visibility into their tech stacks from a centralized system, reduces noise and supplies the context they need to shorten mean time to recovery (MTTR). With AI using these processes to produce actionable insights, teams are free to spend more time innovating and providing superior service assurance.

Let's explore AI's role in ingestion and normalization, and then dive into correlation and deduplication.

How Is Data Ingested into an Observability Platform?

Solutions that provide observability with AIOps are flexible, incorporating data from a broad range of sources. These monitoring systems ingest event management data, like alerts, log events and time series data. Modern observability solutions also notify teams about system changes, which is critical because environmental changes trigger most system failures. In the end, any data source is fair game, as long as the data tells you something about your real-time operational environment.

The data source dictates how your monitoring tool ingests the information. The preferred method is a continuous data stream. The alternative is a pull mechanism, such as the Prometheus pattern, in which the tool scrapes data at regular intervals. For older applications, you may need a creative plug-in or adapter that queries the application or system for data and converts the result into an accessible format.
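To make the two ingestion paths concrete, here is a minimal sketch in Python. The Collector class and its push and scrape methods are hypothetical names invented for illustration, not the API of any particular observability product. A streaming source calls push whenever it has an event, while a scheduler would call scrape at regular intervals for sources that must be polled.

```python
from typing import Callable, Dict, List


class Collector:
    """Hypothetical central store that accepts events from both paths."""

    def __init__(self) -> None:
        self.events: List[Dict] = []

    def push(self, event: Dict) -> None:
        # Push path: the source streams each event as it happens.
        self.events.append(event)

    def scrape(self, source: str, read_sample: Callable[[], Dict]) -> None:
        # Pull path: the collector asks the source for its current values.
        # A scheduler would call this at a fixed interval.
        sample = read_sample()
        self.events.append({"source": source, **sample})


collector = Collector()
collector.push({"source": "web-01", "message": "CPU overloaded"})
collector.scrape("db-02", lambda: {"cpu_percent": 42.0})
```

Either way, the events end up in one place, which is the point of centralizing them.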

So why move all of this data into an observability platform? Transporting information from multiple sources and putting it into a centralized system can reveal the big picture behind the data.

How Is Data Normalized?

Once data is coming into your observability platform, it's helpful to normalize the information according to its common features. AI can extract information from unstructured data and elevate it to a feature, like a source or timestamp. These features allow you to sort or query the data or, in more sophisticated environments, apply AI-based techniques such as natural language processing (NLP).

As you normalize data, it helps to understand the incoming format and structure. If you're going to map fields and break down the message into component parts, understand what part of the message is variable and what part is static.
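As a rough illustration of field mapping, the sketch below assumes a made-up log format of "timestamp source severity message" and uses a regular expression rather than AI. It promotes the timestamp and source to features, and it separates the static part of the message from the variable part by replacing numbers with a placeholder. The format, the field names and the normalize function are all assumptions for the example.

```python
import re
from datetime import datetime
from typing import Dict, Optional

# Assumed line layout: "<ISO timestamp> <source> <SEVERITY> <free text>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<source>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<message>.*)$"
)


def normalize(line: str) -> Optional[Dict]:
    """Map one raw log line to named fields, or None if it does not fit."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    fields: Dict = match.groupdict()
    fields["timestamp"] = datetime.fromisoformat(fields["timestamp"])
    # Static part of the message: numbers are the variable part here.
    fields["template"] = re.sub(r"\d+", "<N>", fields["message"])
    return fields


event = normalize("2021-09-08T10:15:00 web-01 ERROR CPU at 97 percent")
```

Two lines that differ only in the number they report share the same template, which makes them easy to sort, query or group later.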

You can use enrichment techniques if data doesn't have a required field, appropriate feature or required information. Enrichment skirts the lack of information by finding a key to cross-reference with an external data source.
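Enrichment can be sketched as a simple lookup. In the example below, the event's source field is the key, and INVENTORY is a hypothetical stand-in for an external data source such as a CMDB. The names and values are invented for illustration.

```python
from typing import Dict

# Hypothetical external data source, keyed by host name.
INVENTORY: Dict[str, Dict[str, str]] = {
    "web-01": {"service": "checkout", "owner": "payments-team"},
}


def enrich(event: Dict, inventory: Dict[str, Dict[str, str]]) -> Dict:
    """Fill in missing fields by cross-referencing the event's source."""
    extra = inventory.get(event.get("source", ""), {})
    enriched = dict(event)
    for key, value in extra.items():
        # Only add what is missing; never overwrite data the event already has.
        enriched.setdefault(key, value)
    return enriched


enriched = enrich({"source": "web-01", "message": "CPU overloaded"}, INVENTORY)
```

An event from an unknown source passes through unchanged, so a gap in the external data never drops the event itself.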

How Does Observability with AIOps Reduce Toil?

When you have normalized data, you can use AI to detect problems quickly through correlation and deduplication. Imagine if your system fails and you have to dig through hundreds of logs to see how the environment changed. That's time-consuming, not to mention boring.

Correlate, or group, data based on common characteristics like service, class or description field. Time is also handy operational information and serves as a practical classifier. Let's go back to our system failure. If you just made an environmental change, understanding the time the alerts came in helps pinpoint the problem.
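One simple way to combine a common characteristic with time is sketched below. It assumes each normalized event already carries a service and a timestamp field, and it groups events for the same service that arrive within a chosen window of one another. The five-minute window and the correlate function are illustrative assumptions, not a recommendation from any product.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def correlate(events: List[Dict], window: timedelta = timedelta(minutes=5)) -> List[List[Dict]]:
    """Group events by service, starting a new group after a quiet gap."""
    groups: List[List[Dict]] = []
    open_group: Dict[str, List[Dict]] = {}  # service -> most recent group
    for event in sorted(events, key=lambda e: e["timestamp"]):
        group = open_group.get(event["service"])
        if group is not None and event["timestamp"] - group[-1]["timestamp"] <= window:
            group.append(event)
        else:
            group = [event]
            groups.append(group)
            open_group[event["service"]] = group
    return groups


t0 = datetime(2021, 9, 8, 10, 0, 0)
alerts = [
    {"service": "checkout", "timestamp": t0},
    {"service": "search", "timestamp": t0 + timedelta(minutes=1)},
    {"service": "checkout", "timestamp": t0 + timedelta(minutes=2)},
    {"service": "checkout", "timestamp": t0 + timedelta(minutes=20)},
]
groups = correlate(alerts)
```

Here the first two checkout alerts land in one group, while the checkout alert 20 minutes later starts a new one.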

Correlation can also mimic human behavior, which is a challenge for most computer systems. For example, online checkout processes are complex, with many integrated, interdependent parts. An intelligent observability tool with AIOps can use NLP to correlate alerts related to a checkout process. If the checkout process has a problem, your observability platform will group all of the alerts associated with the stem word "check," which accommodates derivations and variations like "checking," "Check," and "check out."
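The stem-word idea can be approximated without a full NLP library. The sketch below is a deliberately crude stand-in: it lowercases each alert, splits it into words and keeps the alerts in which any word starts with the stem. Real stemmers are more careful (this one would also match "checksum"), and the function names are hypothetical.

```python
import re
from typing import List


def mentions_stem(text: str, stem: str) -> bool:
    """True if any word in the text begins with the stem, ignoring case."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token.startswith(stem) for token in tokens)


def group_by_stem(alerts: List[str], stem: str) -> List[str]:
    """Collect the alerts that mention a word derived from the stem."""
    return [alert for alert in alerts if mentions_stem(alert, stem)]


alerts = [
    "Checking cart contents failed",
    "Check out request timed out",
    "checkout latency high",
    "Disk full on db-02",
]
checkout_alerts = group_by_stem(alerts, "check")
```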

Let's move on to the benefits of deduplicating normalized data. You're working and, suddenly, a "CPU overloaded" alert pops up. You start fixing the issue, but another "CPU overloaded" alert hits your inbox. And it's followed by 30 more similar alerts. That's distracting and not particularly useful.

Deduplication reduces noise and minimizes incident volumes by eliminating excessive copies of the data. Instead of the monitoring system telling you that the CPU is overloaded 32 separate times, AI compresses repeated messages into one stateful message. Deduplication can seem trivial, especially compared to techniques like NLP, but the devil is in the details. The system has to recognize when a message indicates a new issue rather than a repeat of an existing one.
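A minimal sketch of stateful deduplication follows. It assumes events arrive in time order with a numeric timestamp in seconds, treats events with the same source and message as duplicates, and uses a hypothetical reset_after threshold to decide when a repeat is really a new issue. The threshold value and the field names are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def deduplicate(events: List[Dict], reset_after: float = 900.0) -> List[Dict]:
    """Collapse repeated events into one record with a count and time range."""
    records: List[Dict] = []
    latest: Dict[Tuple[str, str], Dict] = {}  # (source, message) -> open record
    for event in events:  # assumed sorted by timestamp
        key = (event["source"], event["message"])
        record = latest.get(key)
        if record is not None and event["timestamp"] - record["last_seen"] <= reset_after:
            # Same issue, seen again: update the existing stateful record.
            record["count"] += 1
            record["last_seen"] = event["timestamp"]
        else:
            # First sighting, or a long quiet gap: treat it as a new issue.
            record = {
                "source": event["source"],
                "message": event["message"],
                "count": 1,
                "first_seen": event["timestamp"],
                "last_seen": event["timestamp"],
            }
            records.append(record)
            latest[key] = record
    return records


burst = [{"source": "web-01", "message": "CPU overloaded", "timestamp": i * 10.0} for i in range(32)]
later = [{"source": "web-01", "message": "CPU overloaded", "timestamp": 10000.0}]
records = deduplicate(burst + later)
```

The 32 alerts in the burst collapse into a single record with a count of 32, while the alert that arrives long after the burst opens a second record.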

Intelligent observability with AIOps centralizes data and makes it easier for teams to understand. And when these systems detect incidents, AI-enabled correlation and deduplication minimize the impact of this unplanned work. The downstream effects on DevOps practitioners and SRE teams are significant. These teams can spend less time putting out fires and more time keeping up with the constant demand to innovate and delight customers.

Richard Whitehead is Chief Evangelist at Moogsoft