Battling Bottlenecks with AIOps
February 20, 2019

Christopher Tozzi
Fixate IO

Bottlenecks are one of the banes of IT professionals’ existence, and they have been for decades.

Thanks to a new generation of AIOps-powered tools, however, it’s becoming possible to move past bottlenecks. Let’s explore how.


What is a Bottleneck?

Simply put, a bottleneck is a point of congestion within a hardware infrastructure or software environment that delays operations.

Bottlenecks come in many forms. You could experience a network bottleneck when a router becomes overwhelmed and delays data transfers. You might face an I/O bottleneck that prevents an application from reading and writing data in a timely fashion. Or an application itself could create a bottleneck if poorly written code causes it to respond slowly to other applications or services that depend on it.
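A simple way to see why a single congested component matters: when every request passes through a chain of components in sequence, end-to-end throughput can be no higher than that of the slowest link. The Python sketch below uses made-up capacity figures for a hypothetical router, application server and storage layer; real environments are rarely this linear, so treat it as an illustration of the idea rather than a sizing tool.

```python
# Hypothetical per-component capacities, in requests per second.
# These numbers are illustrative, not measurements.
capacities = {
    "router": 5000,
    "app_server": 1200,
    "storage_io": 800,
}


def find_bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the name and capacity of the slowest stage.

    In a serial pipeline every request passes through every stage,
    so the stage with the lowest capacity caps end-to-end throughput.
    """
    name = min(stages, key=stages.get)
    return name, stages[name]


stage, max_throughput = find_bottleneck(capacities)
print(f"Bottleneck: {stage}, max throughput ~{max_throughput} req/s")
# prints: Bottleneck: storage_io, max throughput ~800 req/s
```

Speeding up any stage other than `storage_io` in this example would leave overall throughput unchanged, which is why locating the bottleneck precisely matters.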

Whatever their nature, bottlenecks degrade performance. And while they sometimes resolve themselves when the conditions that caused them subside, at other times they require intervention. (Even if they do go away on their own, it’s wise to determine what caused them in the first place and take measures to ensure that they don’t occur again.)

Fighting Bottlenecks: The Old Way

Traditionally, finding and resolving bottlenecks has been a mostly manual affair. While monitoring software could alert you that part of your environment was experiencing performance degradation, it was up to you, as an IT admin, to trace that issue back to its source (in other words, to figure out exactly where the bottleneck was occurring) and then resolve it.

This process typically required a fair amount of guesswork and trial and error. For example, you might have known that network traffic was slow, but it would take some investigation to trace the root of the problem back to a specific router that was failing or a firewall that was misconfigured.

Thus, for most IT admins, bottlenecks have been hard to identify and have consumed a lot of time and effort.

Fighting Bottlenecks with AIOps

With AIOps, however, the manual battle against bottlenecks is becoming a thing of the past.

AIOps uses machine data and analytics to make informed, fast and automated decisions about performance or other problems within IT infrastructure and software environments. That means that human engineers no longer have to puzzle through monitoring data manually in an effort to figure out what is behind performance problems or other issues. Instead, AIOps tools can trace the surface-level manifestation of a problem back to the bottleneck that is its underlying source (in cases where a bottleneck of some type is the root of a problem).
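As a rough illustration of what “machine data and analytics” can mean at the simplest level, the Python sketch below flags a metric sample that deviates sharply from its historical baseline using a z-score test. This is an assumption for illustration only: the `is_anomalous` function, the 3-standard-deviation threshold and the sample response times are hypothetical, and commercial AIOps tools generally rely on their own, more sophisticated models.

```python
import statistics


def is_anomalous(history: list[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it lies more than `threshold` standard deviations
    from the mean of `history` (a simple z-score test)."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any change at all is unusual.
        return current != mean
    return abs(current - mean) / stdev > threshold


# Hypothetical response times (ms) for an application under normal load.
baseline_ms = [120, 118, 125, 122, 119, 121, 124, 117]
print(is_anomalous(baseline_ms, 123))  # False
print(is_anomalous(baseline_ms, 310))  # True
```

A check like this only tells you that something changed; the harder part, which the next example touches on, is working out which component is responsible.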

For example, imagine that your monitoring tools generate an alert about an application that has begun responding to requests more slowly than usual. Instead of your having to sort out the issue manually, an AIOps-enabled tool could analyze a holistic set of data from the application environment (such as how many servers are available to host the application, how much free CPU and storage capacity those servers have, and how many active connections the application is supporting) to identify which infrastructure or software component is causing the slowdown. For our purposes, we’ll assume that the underlying issue is an I/O bottleneck that is preventing the application from retrieving data at a normal rate from the storage infrastructure on which it depends.
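One simple heuristic for the root-cause step is to rank candidate component metrics by how closely they track the symptom, here the application’s response time. The Python sketch below does this with a Pearson correlation. The metric names and values are hypothetical, and correlation only points to suspects; it does not prove causation. A real tool would likely also weigh other signals, such as which components the application actually depends on.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / (spread_x * spread_y)


def rank_suspects(symptom: list[float],
                  candidates: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank candidate metrics by absolute correlation with the symptom."""
    scores = {name: abs(pearson(series, symptom)) for name, series in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples over six monitoring intervals.
latency_ms = [120, 122, 150, 210, 290, 340]
candidates = {
    "cpu_util_pct": [40, 42, 41, 43, 40, 42],
    "active_connections": [200, 215, 205, 210, 200, 212],
    "storage_io_wait_pct": [2, 3, 10, 25, 41, 50],
}
suspects = rank_suspects(latency_ms, candidates)
print(suspects[0][0])  # storage_io_wait_pct
```

In this made-up data, only storage I/O wait rises in step with response time, which matches the I/O bottleneck assumed in the example above.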

The value of AIOps doesn’t stop with identifying the source of the bottleneck. AIOps can also help to remediate the problem automatically. In the example above, the solution might be to allocate additional storage resources to the application in order to improve throughput.

Alternatively, if that is not possible, an AIOps tool might identify other, less critical applications or services that are trying to use the overloaded storage system and de-prioritize their access so that the struggling application’s performance returns to normal. The latter strategy may not provide a permanent solution, but it would effectively resolve the issue in the short term and buy human engineers time to implement a permanent fix.
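The two remediation strategies described above can be sketched as a simple decision procedure: use spare storage capacity first, and if that is not enough, temporarily throttle lower-priority consumers of the same storage system, escalating whatever shortfall remains to human engineers. Everything here is hypothetical: the `Consumer` type, the priority scale, IOPS limits as the control knob and the rule of never cutting a consumer by more than half. A real tool would apply these actions through its storage platform’s own interfaces rather than returning strings.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    name: str
    priority: int    # hypothetical scale: higher number = more critical
    iops_limit: int  # current cap on I/O operations per second


def remediate(target: str, consumers: list[Consumer], spare_iops: int,
              needed_iops: int, max_cut: float = 0.5) -> list[str]:
    """Plan how to give `target` an extra `needed_iops` of storage I/O.

    Unused capacity is used first. Any shortfall is reclaimed from
    lower-priority consumers, least critical first, cutting each by at
    most `max_cut` of its current limit. Limits are updated in place and
    a list of human-readable actions is returned.
    """
    app = next(c for c in consumers if c.name == target)
    actions: list[str] = []

    from_spare = min(spare_iops, needed_iops)
    if from_spare > 0:
        actions.append(f"allocate {from_spare} spare IOPS to {target}")
    shortfall = needed_iops - from_spare

    for c in sorted(consumers, key=lambda item: item.priority):
        if shortfall <= 0 or c.priority >= app.priority:
            break  # done, or only equally or more critical workloads remain
        cut = min(int(c.iops_limit * max_cut), shortfall)
        if cut > 0:
            c.iops_limit -= cut
            shortfall -= cut
            actions.append(f"throttle {c.name} by {cut} IOPS")

    app.iops_limit += needed_iops - shortfall
    if shortfall > 0:
        actions.append(f"escalate: {shortfall} IOPS still missing, needs a permanent fix")
    return actions


consumers = [
    Consumer("orders-api", priority=10, iops_limit=2000),
    Consumer("nightly-report", priority=2, iops_limit=1500),
    Consumer("log-archiver", priority=1, iops_limit=1000),
]
for action in remediate("orders-api", consumers, spare_iops=300, needed_iops=1000):
    print(action)
# allocate 300 spare IOPS to orders-api
# throttle log-archiver by 500 IOPS
# throttle nightly-report by 200 IOPS
```

Returning a plan of actions rather than silently changing limits also makes it straightforward to log each step or require human approval before it takes effect.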

Christopher Tozzi is Senior Editor of Content and DevOps Analyst at Fixate IO