Automation has been around since the first server admin wrote a script. Since then, IT life has continually become more complex: multiple data centers, high availability, disaster recovery, and now the cloud all combine into a dynamic, ever-changing hybrid IT estate.
Over the last few decades, IT department budgets have shrunk, in part because of recession. As a result, IT teams are being asked to do more with less. The increase in work has amplified the need for automation.
IT Process Automation
IT process automation ranges from simple individual scripts to branching scripts to full-blown orchestration systems. In addition to commercial offerings, the open-source community has developed over a dozen projects.
Until recently, all automation required a human to kick off a process. Operations teams would manually go through the data they received. Finding the problem could take days and require war room assemblies. Once it was found, a fix could be implemented manually, with a script, or with an automation tool. As compliance became more important, so did tools that could log the actions taken to implement a fix, but humans were still working the automation.
Runbooks started as paper instructions on what to do when a well-known problem occurred. More recently, runbooks have become automated scripts or orchestration systems. This documentation and automation helps move problem resolution to less experienced operators, a practice sometimes called shift-left.
AI has recently become a reality for IT operations: it can find problems and then activate the appropriate automation to fix them. The original AIOps definition focused on applying machine learning to the vast amount of data that IT operations receives from all of its monitoring tools. According to Enterprise Management Associates (EMA), enterprises have more than 10 monitoring tools managing a hundred thousand metrics per day, not including log files.
This amount of data is too much for any human, or even a group of humans, to process. Hence the application of machine learning, which can process all this information in minutes or hours and point to the most likely root cause, or at least narrow the possibilities to a small number. Succinctly, AIOps turns IT operations data into operational insights to pinpoint the root cause of a problem.
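As a rough illustration of that narrowing step, the sketch below ranks metrics by how far their latest value sits from their own recent history. It uses a plain z-score as a simple stand-in for a real machine learning model, and the metric names, sample values, and the `rank_suspects` helper are all invented for the example.

```python
from statistics import mean, stdev


def rank_suspects(history, latest, top_n=3):
    """Rank metrics by how unusual their latest value is.

    history: dict of metric name -> list of recent values (hypothetical data)
    latest:  dict of metric name -> most recent value
    Returns up to top_n (metric, z_score) pairs, most anomalous first.
    """
    scores = []
    for name, values in history.items():
        if name not in latest or len(values) < 2:
            continue  # not enough data to judge this metric
        spread = stdev(values)
        if spread == 0:
            continue  # a flat series has no meaningful z-score
        z = abs(latest[name] - mean(values)) / spread
        scores.append((name, z))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Invented sample data: three metrics from hypothetical monitoring tools.
history = {
    "db_latency_ms": [20, 22, 19, 21, 20],
    "cpu_percent": [40, 42, 41, 39, 43],
    "queue_depth": [5, 6, 5, 7, 6],
}
latest = {"db_latency_ms": 95, "cpu_percent": 44, "queue_depth": 6}

suspects = rank_suspects(history, latest, top_n=2)
for name, score in suspects:
    print(f"{name}: z-score {score:.1f}")
```

With this sample data the database latency metric ranks first, because its latest value is far outside its recent range. A production system would correlate many more signals than a single per-metric score.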
Most recently, the definition of AIOps evolved to include automation. The idea is that once the machine learning determines the problem as described above, it kicks off the automation tools to fix the problem.
A recent survey found that about half of the responding organizations allowed fully automated problem resolution, with no human involved. The other half wanted human review before acting, but even this is preferable to having a human take the time to decide which automation flow is required. This is a significant change from five years ago, when most organizations were very nervous about automated remediation.
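The two operating modes described here, fully automated resolution and human review before acting, can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not any vendor's implementation: the problem labels, the runbook functions, and the `remediate` helper are hypothetical, and the runbooks only return strings instead of touching real systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Diagnosis:
    problem: str  # label produced by the analysis step (hypothetical)
    target: str   # affected host or service (hypothetical)


def restart_service(target: str) -> str:
    # Placeholder action; a real runbook would call an automation tool here.
    return f"restarted {target}"


def clear_disk(target: str) -> str:
    # Placeholder action, same caveat as above.
    return f"cleared temp files on {target}"


# Map each known problem to the runbook that remediates it.
RUNBOOKS: Dict[str, Callable[[str], str]] = {
    "service_unresponsive": restart_service,
    "disk_full": clear_disk,
}


def remediate(
    diagnosis: Diagnosis,
    auto_approve: bool,
    approver: Optional[Callable[[Diagnosis], bool]] = None,
) -> List[str]:
    """Run the matching runbook and return log entries for the audit trail."""
    log: List[str] = []
    runbook = RUNBOOKS.get(diagnosis.problem)
    if runbook is None:
        log.append(f"no runbook for {diagnosis.problem}; escalating to a human")
        return log
    if not auto_approve:
        # Human-review mode: act only if an approver explicitly says yes.
        approved = approver(diagnosis) if approver is not None else False
        if not approved:
            log.append(f"{runbook.__name__} on {diagnosis.target} awaiting approval")
            return log
    log.append(runbook(diagnosis.target))
    return log


disk_issue = Diagnosis(problem="disk_full", target="web-01")
print(remediate(disk_issue, auto_approve=True))
print(remediate(disk_issue, auto_approve=False))
print(remediate(disk_issue, auto_approve=False, approver=lambda d: True))
```

Even in review mode the human only approves or rejects a proposed runbook and does not have to work out which automation flow applies, which is the time saving the survey respondents were after.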
Getting to Automated AIOps
How do you get from your current IT operations reality to automated AIOps? As the title implies, there are two parts: automation and AI. The table below shows the maturity curve from the automation perspective.
At the end of the day, the goal of the AI part is to take in data and automatically determine the problem. Some problems do not require machine learning to find, but the system must be able to take in data and isolate the problem. Once you have AI identifying a problem, you can connect it to the automated runbook that will remediate it. Then, from the Ops side, start by picking a single use case, for example, "optimize event management," and work with your teams to identify problems they see repeatedly. Voila: you now have automated AIOps.