Automation has been around since the first server admin wrote a script. Since then, IT life has become steadily more complex: multiple data centers, high availability, disaster recovery, and now the cloud all combine into an ever-changing hybrid IT estate.
Over the last few decades, IT departments have seen their budgets shrink, in part because of recession. As a result, they are being asked to do more with less. The increase in work has amplified the need for automation.
IT Process Automation
IT process automation ranges from simple individual scripts to branching scripts to full-blown orchestration systems. In addition to commercial offerings, the open-source community has developed over a dozen projects.
Until recently, all automation required a human to kick off the process. Operations teams would manually go through the data they received. Finding the problem could take days and war room sessions; once it was found, a fix could be implemented manually, with a script, or with an automation tool. As compliance became more important, so did using tools that could log the actions taken to implement a fix, but humans were still driving the automation.
Runbooks started as paper instructions on what to do when a well-known problem occurred. More recently, runbooks have become automated scripts or orchestration systems. This documentation and automation helps move problem resolution to less experienced operators, a practice sometimes called shift-left.
AI has recently become a reality for IT operations: it can find problems and then activate the appropriate automation to fix them. The original definition of AIOps focused on applying machine learning to the vast amount of data that IT operations receives from all of its monitoring tools. According to Enterprise Management Associates (EMA), enterprises have more than 10 monitoring tools managing a hundred thousand metrics per day, not including log files.
This is too much data for any human, or even a group of humans, to process. Hence the application of machine learning, which can process all this information in minutes or hours and point to the most likely root cause, or at least narrow it down to a small number of candidates. Succinctly, AIOps turns IT operations data into operational insights to pinpoint the root cause of a problem.
Most recently, the definition of AIOps evolved to include automation. The idea is that once the machine learning determines the problem as described above, it kicks off the automation tools to fix the problem.
A recent survey determined about half the responding organizations allowed for fully automated problem resolution — no human involved. The other half wanted human review before acting, but even this is preferable to having the human take the time to decide which automation flow is required. This is a significant change from five years ago when most organizations were very nervous about automated remediation.
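The two operating modes described above, fully automated resolution and human review before acting, can be sketched as a simple approval gate. This is a minimal illustration, not a description of any particular AIOps product: `Finding`, `remediate`, `restart_service`, and the `ask_human` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    """A problem reported by the analysis layer (hypothetical shape)."""
    problem: str
    confidence: float  # 0.0 to 1.0


def remediate(
    finding: Finding,
    runbook: Callable[[Finding], str],
    auto_approve: bool,
    ask_human: Optional[Callable[[Finding], bool]] = None,
) -> str:
    """Run the runbook directly, or only after a human approves it."""
    if not auto_approve:
        # Human-review mode: without an explicit approval, nothing runs.
        if ask_human is None or not ask_human(finding):
            return "skipped: not approved"
    return runbook(finding)


def restart_service(finding: Finding) -> str:
    # Placeholder action; a real runbook would call an automation tool here.
    return f"ran restart runbook for {finding.problem}"


finding = Finding(problem="web-tier memory leak", confidence=0.93)

# Fully automated resolution: no human involved.
print(remediate(finding, restart_service, auto_approve=True))

# Human review first: here the (simulated) reviewer declines.
print(remediate(finding, restart_service, auto_approve=False,
                ask_human=lambda f: False))
```

In both modes the system has already selected the runbook. The reviewer only answers yes or no, which is the time saving the human-review half of respondents still gets.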
Getting to Automated AIOps
How do you get from your current IT operations reality to automated AIOps? As the title implies, there are two parts: automation and AI. The table below shows the maturity curve from the automation perspective.
At the end of the day, the goal of the AI part is to take in data and automatically determine the problem. Some problems do not require machine learning to find, but the system must still be able to take in data and isolate the problem. Once you have AI identifying a problem, you can connect it to the automated runbook that will remediate it. On the Ops side, start by picking a single use case, for example, "optimize event management," and then work with your teams to identify problems they see repeatedly. Voilà: you have now automated AIOps.
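Connecting an identified problem to the runbook that remediates it can be as simple as a lookup table keyed by problem type. The sketch below assumes the detection layer emits events as dictionaries with a `problem_type` field. That event shape, the `RUNBOOKS` registry, and the two example runbooks are hypothetical and only illustrate the wiring.

```python
from typing import Callable, Dict

# Registry mapping a problem type to the runbook that remediates it.
RUNBOOKS: Dict[str, Callable[[dict], str]] = {}


def runbook(problem_type: str):
    """Decorator that registers a function as the runbook for a problem type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        RUNBOOKS[problem_type] = func
        return func
    return register


@runbook("disk_full")
def clean_tmp(event: dict) -> str:
    # Placeholder: a real runbook would invoke an automation tool.
    return f"cleaned temp files on {event['host']}"


@runbook("service_down")
def restart(event: dict) -> str:
    return f"restarted {event['service']} on {event['host']}"


def handle(event: dict) -> str:
    """Dispatch a detected problem to its runbook, or escalate if none exists."""
    action = RUNBOOKS.get(event["problem_type"])
    if action is None:
        return "no runbook: escalate to an operator"
    return action(event)


print(handle({"problem_type": "disk_full", "host": "db01"}))
print(handle({"problem_type": "cert_expired", "host": "web02"}))
```

Each recurring problem your teams identify becomes one more entry in the registry, so coverage grows one use case at a time while anything unrecognized still goes to an operator.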