I'm a sucker for sci-fi books and movies. Especially those that present far-fetched concepts, only for them to quickly become reality.
Take, for example, Johnny Cab from the 1990 flick Total Recall.
In the movie (loosely based on a short story by master writer Philip K. Dick), the hero hops into a driverless car called a Johnny Cab. It's fully autonomous, complete with a mannequin-like figure called Johnny that interacts with passengers in an annoying but all too familiar way. Not surprisingly, Johnny ends up smashed to pieces.
At the time, driverless cars seemed fanciful, but now they're coming soon to an automotive dealer near you. It's not a question of if but when, and the impact will be incredible. Fully autonomous, they'll be optimized for efficiency, dropping passengers off at their destinations and then returning home. They'll be safer, too. Today, drivers rely on one set of eyes to drive safely (two if, like me, you have a back-seat driver for a partner), but a driverless car will process hundreds, even thousands, of inputs simultaneously from a vast array of sensors.
So, IT operations dudes, if driverless cars are within reach, where's my driverless app? That was a question I raised in a recent blog, and one that was further dissected in an Artificial Intelligence for IT Operations (AIOps) virtual summit keynote presentation. If we're reaching a point where car steering wheels are the new coffee-cup holders, then surely IT monitoring can advance to a nirvana state: one where AI and machine learning, also known as AIOps, transform reactive, backward-looking monitoring into a fully autonomous function that learns and constantly optimizes applications according to the business outcomes they support.
Of course, we're not yet at a point where steering wheels are an optional extra. There are different levels of automation that must be negotiated to reach the desired state of a fully autonomous, self-governing vehicle. This is perfectly illustrated in published work from the Society of Automotive Engineers (SAE) International, notably the J3016 standard, which describes six levels of driving automation, spanning from Level 0 (no automation, where the driver is in complete control of the operational and tactical dynamic driving tasks: steering, braking, accelerating, and reacting to events) to Level 5 (full automation).
During the presentation mentioned above, Ashok Reddy, CA's Group General Manager – DevOps, used this construct to illustrate how the journey to autonomous IT operations with AIOps is analogous to that of a self-driving car. Interestingly, the levels of automation he describes have many similarities with those of the SAE standard. (see diagram below)
Let's consider the first two – Level 0 and Level 1.
In Level 0 cars, all functions are monitored and performed by the driver. This, of course, is a situation familiar to IT operations, with its over-reliance on manual (often historical) reports and static analysis. Like a driver processing multiple events and conditions, IT operations staff spend significant amounts of time collecting and processing events from a range of alerting systems, each designed for a discrete technology (cloud, networks, applications, etc.).
With Level 1, the car is still controlled by the driver, but some driver-assist features may be incorporated into the design, such as adaptive cruise control and blind-spot warnings. Similarly, Level 1 IT operations still performs monitoring, but now with designs that help staff proactively detect anomalous conditions while algorithmically reducing the event noise caused by false alarms.
While most of today's cars have a plethora of driver-assist wizardry, attaining Level 1 in IT operations can be challenging. As IT assets accumulate over the years, an operational function organized along technology lines becomes predisposed to acquiring tools designed for a single, discrete monitoring purpose.
As such, it's not uncommon for an enterprise to be using upwards of 20 monitoring products, each one using some form of static thresholding to trip alarms based on conditions within a narrow domain.
The problem with this approach is that it no longer scales. As application architectures become more distributed, composable, and ephemeral, analytical methods are needed to process far more data and generate alarms based on genuinely abnormal scenarios. Mix this with machine learning applied across historical data, and staff become more adept at identifying and predicting business-impacting conditions: the "(insert expletive here), we never saw that one coming" moments, the blind spots.
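To make the contrast concrete, here's a minimal sketch of the difference between a fixed static threshold and an adaptive detector (a rolling z-score over a recent baseline). The utilization samples, threshold, and window size are all made up for illustration, not taken from any particular monitoring product:

```python
import statistics

# Hypothetical CPU-utilization samples (percent) from a monitoring feed.
# Index 7 is an anomalous stall; index 8 is an anomalous spike.
samples = [42, 44, 41, 43, 45, 40, 44, 12, 88, 46]

# Static thresholding: a fixed trip-wire, blind to context.
STATIC_THRESHOLD = 85
static_alarms = [i for i, v in enumerate(samples) if v > STATIC_THRESHOLD]

# Adaptive detection: flag points that deviate sharply from the
# recent baseline, regardless of direction.
def rolling_zscore_alarms(values, window=5, z_limit=3.0):
    alarms = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mean = statistics.mean(baseline)
        stdev = statistics.pstdev(baseline) or 1e-9  # avoid div-by-zero
        if abs(values[i] - mean) / stdev > z_limit:
            alarms.append(i)
    return alarms

adaptive_alarms = rolling_zscore_alarms(samples)
print(static_alarms)    # [8]    -- only the spike trips the fixed threshold
print(adaptive_alarms)  # [7, 8] -- the stall at index 7 is also caught
```

The static rule sees only the spike; the drop to 12 (perhaps a stalled service) sails under the fixed threshold but stands out against its own recent baseline, which is precisely the kind of blind spot an adaptive method removes.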
While Level 1 marks only the beginning of the autonomous operations journey, with analytics and machine learning controlling a limited set of assistance-type functions, it still delivers substantial value for modern organizations. With algorithmic noise reduction and anomaly detection, staff can progress from being passive, reactive passengers with limited control to applying cumulative machine learnings as improvement feedback loops. They become better drivers: more efficient, less stressed.
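As a rough illustration of the noise-reduction idea, the sketch below collapses repeated events from the same source into a single alert with an occurrence count, within a time window. The event stream, field names, and 60-second window are all hypothetical, standing in for whatever dedup logic a real AIOps tool applies:

```python
# Hypothetical raw event stream: (timestamp_sec, source, message).
events = [
    (0,  "db-01",  "connection timeout"),
    (2,  "db-01",  "connection timeout"),
    (3,  "db-01",  "connection timeout"),
    (5,  "web-02", "high latency"),
    (65, "db-01",  "connection timeout"),  # outside the first 60s window
]

def dedupe(events, window=60):
    """Collapse repeated (source, message) pairs seen within `window`
    seconds into a single alert carrying an occurrence count."""
    alerts = []  # each: {"source", "message", "first_seen", "count"}
    for ts, source, msg in events:
        last = next((a for a in reversed(alerts)
                     if a["source"] == source and a["message"] == msg), None)
        if last and ts - last["first_seen"] < window:
            last["count"] += 1  # same noisy condition, just count it
        else:
            alerts.append({"source": source, "message": msg,
                           "first_seen": ts, "count": 1})
    return alerts

alerts = dedupe(events)
for a in alerts:
    print(a["source"], a["message"], "x", a["count"])
# 5 raw events collapse to 3 actionable alerts
```

Five raw events become three actionable alerts; at enterprise scale, the same principle can shrink tens of thousands of daily events into a handful of conditions worth a human's attention.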
Read the next blog in this series, Dude, Where's My Self-Driving App? – AIOps Level 2 – Automated Root-Cause, which explores Level 2 of autonomous operations: a stage where IT operations still has its hands on the wheel, but where anomaly detection is augmented with service analytics that correlate data across multiple domains to correctly determine problem root cause. We'll dig deeper into modern methods such as graph and topological analytics, together with a unifying data model, and why they're essential for correlating events across modern tech stacks.