As the planet becomes smarter, the biggest leaps forward over the next several decades — in business, science and society at large — will come from insights gleaned through perpetual, real-time analysis of data.
While many organizations may think analytics is best suited to marketing and sales departments, analytics of operational data such as network maintenance and IT services is growing in importance. Data, both inside and outside of operations, is growing exponentially in volume and complexity.
Whether your goal is to improve operational efficiency or to provide new services, you need to capture, understand and use all of your data — analytics must be core to your thinking. The value is not in any specific analytic algorithm, nor in having everyone become an expert in statistics; it’s in leveraging analytics and analytic applications for specific business goals.
The environment IT organizations have to deal with is growing increasingly complex with the rapid rollout and adoption of virtualization, cloud and web services. Resource reservation and assignment is now statistical, in both the data center and the network, as well as across a service delivery infrastructure that might span physical servers, system virtual machines, process virtual machines, middleware, buses, databases, application servers, and composite applications that combine multiple internal and external web services. Monitoring such complex infrastructures may generate thousands of metrics for a single service alone.
A performance analyst has the daunting task of trying to understand the relationships among all these things. The task is difficult enough even at the steady-state node level typically presented in a business service management (BSM) dependency tree. In reality it is even harder, as metrics have multi-dimensional, dynamic relationships that are almost impossible to discover and fully visualize. This makes it nearly impossible to analyze the data on a weekly basis, let alone on the near real-time basis necessary to alert on or predict emerging fault conditions.
Where Analytics Come Into Play
Every part of the business benefits from understanding operational and strategic data. Before equipment is provisioned, analytics is needed for capacity analysis and planning. Once equipment is obtained and provisioned, analytics is key to efficient management. For example, rather than building maintenance schedules around complaints received from customers, availability and performance monitoring provide event and metric information that can be used to identify problems and understand the root cause, hopefully before customers are affected.
As usage patterns change, analytics allow those changes to be better understood and forecasted for improved operations. One of the most promising areas where analytics can be applied is in extending this analysis to identify early warning indicators to predict when problems will occur, even while all the key performance indicators are still in desirable ranges.
So far we’ve talked about analytics that is already widely deployed. Now let’s move on to predictive analytics.
What Is "Predictive Analytics"?
Predictive analytics is not a crystal ball; it can’t predict whether heavy equipment will come and cut a cable, causing outages. Rather, predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.
The core of predictive analytics is capturing relationships between variables and past occurrences, and exploiting that information to predict future outcomes. Predictive analytics allows companies to move from reactive to proactive management: instead of reacting to trouble tickets and outages, operators can prevent those outages from occurring at all. By detecting problems such as service impacts, deterioration or outages as they emerge, and before they affect services, IT can address the issue before customers are aware of it, or at least get advance warning.
There are many different analytics techniques that can provide this kind of early warning system. Dynamic baselining gives you thresholds based on seasonality and history rather than a single static value. Linear and polynomial regression, various kinds of clustering, Granger causality modeling and other approaches let you build models that identify anomalies and predict future problems.
In a simple example, analysis may show that two key performance indicators — say, application memory use and application requests — vary together. With that model in mind, if analysis detects that memory use is rising while requests remain stable, we can infer an anomaly, and an emerging problem, even if memory use is still within acceptable boundaries. In this example, the anomaly would indicate a potential memory leak.
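The memory-leak example above can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the training pairs, the least-squares model and the 3-sigma tolerance are all hypothetical choices; a real system would learn the model continuously from monitoring history.

```python
from statistics import mean, stdev

# Hypothetical training data: (requests/sec, memory MB) pairs observed
# during normal operation; a real system would learn these from history.
history = [(100, 212), (150, 258), (200, 305), (250, 362), (300, 413)]

def fit_linear(pairs):
    """Ordinary least squares: memory ~= slope * requests + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

slope, intercept = fit_linear(history)
residuals = [y - (slope * x + intercept) for x, y in history]
tolerance = 3 * stdev(residuals)  # allowed deviation from the learned relationship

def is_anomalous(requests, memory_mb):
    """Memory well above what the request rate predicts => possible leak."""
    expected = slope * requests + intercept
    return memory_mb - expected > tolerance

print(is_anomalous(150, 262))  # near the learned relationship -> False
print(is_anomalous(150, 420))  # requests stable, memory high -> True
```

Note that the second check fires even though 420 MB might still be well under any static memory threshold — the anomaly is in the relationship between the two indicators, not in either one alone.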
When you can identify problems before they cause outages or service degradations, you have the opportunity to prevent those outages from ever occurring. An IDC study asserts that this kind of analytics has a substantially larger ROI than any other kind (Henry D. Morris, “Predictive Analytics and ROI: Lessons from IDC’s Financial Impact Study”).
Gaining New Insight With Predictive Analytics
Proactive management of the infrastructure is pivotal to protecting existing revenues and expanding opportunities. From a customer perspective, “slow is the new broke.”
Looking across the infrastructure at whether components are running at or close to capacity can prevent service degradation before it happens, and can also identify where extra capacity is required so it can be ordered in advance of a service-affecting event. However, keeping stock of unused equipment — whether it’s connected or not — is just not financially prudent. Predictive analytics can ensure equipment is ordered in time, rather than sitting on a shelf waiting to be called upon.
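A simple version of this "order in time, not in advance" idea is to project a component's utilization trend and compare the projected time-to-limit against procurement lead time. The sketch below uses a hypothetical utilization series, capacity limit and lead time; real capacity planning would use richer models and seasonality.

```python
# Hypothetical weekly utilization samples (% of capacity) for one component.
weeks = [0, 1, 2, 3, 4, 5]
util = [52.0, 55.5, 58.0, 61.5, 64.0, 67.5]  # steady upward trend

# Least-squares fit of the trend line: util ~= slope * week + intercept.
n = len(weeks)
mx = sum(weeks) / n
my = sum(util) / n
slope = sum((x - mx) * (y - my) for x, y in zip(weeks, util)) / \
        sum((x - mx) ** 2 for x in weeks)
intercept = my - slope * mx

CAPACITY_LIMIT = 85.0    # hypothetical alert level (% utilization)
LEAD_TIME_WEEKS = 12     # assumed procurement lead time

# Weeks until the trend line crosses the limit.
weeks_to_limit = (CAPACITY_LIMIT - intercept) / slope
print(f"Trend: {slope:.2f}%/week; limit reached in ~{weeks_to_limit:.1f} weeks")

if weeks_to_limit <= LEAD_TIME_WEEKS:
    print("Order capacity now to stay ahead of demand.")
```

With these numbers the trend reaches the limit in roughly eleven weeks, inside the assumed lead time, so the order would be placed now rather than stocking spare equipment on a shelf.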
This type of analytics helps ensure that customers’ expectations for the quality of services and products are met. By creating an accurate view of business services, and of the supporting asset dependencies and their status, organizations can predict changes in service status and maintain optimal performance.
While analytics can build patterns from historical data, that approach may miss a newly evolving pattern. Analytics can also build patterns without any historical data, but this approach takes longer, especially when it comes to incorporating seasonality.

An adaptive algorithm that has been running for only a week won’t know the yearly cycles. The best of both worlds is predictive analytics that takes advantage of all the data you have: history, subject matter expertise, known topology and the capability of an adaptive algorithm to learn from data in motion.
By analyzing all the data, the system learns what normal behavior is for the resource’s performance. This includes the seasonality associated with hourly and daily variations in use. With this approach, the system learns the normal behavior and adjusts this understanding over time. When performance starts to deviate from the norm, a notification is raised — covering everything from a network’s quietest periods to its busiest hour. A financial customer, for example, can take advantage of the fact that seasonal trends are fairly consistent for trader transactions: when the data deviates from seasonal expectations, a notification can be raised even before any threshold is crossed.
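A bare-bones version of this seasonal baselining can be sketched as an hour-of-day model: learn a mean and spread per hour bucket, then notify when a reading falls outside that hour's normal band. Everything here is hypothetical — the synthetic transaction history, the 3-sigma band and the `check` helper — and a real product would also model day-of-week and yearly cycles and update the baseline continuously.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical history: (hour_of_day, transactions) samples over 14 days,
# with busy trading hours and mild day-to-day noise.
history = []
for day in range(14):
    for hour in range(24):
        base = 1000 if 9 <= hour <= 17 else 200
        history.append((hour, base + (day % 5) * 10))

# Learn per-hour "normal" behavior: mean and spread for each hour bucket.
buckets = defaultdict(list)
for hour, value in history:
    buckets[hour].append(value)
baseline = {h: (mean(v), stdev(v)) for h, v in buckets.items()}

def check(hour, value, z=3.0):
    """Return a notification if value deviates from this hour's norm."""
    mu, sigma = baseline[hour]
    if abs(value - mu) > z * max(sigma, 1.0):  # guard near-zero spread
        return f"notify: hour {hour}: {value} outside normal {mu:.0f}+/-{z * sigma:.0f}"
    return None

print(check(10, 1020))  # within the busy-hour norm -> None
print(check(3, 600))    # spike during a normally quiet hour -> notification
```

Note the second check: 600 transactions would look unremarkable against a single static threshold sized for the busy hours, but it is far outside the norm for 3 a.m., so the seasonal baseline raises a notification well before any static threshold is crossed.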
Today, we find problems through fault events and by managing thresholds, which have some important implications. First, you have to know what thresholds you need and want to manage. You may need to understand the relationship between key performance indicators; perhaps that voice over IP quality here is related to network traffic over there. As the environment becomes more dynamic and more complex, more thresholds are needed.
Predictive analytics can augment these traditional management techniques, providing new insight from data that’s already being collected, without adding new thresholds. As environments become increasingly dynamic and virtualized, and as operational data continues to grow in size and complexity, analytics can help you make sense of it all.
About Denis Kennelly
Denis Kennelly is VP of Development and CTO for IBM’s Tivoli Network Management Portfolio. He has more than 20 years of experience in the telecommunications and IT industries, during which he has helped define and develop network and service management products across a number of industry segments. Prior to IBM, Kennelly worked for Vallent Technologies, which specialized in cellular network service assurance software and was acquired by IBM in February 2007.