As the planet is becoming smarter, the biggest leaps forward in the next several decades — in business, science and society at large — will come from insights gleaned through perpetual, real-time analysis of data.
While many organizations might think analytics is best suited to marketing and sales departments, analytics of operational data, such as network maintenance and IT services, is growing in importance. Data, both inside and outside of operations, is growing exponentially in volume and complexity.
Whether your goal is to improve operational efficiency or to provide new services, you need to capture, understand and use all of your data: analytics must be core to your thinking. The value is not in any specific analytic algorithm, nor in having everyone become an expert in statistics; it's in leveraging analytics and analytic applications for specific business goals.
Increasing Complexity
The environment that IT organizations have to deal with is getting increasingly complex with the rapid rollout and adoption of virtualization, cloud and web services. Resource reservation and assignment is now statistical, in both the data center and the network, as well as across a service delivery infrastructure that might span physical servers, system virtual machines, process virtual machines, middleware, buses, databases, application servers, and composite applications that combine multiple internal and external web services. Monitoring such complex infrastructures can generate thousands of metrics for a single service alone.
A performance analyst has the daunting task of trying to understand the relationships between all these things. The task is difficult enough even at the steady-state node level typically presented in a business service management (BSM) dependency tree. In reality it is even harder, as metrics have multi-dimensional, dynamic relationships that are almost impossible to discover and fully visualize. This makes it nearly impossible to analyze the data on a weekly basis, let alone on the near real-time basis necessary to alert on or predict emerging fault conditions.
Where Analytics Come Into Play
Every part of the business benefits from understanding operational and strategic data. Before equipment is provisioned, analytics are needed for capacity analysis and planning. Once equipment is obtained and provisioned, analytics are key to efficient management. For example, rather than building maintenance schedules around customer complaints, availability and performance monitoring provide event and metric information that can be used to identify problems and understand their root cause, ideally before customers are affected.
As usage patterns change, analytics allow those changes to be better understood and forecast for improved operations. One of the most promising areas for analytics is extending this analysis to identify early warning indicators that predict when problems will occur, even while all the key performance indicators are still in desirable ranges.
We’ve talked about analytics that are already widely deployed. Now let’s move to predictive analytics.
What Is "Predictive Analytics"?
Predictive analytics is not a crystal ball; it can’t predict whether heavy equipment will come along and cut a cable, causing an outage. Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.
The core of predictive analytics relies on capturing relationships between variables and past occurrences, and exploiting that information to predict future outcomes. Predictive analytics allows companies to move from reactive to proactive management. For example, instead of reacting to trouble tickets and outages, predictive analytics allows operators to prevent those outages from occurring in the first place. By detecting problems such as service impacts, deterioration or outages as they emerge, and before they affect services, IT can address the issue before customers are aware of it, or at least gain advance warning.
There are many different analytics techniques that can be used to provide this kind of early warning system. Dynamic baselining allows you to have thresholds that are based on seasonality and history rather than relying on a single static threshold. Linear and polynomial regression, various kinds of clustering, Granger modeling and other approaches let you build models that can be used to identify anomalies and predict future problems.
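To make one of those techniques concrete, here is a minimal sketch, not any particular product’s implementation, of the regression-based flavor of early warning: fit a linear trend to a metric’s recent history and project when it might cross a threshold. The metric, the sample values and the 90 percent threshold are all illustrative assumptions.

```python
# A minimal sketch of trend-based early warning, assuming hourly utilization
# samples and a hypothetical 90% threshold. Not a production implementation.
import numpy as np

# 48 hours of utilization readings (percent) with a gentle upward drift.
hours = np.arange(48)
utilization = 60 + 0.4 * hours + np.random.normal(0, 1.5, size=48)

# Fit a first-order (linear) trend; a higher degree gives a polynomial fit.
slope, intercept = np.polyfit(hours, utilization, deg=1)

threshold = 90.0
current_estimate = slope * hours[-1] + intercept
if slope > 0 and current_estimate < threshold:
    hours_to_breach = (threshold - current_estimate) / slope
    print(f"Projected to cross {threshold}% in about {hours_to_breach:.0f} hours")
else:
    print("No upward trend toward the threshold detected")
```

The point is not the particular model; it is that a fitted trend lets you raise a warning, or order capacity, well before the static threshold is ever reached.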
In a simple example, analysis may show that two key performance indicators, say application memory use and application requests, vary together. With that model in mind, if analysis detects that memory use is rising while requests are stable, we can infer that there’s an anomaly, and an emerging problem, even if memory use is still within acceptable boundaries. In this example, the anomaly would point to a potential memory leak.
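A rough sketch of that memory-leak scenario might look like the following; the request and memory figures are made up, and the simple linear fit stands in for whatever relationship model the analytics actually learns.

```python
# Sketch: learn how memory normally tracks request volume, then flag readings
# where memory is high even though the request rate doesn't explain it.
# All figures are illustrative.
import numpy as np

# Historical pairs of (requests per minute, memory in MB) under normal load.
requests = np.array([100, 150, 200, 250, 300, 350, 400])
memory_mb = np.array([215, 252, 320, 348, 418, 450, 512])

# Model the normal relationship with a simple linear fit.
slope, intercept = np.polyfit(requests, memory_mb, deg=1)
residual_std = np.std(memory_mb - (slope * requests + intercept))

def looks_like_leak(current_requests, current_memory_mb, k=3.0):
    """True if memory sits more than k standard deviations above what the
    current request volume would normally explain."""
    expected = slope * current_requests + intercept
    return (current_memory_mb - expected) > k * residual_std

# Requests are steady at ~200/min, but memory has crept up to 600 MB.
print(looks_like_leak(200, 600))  # True: anomaly despite "acceptable" memory use
print(looks_like_leak(200, 315))  # False: within normal variation
```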
When you can identify problems before they cause outages or service degradations, you have the opportunity to prevent those outages from ever occurring. An IDC study asserts that this kind of analytics delivers a substantially larger ROI than any other kind (Henry D. Morris, “Predictive Analytics and ROI: Lessons from IDC’s Financial Impact Study”).
Gaining New Insight With Predictive Analytics
Proactive management of the infrastructure is pivotal to protecting existing revenues and expanding opportunities. From a customer perspective, “slow is the new broke.”
Looking across the infrastructure at whether components are running at or close to capacity can prevent service degradation before it happens, and can also identify where extra capacity is required so that it can be ordered in advance of a service-affecting event. However, keeping a stock of unused equipment, whether it’s connected or not, is simply not financially prudent. Predictive analytics can ensure equipment is ordered in time, rather than stored on a shelf waiting to be called upon.
This type of analytics helps ensure that customers’ expectations for the quality of services and products are met. By creating an accurate view of business services, along with the supporting asset dependencies and their status, organizations can predict changes in service status and maintain optimal performance.
While analytics can build patterns from historical data, that approach may miss a newly evolving pattern. Analytics can also build patterns without any historical data, but this approach takes longer, especially to incorporate seasonality.
An adaptive algorithm that has been running for only a week won’t know the yearly cycles. The best of both worlds is to apply predictive analytics that takes advantage of all the data you have: history, subject matter expertise, known topology and the capability of an adaptive algorithm to learn from data in motion.
By analyzing all the data, the system learns what normal behavior is for the resource’s performance, including the seasonality associated with hourly and daily deviations in use. With this approach, the system learns the normal behavior and adjusts this understanding over time. When performance starts to deviate from the norm, a notification is raised; that covers deviations anywhere from a network’s quietest period to its busiest hour. A financial customer, for example, can take advantage of the fact that seasonal trends are fairly consistent for trader transactions. When the data deviates from the seasonal expectation, a notification can be raised even before any threshold is crossed.
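As a simple illustration of such a seasonality-aware baseline, and again only a sketch with synthetic data rather than a description of any specific product, the idea reduces to learning a normal band for each hour of the day and raising a notification when an observation leaves that band:

```python
# Sketch of a dynamic, seasonality-aware baseline: learn the normal range for
# each hour of the day from history, then flag readings that fall outside it
# even when they would pass a single static threshold. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)

# Two weeks of hourly transaction counts with a daily cycle (mid-day peak).
hours = np.arange(14 * 24)
daily_cycle = 500 + 300 * np.sin((hours % 24 - 6) * 2 * np.pi / 24)
history = daily_cycle + rng.normal(0, 40, size=hours.size)

# Per-hour-of-day baseline: mean and standard deviation for each of 24 hours.
baseline_mean = np.array([history[hours % 24 == h].mean() for h in range(24)])
baseline_std = np.array([history[hours % 24 == h].std() for h in range(24)])

def check(hour_of_day, observed, k=3.0):
    """Return a notification if the observation leaves its seasonal band."""
    low = baseline_mean[hour_of_day] - k * baseline_std[hour_of_day]
    high = baseline_mean[hour_of_day] + k * baseline_std[hour_of_day]
    if not low <= observed <= high:
        return f"Anomaly at hour {hour_of_day}: {observed} outside [{low:.0f}, {high:.0f}]"
    return "normal"

# 760 transactions would be unremarkable at the mid-day peak, but at 3 a.m.
# it deviates sharply from the seasonal expectation.
print(check(12, 760))  # normal
print(check(3, 760))   # anomaly
```

In practice the baseline would be updated continuously as new data arrives, which is what lets an adaptive algorithm keep pace with data in motion.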
Today, we find problems through fault events and by managing thresholds, an approach with some important implications. First, you have to know which thresholds you need and want to manage. You may also need to understand the relationships between key performance indicators; perhaps voice over IP quality here is related to network traffic over there. And as the environment becomes more dynamic and more complex, more and more thresholds are needed.
Predictive analytics can augment these traditional management techniques, providing new insight from data that’s already being collected, without adding new thresholds. As environments become increasingly dynamic and virtualized, and as operational data continues to grow in size and complexity, analytics can help you make sense of it all.
About Denis Kennelly
Denis Kennelly is VP of Development and CTO for IBM's Tivoli Network Management portfolio. He has more than 20 years of experience in the telecommunications and IT industries. During this time, Kennelly has helped define and develop network and service management products across a number of industry segments. Prior to IBM, Kennelly worked for Vallent Technologies, which specialized in cellular network service assurance software and was acquired by IBM in February 2007.