As the planet becomes smarter, the biggest leaps forward in the next several decades, in business, science and society at large, will come from insights gleaned through continuous, real-time analysis of data.
While many organizations might think analytics is best suited to marketing and sales departments, analytics of operational data, such as network maintenance and IT services data, is growing in importance. Data, both inside and outside of operations, is growing exponentially in volume and complexity.
Whether your goal is to improve operational efficiency or to provide new services, you need to capture, understand and use all of your data; analytics must be core to your thinking. The value is not in any specific analytic algorithm, nor in having everyone become an expert in statistics; it’s in applying analytics and analytic applications to specific business goals.
The environment that IT organizations have to deal with is getting increasingly complex with the rapid rollout and adoption of virtualization, cloud and web services. Resource reservation and assignment are now statistical, in both the data center and the network, as well as across a service delivery infrastructure that might span physical servers, system virtual machines, process virtual machines, middleware, buses, databases, application servers, and composite applications that combine multiple internal and external web services. Monitoring such complex infrastructures may generate thousands of metrics for a single service alone.
A performance analyst has the daunting task of trying to understand the relationships among all these components. The task is difficult enough even at the steady-state node level typically presented in a business service management (BSM) dependency tree. In reality it is harder still, because metrics have multi-dimensional, dynamic relationships that are almost impossible to discover and fully visualize. This makes it very difficult to analyze the data on a weekly basis, let alone on the near-real-time basis needed to raise alerts or predict emerging fault conditions.
Where Analytics Come Into Play
Every part of the business benefits from understanding operational and strategic data. Before equipment is provisioned, analytics are needed for capacity analysis and planning. Once equipment is obtained and provisioned, analytics are key to managing it efficiently. For example, rather than building maintenance schedules around customer complaints, availability and performance monitoring provide event and metric information that can be used to identify problems and understand their root cause, ideally before customers are affected.
As usage patterns change, analytics allow those changes to be better understood and forecast, improving operations. One of the most promising applications of analytics is extending this analysis to identify early warning indicators that predict when problems will occur, even while all the key performance indicators are still within desirable ranges.
So far we’ve discussed analytics that are already widely deployed. Now let’s move to predictive analytics.
What Is "Predictive Analytics"?
Predictive analytics is not a crystal ball; it can’t predict whether heavy equipment will cut a cable and cause an outage. Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.
The core of predictive analytics is capturing relationships between variables and past occurrences, and exploiting that information to predict future outcomes. Predictive analytics allows companies to move from reactive to proactive management: instead of reacting to trouble tickets and outages, operators can prevent those outages from occurring at all. By detecting problems such as service impacts, deterioration or outages as they emerge, before they affect services, IT can address an issue before customers are aware of it, or at least give them advance warning.
Many analytics techniques can be used to provide this kind of early warning system. Dynamic baselining lets you set thresholds based on seasonality and history rather than relying on a single static threshold. Linear and polynomial regression, various kinds of clustering, Granger causality modeling and other approaches let you build models that can be used to identify anomalies and predict future problems.
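To make the contrast with a static threshold concrete, here is a minimal sketch of dynamic baselining, assuming hourly samples and a band of mean plus or minus k standard deviations per hour of day. The function names, the hour-of-day bucketing and the default k=3.0 are illustrative assumptions, not the method of any particular product.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history, k=3.0):
    """Learn a (low, high) band per hour of day from (hour, value) samples.

    The band is mean +/- k population standard deviations of the values
    seen in that hour; k=3.0 is an illustrative default, not a standard.
    """
    buckets = defaultdict(list)
    for hour, value in history:
        buckets[hour].append(value)
    baseline = {}
    for hour, values in buckets.items():
        mu, sigma = mean(values), pstdev(values)
        baseline[hour] = (mu - k * sigma, mu + k * sigma)
    return baseline


def is_anomalous(baseline, hour, value):
    """True if value falls outside the learned band for that hour.

    Raises KeyError for an hour with no history.
    """
    low, high = baseline[hour]
    return not (low <= value <= high)


# One week of hypothetical samples: quiet at 03:00, busy at 14:00.
history = ([(3, 10 + day % 3) for day in range(7)]
           + [(14, 100 + day % 3) for day in range(7)])
baseline = build_baseline(history)

# A single static threshold of 90 would flag the normal busy hour and miss
# the unusual night-time spike; the per-hour baseline does the opposite.
print(is_anomalous(baseline, 3, 60))    # True
print(is_anomalous(baseline, 14, 101))  # False
```

Bucketing by hour of day is the simplest seasonal model; a real deployment might bucket by hour of week or add yearly effects once enough history exists.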
As a simple example, analysis may show that two key performance indicators, say application memory use and application requests, vary together. With that model in mind, if analysis detects that memory use is rising while requests are stable, we can infer an anomaly, and possibly an emerging problem, even though memory use is still within acceptable bounds. In this case the pattern would point to a potential memory leak.
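The memory-leak reasoning can be sketched as a regression on historical data: fit memory use against request load, then flag readings whose memory sits well above what the load explains. This is a minimal sketch assuming a linear relationship and a 3-sigma residual rule; the function names, units (requests per minute, MB) and sample data are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_std(xs, ys, intercept, slope):
    """Spread of the training residuals, used as the 'normal' noise level."""
    return pstdev([y - (intercept + slope * x) for x, y in zip(xs, ys)])


def memory_unexplained(requests, memory, intercept, slope, sigma, k=3.0):
    """True if memory is more than k sigma above what the request load predicts."""
    expected = intercept + slope * requests
    return memory - expected > k * sigma


# Hypothetical history: requests per minute vs. application memory in MB.
reqs = [100, 200, 300, 400, 500, 600]
mem = [101, 149, 201, 249, 301, 349]
intercept, slope = fit_line(reqs, mem)
sigma = residual_std(reqs, mem, intercept, slope)

print(memory_unexplained(300, 201, intercept, slope, sigma))  # False: in line with load
print(memory_unexplained(300, 230, intercept, slope, sigma))  # True: load flat, memory up
```

A single flagged reading is only a hint; a leak would typically show up as the residual creeping upward over successive samples while requests stay flat.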
When you can identify problems before they cause outages or service degradations, you have the opportunity to prevent those outages from ever occurring. An IDC study asserts that this kind of analytics delivers a substantially larger ROI than any other kind (“Predictive Analytics and ROI: Lessons from IDC’s Financial Impact Study,” Henry D. Morris).
Gaining New Insight With Predictive Analytics
Proactive management of the infrastructure is pivotal to protecting existing revenues and expanding opportunities. From a customer perspective, “slow is the new broke.”
Looking across the infrastructure to see whether components are running at or near capacity can prevent service degradation before it happens. It can also identify which components need extra capacity, so that capacity can be ordered in advance of a service-affecting event. However, keeping a stock of unused equipment, whether connected or not, is not financially prudent. Predictive analytics can ensure equipment is ordered in time, rather than sitting on a shelf waiting to be called upon.
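One simple way to turn "ordered in time" into a number is to fit a trend to recent utilization, project when it will reach capacity, and subtract the procurement lead time. This sketch assumes linear growth and daily samples; `days_until_capacity`, `order_by` and the figures below are hypothetical, and seasonal or nonlinear growth would call for a richer model.

```python
def days_until_capacity(usage, capacity):
    """Days after the last sample until a fitted linear trend reaches capacity.

    usage: daily utilization samples, oldest first (at least two).
    Returns None if the trend is flat or falling.
    """
    n = len(usage)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mx = (n - 1) / 2
    my = sum(usage) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(usage))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = my - slope * mx
    crossing_day = (capacity - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


def order_by(usage, capacity, lead_time_days):
    """Days left to place an order; a negative value means it is already late."""
    remaining = days_until_capacity(usage, capacity)
    return None if remaining is None else remaining - lead_time_days


# Hypothetical link utilization (%) growing 2 points per day, 90% ceiling,
# 10-day procurement lead time.
usage = [50, 52, 54, 56, 58, 60]
print(days_until_capacity(usage, 90))  # 15.0
print(order_by(usage, 90, 10))         # 5.0
```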
This type of analytics helps ensure that customers’ expectations for the quality of services and products are met. By creating an accurate view of business services, and of their supporting asset dependencies and status, organizations can predict changes in service status and maintain optimal performance.
Analytics can build patterns from historical data, but that approach may miss a newly emerging pattern. Analytics can also build patterns without any historical data, learning as new data arrives, but this approach takes longer, especially to capture seasonality.
An adaptive algorithm that has been running for only a week won’t know the yearly cycles. The best of both worlds is to apply predictive analytics that takes advantage of all the data you have: history, subject matter expertise, known topology and the ability of an adaptive algorithm to learn from data in motion.
By analyzing all the data, the system learns the normal performance behavior of each resource, including the seasonal hourly and daily variations in use, and adjusts this understanding over time. When performance starts to deviate from the norm, a notification is raised. This applies across a network’s full range of load, from periods of low utilization to the busiest hour. A financial customer, for example, can take advantage of the fact that seasonal trends in trader transactions are fairly consistent. When the data deviates from seasonal expectations, a notification can be raised even before any thresholds are crossed.
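The combination described here, seeding from history and then adapting as data flows in, can be sketched with an exponentially weighted mean and variance kept per seasonal slot. This is a minimal sketch, not the product's algorithm: the class name, the hour-of-week slot scheme, `alpha=0.1` and the 3-sigma rule are all assumptions chosen for illustration.

```python
import math


class AdaptiveSeasonalBaseline:
    """Per-slot exponentially weighted mean and variance.

    A slot is any seasonal key; here we assume hour of week (0..167),
    with slot 9 meaning Monday 09:00. Parameters are illustrative.
    """

    def __init__(self, alpha=0.1, k=3.0, min_std=1e-6):
        self.alpha = alpha      # weight given to each new observation
        self.k = k              # how many standard deviations count as a deviation
        self.min_std = min_std  # floor so a zero-variance slot does not flag everything
        self.mean = {}
        self.var = {}

    def seed(self, slot, values):
        """Initialize a slot from historical values (history plus learning)."""
        mu = sum(values) / len(values)
        self.mean[slot] = mu
        self.var[slot] = sum((v - mu) ** 2 for v in values) / len(values)

    def observe(self, slot, value):
        """Return True if value deviates from the learned norm, then update the slot."""
        if slot not in self.mean:
            # No history yet: start learning from this sample.
            self.mean[slot], self.var[slot] = value, 0.0
            return False
        mu = self.mean[slot]
        std = max(math.sqrt(self.var[slot]), self.min_std)
        deviates = abs(value - mu) > self.k * std
        # Incremental exponentially weighted update of mean and variance.
        diff = value - mu
        incr = self.alpha * diff
        self.mean[slot] = mu + incr
        self.var[slot] = (1 - self.alpha) * (self.var[slot] + diff * incr)
        return deviates


# Hypothetical trader transactions per hour for Monday 09:00 over four weeks.
b = AdaptiveSeasonalBaseline()
b.seed(9, [1000, 1040, 980, 1010])
print(b.observe(9, 1020))  # False: within the seasonal norm
print(b.observe(9, 700))   # True: far below the norm, though a static floor of 500 would not fire
```

Updating the slot even when a value is flagged lets the baseline follow a genuinely new pattern over time; a stricter design might skip or down-weight flagged samples so that a sustained fault is not learned as normal.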
Today, we find problems through fault events and by managing thresholds, an approach with some important implications. First, you have to know which thresholds you need and want to manage. You may also need to understand the relationships between key performance indicators; perhaps voice over IP quality here is related to network traffic over there. Second, as the environment becomes more dynamic and more complex, more thresholds are needed.
Predictive analytics can augment these traditional management techniques, providing new insight from data that’s already being collected, without adding new thresholds. As environments become increasingly dynamic and virtualized, and as operational data continues to grow in size and complexity, analytics can help you make sense of it all.
About Denis Kennelly
Denis Kennelly is VP of Development and CTO for IBM's Tivoli Network Management Portfolio. He has more than 20 years of experience in both the telecommunications and IT industries. During this time, Kennelly helped define and develop both network and service management products in a number of different industry segments. Prior to IBM, Kennelly worked for Vallent Technologies, which specialized in cellular network service assurance software and was acquired by IBM in February 2007.