As the planet is becoming smarter, the biggest leaps forward in the next several decades — in business, science and society at large — will come from insights gleaned through perpetual, real-time analysis of data.
While many organizations might think analytics is best suited for marketing and sales departments; analytics of operational data such as network maintenance and IT services is growing in importance. Data, both inside and outside of operations, is growing at an exponential rate in volume and complexity.
Whether your goal is to improve operational efficiencies, or to provide new services, you need to capture, understand and use all of your data – analytics must be core to your thinking. The value is not in any specific analytic algorithms, nor in having everyone become an expert in statistics; it’s in leveraging analytics and analytic applications for specific business goals.
The environment that IT organizations have to deal with is getting increasingly complex with the rapid roll out and adoption of virtualization, cloud and web services. Resource reservation and assignment is now statistical, in both data center and network, as well as across the service delivery infrastructure that might span: physical servers, system virtual machines, process virtual machines, middleware, buses, databases, application servers, and composite applications that might combine multiple internal and external web services. Monitoring such complex infrastructures may generate thousands of metrics for a single service alone.
A performance analyst has the daunting task of trying to understand the relationship between all these things. The task is already difficult enough even at the steady-state node level typically presented in a business service management (BSM) dependency tree. In reality the task is even harder, as metrics have multi-dimensional dynamic relationships that are almost impossible to discover and fully visualize. This makes it nearly impossible to analyze data on a weekly basis, let alone in a near real-time basis necessary to alert or predict emerging fault conditions.
Where Analytics Come Into Play
Every part of the business benefits from understanding operational and strategic data. Before equipment is provisioned, analytics are needed for capacity analysis and planning. Once equipment is obtained and provisioned, analytics are key to efficient management. For example, rather than building maintenance schedules around receiving complaints from customers, availability and performance monitoring provide event and metric information that can be used to identify problems, and understand the root cause, hopefully before customers are affected.
As usage patterns change, analytics allow those changes to be better understood and forecasted for improved operations. One of the most promising areas where analytics can be applied is in extending this analysis to identify early warning indicators to predict when problems will occur, even while all the key performance indicators are still in desirable ranges.
We’ve talked about analytics that’s widely deployed. Now let’s move to predictive analytics.
What Is "Predictive Analytics?"
Predictive analytics is not a crystal ball, it can’t predict whether heavy equipment will come and cut a cable thereby causing outages. Predictive analytics is an area of statistical analysis that deals with extracting information from data and using it to predict future trends and behavior patterns.
The core of predictive analytics relies on capturing relationships between variables and past occurrences, and exploiting that information to predict future outcomes. Predictive analytics allows companies to move from reactive to proactive management. For example, instead of reacting to trouble tickets and outages, predictive analytics allows operators to prevent those outages from even occurring. By detecting problems such as service impacts, deterioration or outages as they emerge, and before they affect services, IT can address the issue before customers are aware of it or get advanced warning.
There are many different analytics techniques that can be used to provide this kind of early warning system. Dynamic baselining allows you to have thresholds that are based on seasonality and history rather than rely on a single static threshold. Linear and polynomial regression, various kinds of clustering, Granger modeling and other approaches let you build models that can be used to identify anomalies and predict future problems.
In a simple example, analysis may show that two key performance indicators, say application memory use and application requests, vary together. With that model in mind, if analysis detects that memory use is rising and requests are stable, we can infer that there's an anomaly, and an emerging problem, even if the memory use is still within acceptable boundaries. This example would indicate a potential memory leak.
When you can identify problems before they cause outages or service degradations, you have the opportunity to prevent those outages from ever occurring. An IDC study asserts that this kind of analytics has substantially larger ROI than any other kind. (“Predictive Analytics and ROI: Lessons from IDC’s Financial Impact Study” paper, Henry D. Morris)
Gaining New Insight With Predictive Analytics
Proactive management of the infrastructure is pivotal to protecting existing revenues and expanding opportunities. From a customer perspective, “slow is the new broke.”
Taking a look across the infrastructure at whether components are running at or close to capacity can prevent service degradation before it happens and can also identify where extra capacity is required in the components, so it can be ordered in advance of a service affecting event. However, keeping stock of unused equipment – whether it’s connected or not – is just not financially prudent. Predictive analytics can ensure equipment is ordered in time, and not stored on a shelf waiting to be called upon.
This type of analytics ensures customers’ set of expectations are met regarding the quality of services and products. By creating an accurate view of business services, and supporting asset dependencies and status, organizations can predict the changes in service status to maintain optimal performance.
While analytics can build patterns from the historical data, that approach may miss a newly evolving pattern. Analytics can build patterns without any historical data, but this approach takes longer, especially to incorporate seasonality.
An adaptive algorithm that’s been running for a week only, won’t know the yearly cycles. The best of both worlds is to apply predictive analytics that takes advantage of all the data you have – history, subject matter expertise, known topology and the capability of an adaptive algorithm to learn from data in motion.
By analyzing all the data, the system learns what the normal behavior is for the resource’s performance. This includes the seasonality associated with the hourly and daily deviations of use. With this approach, the system learns the normal behavior and adjusts this understanding over time. When the performance starts to deviate from the norm, a notification will be raised. This includes a deviation from the slow utilization of a network to the busiest hour. A financial customer, for example, can now take advantage of the fact that seasonal trends are pretty consistent for trader transactions. When the data deviates from the seasonal expectations, a notification can be raised even before the thresholds are crossed.
Today, we find problems through fault events and by managing thresholds, which have some important implications. First, you have to know what thresholds you need and want to manage. You may need to understand the relationship between key performance indicators; perhaps that voice over IP quality here is related to network traffic over there. As the environment becomes more dynamic and more complex, more thresholds are needed.
Predictive analytics can augment these traditional management techniques, providing new insight from data that’s already being collected, without adding new thresholds. As environments become increasingly dynamic and virtualized, and as operational data continues to grow in size and complexity, analytics can help you make sense of it all.
About Denis Kennelly
Denis Kennelly is VP of Development and CTO for the IBM's Tivoli Network Management Portfolio. He has 20+ years experience in both the Telecommunications and IT industries. During this time, Kennelly helped define and develop both network and service management products in a number of different industry segments. Prior to IBM, Kennelly worked for Vallent Technologies who specialized in Cellular Network Service Assurance software and were acquired by IBM in February 2007.
I've had the opportunity to work with a number of organizations embarking on their AIOps journey. I always advise them to start by evaluating their needs and the possibilities AIOps can bring to them through five different levels of AIOps maturity. This is a strategic approach that allows enterprises to achieve complete automation for long-term success ...
Sumo Logic recently commissioned an independent market research study to understand the industry momentum behind continuous intelligence — and the necessity for digital organizations to embrace a cloud-native, real-time continuous intelligence platform to support the speed and agility of business for faster decision-making, optimizing security, driving new innovation and delivering world-class customer experiences. Some of the key findings include ...
When it comes to viruses, it's typically those of the computer/digital variety that IT is concerned about. But with the ongoing pandemic, IT operations teams are on the hook to maintain business functions in the midst of rapid and massive change. One of the biggest challenges for businesses is the shift to remote work at scale. Ensuring that they can continue to provide products and services — and satisfy their customers — against this backdrop is challenging for many ...
Teams tasked with developing and delivering software are under pressure to balance the business imperative for speed with high customer expectations for quality. In the course of trying to achieve this balance, engineering organizations rely on a variety of tools, techniques and processes. The 2020 State of Software Quality report provides a snapshot of the key challenges organizations encounter when it comes to delivering quality software at speed, as well as how they are approaching these hurdles. This blog introduces its key findings ...
For IT teams, run-the-business, commodity areas such as employee help desks, device support and communication platforms are regularly placed in the crosshairs for cost takeout, but these areas are also highly visible to employees. Organizations can improve employee satisfaction and business performance by building unified functions that are measured by employee experience rather than price. This approach will ultimately fund transformation, as well as increase productivity and innovation ...
In the agile DevOps framework, there is a vital piece missing; something that previous approaches to application development did well, but has since fallen by the wayside. That is, the post-delivery portion of the toolchain. Without continuous cloud optimization, the CI/CD toolchain still produces massive inefficiencies and overspend ...
The COVID-19 pandemic has exponentially accelerated digital transformation projects. To better understand where IT professionals are turning for help, we analyzed the online behaviors of IT decision-makers. Our research found an increase in demand for resources related to APM, microservices and dependence on cloud services ...
The rush to the public cloud has now slowed as organizations realized that it is not a "one size fits all" solution. The main issue is the lack of deep visibility into the performance of applications provided by the host. Our own research has recently revealed that 32% of public cloud resources are currently under-utilized, and without proper direction and guidance, this will remain the case ...
The global shift to working from home (WFH) enforced by COVID-19 stay-at-home orders has had a massive impact on everyone's working lives, not just in the way they remotely interact with their teams and IT systems, but also in how they spend their working days. With both governments and businesses committed to slowly opening up offices, it's increasingly clear that a high prevalence of remote work will continue throughout 2020 and beyond. This situation begets important questions ...