Automated Analytics: The Third-Dimension of Application Performance Problem Solving
September 05, 2013
Jason Meserve

It doesn’t seem all that long ago that one would arrive at the office in the morning, find that the email system or web site was down and call IT to let them know. Sadly, that call was often the first indication IT had that something was wrong, prompting a check to see whether the reported system was indeed down.

That scenario is the first level of application performance analytics. It isn’t very proactive or smart, and it can lead to a lot of frustrated users. In 2013, if the first notice of an outage comes from an employee or, worse, a customer, then IT needs to seriously investigate a new solution for problem alerting. With the competition a click away and razor-thin margins, businesses today can’t afford slowdowns and outages, never mind ones that require an end user to report them.

This is why Application Performance Management (APM) systems were developed: to give IT a way of easily seeing problem spots in complex applications and drilling down into the varied layers of the application to find the root cause. The majority of today’s APM solutions accomplish this by setting thresholds and baselines (automatically or manually) and alerting when those lines in the sand are approached or crossed. This approach is great for alerting to extreme behavior and lighting up the red, yellow and green lights on an IT operator’s dashboard.
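As a rough illustration of threshold-and-baseline alerting, here is a minimal sketch in Python. It assumes the baseline is simply the mean of recent readings and that the warning and critical lines sit at two and three standard deviations from it; the function name `traffic_light`, the sigma values and the sample response times are all hypothetical and do not describe any particular APM product.

```python
from statistics import mean, stdev


def traffic_light(value, history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Classify one reading against a baseline learned from past readings.

    Assumption for this sketch: the baseline is the mean of `history`,
    and the thresholds are multiples of its sample standard deviation.
    """
    baseline = mean(history)
    spread = stdev(history)
    deviation = abs(value - baseline)
    if deviation >= crit_sigmas * spread:
        return "red"
    if deviation >= warn_sigmas * spread:
        return "yellow"
    return "green"


# Hypothetical response times in milliseconds (mean 100, std dev 2).
response_ms = [100, 102, 98, 101, 99, 100, 103, 97]

for reading in (101, 105, 110):
    print(reading, traffic_light(reading, response_ms))
```

Each metric is judged on its own here, which is exactly why this style of check is good at catching extremes and blind to everything else.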

Dashboards are important to Operations. If you’re responsible for a complex system, it helps to watch for extreme measurements on each component. In practice, however, although managing the components for extreme behavior helps, this never proves to be sufficient in keeping the system healthy or in restoring health to the system when it degrades or fails. Components interact with other components. Those interactions can be very important to the overall system, even when no extreme behavior is evident on any one component.

Consider an analogy. If a sick patient seeks care from three different specialists (each responsible for the health of one component of the system) and each specialist prescribes medication without considering the actions of the other specialists, then the interaction of the drugs can cause serious harm to the patient (i.e., the system) even though no single drug is prescribed in excess or would cause any ill effects alone.

In a similar manner, management of IT components in isolation, without consideration of the IT system as a whole and the interactions between all the components, is known to result in poor overall performance, more outages, and slower recovery times.

Let’s focus on an important fact: It’s very expensive to have an outage. “The most recent Enterprise Management Associates (EMA) research finds that for 25% of companies surveyed, an hour of downtime costs the business between $100,000 and $500,000. Another 29% report the cost of downtime to be between $75,000 and $100,000,” according to research published by EMA. And that’s just the bottom-line cost. What about customer loyalty and brand reputation? Damage those too badly and the company may never recover.

A Third Wave of Analytics

There’s a new, third wave of smarter, more sophisticated analytics hitting the APM market; these solutions are designed to help shorten the duration of outages and possibly prevent them by giving application operators earlier warnings of problems brewing beneath the surface. A recent APM Digest Q&A with Netuitive’s Nicola Sanna touched on the importance of having machine-driven analytics.

Today’s advanced analytical engines allow the IT practitioner to rise above the level of component management and practice a more efficient and effective form of systems management. Such an engine does not require thresholding, baselining or configuring for any specific application. Instead, the engine consumes raw data and then learns metric, component, and system behavioral patterns on its own. This means the engine learns from observation the difference between normal and abnormal behavior, not at the metric level, not at the component level, but at the systems level.

Sophisticated analytic engines use multivariate anomaly detection to find intervals of time when groups of metrics or application components are interacting with each other in a manner not consistent with their historical patterns. Visualization and analysis of the patterns from such groups of metrics during an abnormal interval reveal where impactful change occurred across multiple components, when the change occurred, and the scope of its impact. This provides a new type of insight not revealed by the other types of APM analysis. In most cases it can reveal either root causes or at least clues about root causes, including relationships the application operator would not otherwise have known.
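To make the idea concrete, here is a minimal sketch of one simple multivariate technique, the Mahalanobis distance, applied to two metrics. This is an assumption chosen for illustration: the article does not say which algorithms commercial engines use, and the metric names, sample data and cutoff of 3.0 are hypothetical. The point is that a reading can look ordinary on each metric individually and still be far from the relationship the two metrics have shown historically.

```python
from math import sqrt
from statistics import mean


def fit(history):
    """Learn means, variances and covariance from (x, y) pairs."""
    xs = [p[0] for p in history]
    ys = [p[1] for p in history]
    mx, my = mean(xs), mean(ys)
    n = len(history) - 1  # sample (n - 1) normalization
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov = sum((x - mx) * (y - my) for x, y in history) / n
    return mx, my, var_x, var_y, cov


def mahalanobis(point, model):
    """Distance of a point from the learned joint pattern (2-D case)."""
    mx, my, var_x, var_y, cov = model
    dx, dy = point[0] - mx, point[1] - my
    det = var_x * var_y - cov ** 2  # determinant of the 2x2 covariance
    d2 = (var_y * dx * dx - 2 * cov * dx * dy + var_x * dy * dy) / det
    return sqrt(d2)


def z_scores(point, model):
    """Per-metric deviation, as a component-level check would see it."""
    mx, my, var_x, var_y, _ = model
    return abs(point[0] - mx) / sqrt(var_x), abs(point[1] - my) / sqrt(var_y)


# Hypothetical history: (requests per second, CPU percent) rising together.
history = [(100, 21), (150, 29), (200, 41), (250, 49),
           (300, 61), (350, 69), (400, 81), (450, 89)]
model = fit(history)

CUTOFF = 3.0  # assumed alerting cutoff for this sketch
for point in [(300, 60), (400, 25)]:
    print(point, z_scores(point, model), mahalanobis(point, model) > CUTOFF)
```

In this made-up data, 400 requests per second with only 25 percent CPU is within the historical range of both metrics, yet the combination breaks the learned pattern and is flagged, while the per-metric deviations stay under two standard deviations.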

This achievement of systems management over component management does not work if configuration is required. Neither the operator nor the administrator can be expected to know in advance the interactions that occur in a complex system. They cannot possibly construct rules, thresholds, and dashboards sufficient for capturing relationships they don’t even know about. Nor could they possibly maintain proper configuration over time as change occurs throughout the system. Fortunately, analytics technology has advanced to the point that zero-configuration monitoring and analysis systems are feasible.

Having automated analytics built right into the APM workflow can help application operators discover the source of problems in complex applications more quickly as they do not have to switch between various systems when problems arise. Making cutting-edge analytics part of the everyday APM environment can make IT operators more efficient, helping to reduce the time associated with outages and slowdowns.

This type of analysis harnesses the Big Data created by APM systems and delivers value. As APM monitors collect performance data from thousands of nodes every 15 seconds, the number of metrics being processed by an APM system quickly adds up. This data is already used for extreme alerting via thresholds that color traffic lights on dashboards, flow maps, and Top-N views. Now it’s possible to augment this component-centric, extreme-behavior-centric approach with machine-driven analytics that enable systems management by mining big data for potential problems, making those millions (or, in some cases, billions) of metrics even more valuable.
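A quick back-of-the-envelope calculation shows how fast the volume grows at a 15-second collection interval. The node counts and metrics-per-node figures below are hypothetical, chosen only to show the order of magnitude.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(nodes, metrics_per_node, interval_s=15):
    """Number of metric samples collected in one day."""
    collections_per_day = SECONDS_PER_DAY // interval_s  # 5,760 at 15 s
    return nodes * metrics_per_node * collections_per_day


# Assumed deployment sizes, for illustration only.
print(f"{samples_per_day(1_000, 10):,}")    # 57,600,000
print(f"{samples_per_day(5_000, 100):,}")   # 2,880,000,000
```

Under these assumptions, a modest deployment lands in the tens of millions of samples per day, and a larger one reaches the billions.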

With IT staffs spread thin, growing application complexity and increased user demand and expectations, application owners and operators need every insight possible into the performance of critical systems. Add advanced, automated analytics, the must-have next step in delivering that insight, to complement your existing alerts and give your team that critical edge they need to deliver business service reliability.

ABOUT Jason Meserve

Jason Meserve has been working in high-tech for over 15 years, and is currently a Product Marketing Manager at CA Technologies, where he focuses on Service Assurance solutions such as Application Performance Management. He built his tech resume in the 10 years he spent as a journalist at Network World, where he created everything from articles and features to blogs, videos and podcasts. Meserve has also held marketing and editorial positions at Constant Contact and Application Development Trends.
