Automated Analytics: The Third-Dimension of Application Performance Problem Solving
September 05, 2013
Jason Meserve

It doesn’t seem all that long ago that one would arrive at the office in the morning, find that the email system or web site was down and call IT to let them know. Sadly, that call would be the first notification IT had to check to see if the reported system was indeed down.

That scenario is the first level of application performance analytics. It isn’t very proactive or smart and can lead to a lot of frustrated users. In 2013, if the first notice of an outage is coming from an employee or, worse, a customer, then IT needs to seriously investigate a new solution for alerting on problems. With the competition a click away and razor-thin margins, businesses today can’t afford slowdowns and outages, never mind ones that require an end user to report them.

This is why Application Performance Management (APM) systems were developed: to give IT a way of easily seeing problem spots in complex applications and drilling down into the varied layers of the application to find the root cause. The majority of today’s APM solutions accomplish this by setting thresholds and baselines (automatically or manually) and alerting when those lines in the sand are approached or crossed. This approach is great for alerting on extreme behavior and lighting up the red, yellow and green lights on an IT operator’s dashboard.
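To make the threshold-and-baseline idea concrete, here is a minimal sketch in Python. It is not how any particular APM product works: the function name `classify`, the sample readings, and the choice of two and three standard deviations as the yellow and red lines are all assumptions made for illustration.

```python
from statistics import mean, stdev


def classify(value, history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Return 'green', 'yellow' or 'red' for a single metric reading.

    The baseline is the mean of past readings. The warning and critical
    thresholds sit a fixed number of standard deviations above it
    (both multipliers are illustrative assumptions).
    """
    baseline = mean(history)
    spread = stdev(history)
    if value >= baseline + crit_sigmas * spread:
        return "red"
    if value >= baseline + warn_sigmas * spread:
        return "yellow"
    return "green"


# Hypothetical response times (milliseconds) for one component.
response_ms = [100, 102, 98, 101, 99, 100, 103, 97]

for reading in (101, 105, 120):
    print(reading, classify(reading, response_ms))
```

Note that the check looks at one metric on one component at a time, which is exactly the limitation the rest of this article is concerned with.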

Dashboards are important to Operations. If you’re responsible for a complex system, it helps to watch for extreme measurements on each component. In practice, however, watching components for extreme behavior helps but never proves sufficient to keep the system healthy or to restore its health when it degrades or fails. Components interact with other components. Those interactions can be very important to the overall system, even when no extreme behavior is evident on any one component.

Consider an analogy. If a sick patient seeks care from three different specialists (each responsible for the health of one component of the system) and each specialist prescribes medication without considering the actions of the other specialists, then the interaction of the drugs can cause serious harm to the patient (i.e., the system) even though no single drug is prescribed in excess or would cause any ill effects alone.

In a similar manner, management of IT components in isolation, without consideration of the IT system as a whole and the interactions between all the components, is known to result in poor overall performance, more outages, and slower recovery times.

Let’s focus on an important fact: It’s very expensive to have an outage. “The most recent Enterprise Management Associates (EMA) research finds that for 25% of companies surveyed, an hour of downtime costs the business between $100,000 and $500,000. Another 29% report the cost of downtime to be between $75,000 and $100,000,” according to the EMA report. And that’s just the bottom-line cost. What about customer loyalty and brand reputation? Damage those too badly and the company may never recover.

A Third Wave of Analytics

There’s a new, third wave of smarter, more sophisticated analytics hitting the APM market; these solutions are designed to help shorten the duration of outages and possibly prevent them by giving application operators earlier warnings of problems brewing beneath the surface. A recent APM Digest Q&A with Netuitive’s Nicola Sanna touched on the importance of having machine-driven analytics.

Today’s advanced analytical engines allow the IT practitioner to rise above the level of component management and practice a more efficient and effective form of systems management. Such an engine does not require thresholding, baselining or configuring for any specific application. Instead, the engine consumes raw data and then learns metric, component, and system behavioral patterns on its own. This means the engine learns from observation the difference between normal and abnormal behavior, not at the metric level, not at the component level, but at the systems level.

Sophisticated analytic engines use multivariate anomaly detection to find intervals of time when groups of metrics or application components are interacting with each other in a manner not consistent with their historical patterns. Visualization and analysis of the patterns from such groups of metrics during an abnormal interval reveal where impactful change occurred, when it occurred, and the scope of its impact across multiple components. This provides a new type of insight not revealed by the other types of APM analysis. In most cases it can reveal root causes, or at least clues about root causes, including relationships the application operator would not otherwise have known.
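As a rough illustration of what multivariate anomaly detection can mean, the Python sketch below scores a pair of metrics with a squared Mahalanobis distance. This is one common textbook technique, swapped in for illustration; the article does not say which method any given engine uses. The metric pair (requests per second and CPU percent), the sample history, and the cutoff of 9.21 (roughly the 99th percentile of a chi-square distribution with two degrees of freedom) are all assumptions.

```python
from statistics import mean


def fit(history):
    """Learn the means and 2x2 sample covariance of two metrics from (x, y) pairs."""
    xs = [p[0] for p in history]
    ys = [p[1] for p in history]
    mx, my = mean(xs), mean(ys)
    n = len(history) - 1
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in history) / n
    return mx, my, sxx, syy, sxy


def mahalanobis_sq(point, model):
    """Squared Mahalanobis distance of (x, y) from the learned joint behavior."""
    mx, my, sxx, syy, sxy = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    # Closed-form inverse of the 2x2 covariance matrix applied to (dx, dy).
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical history: (requests per second, CPU percent) rising together.
history = [(100, 20), (200, 41), (300, 59), (400, 80), (500, 101), (600, 119)]
model = fit(history)
CUTOFF = 9.21  # assumed cutoff, about the 99th percentile of chi-square(2)

for point in [(300, 61), (200, 100)]:
    score = mahalanobis_sq(point, model)
    print(point, "abnormal" if score > CUTOFF else "normal")
```

The second reading, (200, 100), sits inside the range each metric has shown on its own, so a per-metric threshold would stay green. It is flagged only because CPU that high has never accompanied load that low, which is the kind of interaction between components that single-metric alerting cannot see.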

This achievement of systems management over component management does not work if configuration is required. Neither the operator nor the administrator can be expected to know in advance the interactions which occur in a complex system. They cannot possibly construct rules, thresholds, and dashboards sufficient for capturing relationships they don’t even know about. Nor could they possibly maintain proper configuration over time as change occurs throughout the system. Fortunately, analytics technology has advanced to the point that zero-configuration monitoring and analysis systems are feasible.

Having automated analytics built right into the APM workflow can help application operators discover the source of problems in complex applications more quickly, because they do not have to switch between various systems when problems arise. Making cutting-edge analytics part of the everyday APM environment can make IT operators more efficient, helping to reduce the time associated with outages and slowdowns.

This type of analysis harnesses the Big Data created by APM systems and delivers value. As APM monitors collect performance data from thousands of nodes every 15 seconds, the number of metrics being processed by an APM system quickly adds up. This data is already used for extreme alerting via thresholds, which color the traffic lights on dashboards, flow maps, and Top-N views. Now it’s possible to augment this component-centric, extreme-behavior-centric approach with machine-driven analytics that enable systems management by mining big data for potential problems, making those millions (or, in some cases, billions) of metrics even more valuable.
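A quick back-of-the-envelope calculation in Python shows how a 15-second collection interval reaches those volumes. Only the 15-second interval comes from the text; the node counts and metrics-per-node figures are hypothetical, and `samples_per_day` is just an illustrative helper.

```python
def samples_per_day(nodes, metrics_per_node, interval_seconds=15):
    """Count the metric readings collected in one day.

    Only the 15-second default interval comes from the article; node and
    metric counts are supplied by the caller as assumptions.
    """
    readings_per_metric = 86_400 // interval_seconds  # seconds in a day
    return nodes * metrics_per_node * readings_per_metric


# Hypothetical deployments of two different sizes.
print(f"{samples_per_day(1_000, 10):,}")   # 57,600,000
print(f"{samples_per_day(5_000, 100):,}")  # 2,880,000,000
```

Under these assumed sizes, a modest deployment already produces tens of millions of readings per day, and a larger one crosses into the billions.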

With IT staffs spread thin, application complexity growing, and user demand and expectations rising, application owners and operators need every insight possible into the performance of critical systems. Advanced, automated analytics are the must-have next step in delivering that insight. Add them to complement your existing alerts and give your team the critical edge it needs to deliver business service reliability.

ABOUT Jason Meserve

Jason Meserve has been working in high-tech for over 15 years, and is currently a Product Marketing Manager at CA Technologies, where he focuses on Service Assurance solutions such as Application Performance Management. He built his tech resume in the 10 years he spent as a journalist at Network World, where he created everything from articles and features to blogs, videos and podcasts. Meserve has also held marketing and editorial positions at Constant Contact and Application Development Trends.

Related Links:

www.ca.com/apm

Q&A Part One: Netuitive's Nicola Sanna Talks About Aligning IT with the Business

Enterprise Management Associates Report: The Top-line and the Bottom-line Impact of Application Performance Challenges
