Automated Analytics: The Third-Dimension of Application Performance Problem Solving
September 05, 2013
Jason Meserve
Share this

It doesn’t seem all that long ago that one would arrive at the office in the morning, find that the email system or web site was down and call IT to let them know. Sadly, that call would be the first notification IT had to check to see if the reported system was indeed down.

That scenario is the first level of application performance analytics. It isn’t very proactive or smart and can lead to a lot of frustrated users. In 2013, if the first notice of an outage is coming from an employee or worse, a customer, then IT needs to seriously investigate a new solution for alerting to problems. With the competition a click away and razor thin margins, businesses today can’t afford slowdowns and outages, never mind one that requires an end user to report it.

This is why Application Performance Management (APM) systems were developed. To give IT a way of easily seeing problem spots in complex applications and drilling down into the varied layers of the application to find root cause. The majority of today’s APM solutions accomplish this through setting thresholds and baselines (automatically or manually) and alerting when those lines in the sand are approached or crossed. This approach is great for alerting to extreme behavior and lighting up the red, yellow and green lights on an IT operator’s dashboard.

Dashboards are important to Operations. If you’re responsible for a complex system, it helps to watch for extreme measurements on each component. In practice, however, although managing the components for extreme behavior helps, this never proves to be sufficient in keeping the system healthy or in restoring health to the system when it degrades or fails. Components interact with other components. Those interactions can be very important to the overall system, even when no extreme behavior is evident on any one component.

Consider an analogy. If a sick patient seeks care from three different specialists (each responsible for the health of one component of the system) and each specialist prescribes medication without considering the actions of the other specialists, then the interaction of the drugs can cause serious harm to the patient (i.e., the system) even though no single drug is prescribed in excess or would cause any ill effects alone.

In a similar manner, management of IT components in isolation, without consideration of the IT system as a whole and the interactions between all the components, is known to result in poor overall performance, more outages, and slower recovery times.

Let’s focus on an important fact: It’s very expensive to have an outage. “The most recent Enterprise Management Associates (EMA) research finds that for 25% of companies surveyed, an hour of downtime costs the business between $100,000 and $500,000. Another 29% report the cost of downtime to be between $75,000 and $100,000,” according to research published by EMA. And that’s just the bottom line cost. What about customer loyalty and brand reputation? Damage those too badly and the company may never recover.

A Third Wave of Analytics

There’s a new, third wave of smarter, more sophisticated analytics hitting the APM market; these solutions are designed to help shorten the duration of outages and possibly prevent them by giving application operators earlier warnings of problems brewing beneath the surface. A recent APM Digest Q&A with Netuitive’s Nicola Sanna touched on the importance of having machine-driven analytics.

Today’s advanced analytical engines allow the IT practitioner to rise above the level of component management and practice a more efficient and effective form of systems management. Such an engine does not require thresholding, baselining or configuring for any specific application. Instead, the engine consumes raw data and then learns metric, component, and system behavioral patterns on its own. This means the engine learns from observation the difference between normal and abnormal behavior, not at the metric level, not at the component level, but at the systems level.

Sophisticated analytic engines use multivariate anomaly detection to find intervals of time when groups of metrics or application components are interacting with each other in a manner not consistent with the historical patterns. Visualization and analysis of the patterns from such groups of metrics during an abnormal interval reveals where impactful change occurred across multiple components, when change occurred and the scope of the impact across multiple components. This provides a new type of insight not revealed by the other types of APM analysis. In most cases it can either reveal root causes or at least clues about root causes, including relationships the application operator would not have otherwise known.

This achievement of systems management over component management does not work if configuration is required. Neither the operator nor the administrator can be expected to know in advance the interactions which occur in a complex system. They cannot possibly construct rules, thresholds, and dashboards sufficient for capturing relationships they don’t even know about. Nor could they possibly maintain proper configuration over time as change occurs throughout the system. Fortunately, analytics technology has advanced to the point that zero-configuration monitoring and analysis systems are feasible.

Having automated analytics built right into the APM workflow can help application operators discover the source of problems in complex applications more quickly as they do not have to switch between various systems when problems arise. Making cutting-edge analytics part of the everyday APM environment can make IT operators more efficient, helping to reduce the time associated with outages and slowdowns.

This type of analysis harnesses the Big Data created by APM systems and delivers value. As APM monitors collect performance data from thousands of nodes every 15 seconds, the amount of metrics being processed by an APM system quickly adds up. This data is already used for extreme alerting via thresholds which color traffic lights on dashboards, flow maps, and Top-N views. Now it’s possible to augment this component-centric, extreme-behavior-centric approach with machine-driven analytics that enable systems management by mining big data for potential problems, making those millions (or, in some cases, billions) of metrics even more valuable.

With IT staffs spread thin, growing application complexity and increased user demand and expectations, application owners and operators need every insight possible into the performance of critical systems. Add advanced, automated analytics, the must-have next step in delivering that insight, to complement your existing alerts and give your team that critical edge they need to deliver business service reliability.

ABOUT Jason Meserve

Jason Meserve has been working in high-tech for over 15 years, and is currently a Product Marketing Manager at CA Technologies where he focuses on Service Assurance solutions such as Application Performance Management. He built his tech resume in the 10 years he spent as a journalist at Network World, where he created everything from articles, features, blogs, videos and podcasts. Meserve has also held marketing and editorial positions at Constant Contact and Application Development Trends.

Share this

The Latest

December 07, 2023

Part 4 covers OpenTelemetry: Next year, we're going to see more embrace of OpenTelemetry across the entire industry — opening up the future of instrumentation ...

December 06, 2023

Part 3 covers even more on Observability: Observability will move up the organization to support the sustainability and FinOps drive. The combined pressure of needing to adopt more sustainable practices and tackle rising cloud costs will catapult observability from an IT priority to a business requirement in 2024 ...

December 05, 2023

Part 2 covers more on Observability: In 2024, observability platforms will embrace and innovate with new technologies like GenAI for real-time analytics, becoming the fulcrum for digital experience management ...

December 04, 2023

The Holiday Season means it is time for APMdigest's annual list of Application Performance Management (APM) predictions, covering IT performance topics. Industry experts — from analysts and consultants to the top vendors — offer thoughtful, insightful, and often controversial predictions on how APM, Observability, AIOps and related technologies will evolve and impact business in 2024. Part 1 covers APM and Observability ...

November 30, 2023

To help you stay on top of the ever-evolving tech scene, Automox IT experts shake the proverbial magic eight ball and share their predictions about tech trends in the coming year. From M&A frenzies to sustainable tech and automation, these forecasts paint an exciting picture of the future ...

November 29, 2023
The past few years have presented numerous challenges for businesses: a pandemic, rising interest rates, supply chain disruptions, and geopolitical conflict that sent shockwaves across the global economy. But change may finally be on the horizon. According to a recent report by Endava ... a majority of executives confirmed they are feeling optimistic about the current business climate, and as a result, are forecasting larger IT budgets, increased technology funding and rollout, and prioritized innovation in the coming year ...
November 28, 2023

Incident management processes are not keeping pace with the demands of modern operations teams, failing to meet the needs of SREs as well as platform and ops teams. Results from the State of DevOps Automation and AI Survey, commissioned by Transposit, point to an incident management paradox. Despite nearly 60% of ITOps and DevOps professionals reporting they have a defined incident management process that's fully documented in one place and over 70% saying they have a level of automation that meets their needs, teams are unable to quickly resolve incidents ...

November 27, 2023

Today, in the world of enterprise technology, the challenges posed by legacy Virtual Desktop Infrastructure (VDI) systems have long been a source of concern for IT departments. In many instances, this promising solution has become an organizational burden, hindering progress, depleting resources, and taking a psychological and operational toll on employees ...

November 22, 2023

Within retail organizations across the world, IT teams will be bracing themselves for a hectic holiday season ... While this is an exciting opportunity for retailers to boost sales, it also intensifies severe risk. Any application performance slipup will cause consumers to turn their back on brands, possibly forever. Online shoppers will be completely unforgiving to any retailer who doesn't deliver a seamless digital experience ...

November 21, 2023

Black Friday is a time when consumers can cash in on some of the biggest deals retailers offer all year long ... Nearly two-thirds of consumers utilize a retailer's web and mobile app for holiday shopping, raising the stakes for competitors to provide the best online experience to retain customer loyalty. Perforce's 2023 Black Friday survey sheds light on consumers' expectations this time of year and how developers can properly prepare their applications for increased online traffic ...