Automated Analytics: The Third-Dimension of Application Performance Problem Solving
September 05, 2013
Jason Meserve

It doesn’t seem all that long ago that one would arrive at the office in the morning, find that the email system or web site was down and call IT to let them know. Sadly, that call would be the first indication IT had that something was wrong, prompting a check to see whether the reported system was indeed down.

That scenario is the first level of application performance analytics. It isn’t very proactive or smart and can lead to a lot of frustrated users. In 2013, if the first notice of an outage is coming from an employee or, worse, a customer, then IT needs to seriously investigate a new solution for alerting on problems. With the competition a click away and razor-thin margins, businesses today can’t afford slowdowns and outages, never mind ones that require an end user to report them.

This is why Application Performance Management (APM) systems were developed: to give IT a way of easily seeing problem spots in complex applications and drilling down into the various layers of the application to find the root cause. The majority of today’s APM solutions accomplish this by setting thresholds and baselines (automatically or manually) and alerting when those lines in the sand are approached or crossed. This approach is great for alerting on extreme behavior and lighting up the red, yellow and green lights on an IT operator’s dashboard.
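
As a rough illustration of this second level, the sketch below colors a reading in two ways: against fixed thresholds, and against a baseline computed from recent history. It is a simplified example, not how any particular APM product works; the function names, the sample response times and the two- and three-sigma cutoffs are all assumptions made for the example.

```python
from statistics import mean, stdev

def static_status(value, warn, crit):
    """Map a reading to a traffic-light color using fixed thresholds."""
    if value >= crit:
        return "red"
    if value >= warn:
        return "yellow"
    return "green"

def baseline_status(value, history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Color a reading by its distance from the mean of past readings.

    The sigma cutoffs are illustrative defaults, not recommended settings.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return "green" if value == mu else "red"
    deviation = abs(value - mu) / sigma
    if deviation >= crit_sigmas:
        return "red"
    if deviation >= warn_sigmas:
        return "yellow"
    return "green"

# Hypothetical response times in milliseconds for one component.
response_ms = [120, 118, 125, 122, 119, 121, 124, 120]

print(static_status(130, warn=200, crit=500))  # fixed thresholds
print(baseline_status(130, response_ms))       # learned baseline
```

With this sample data, a 130 ms reading stays green against the fixed thresholds but turns red against the baseline, because it sits well outside the component's usual narrow range. Both checks still look at a single metric on a single component.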

Dashboards are important to Operations. If you’re responsible for a complex system, it helps to watch for extreme measurements on each component. In practice, however, although managing the components for extreme behavior helps, this never proves to be sufficient in keeping the system healthy or in restoring health to the system when it degrades or fails. Components interact with other components. Those interactions can be very important to the overall system, even when no extreme behavior is evident on any one component.

Consider an analogy. If a sick patient seeks care from three different specialists (each responsible for the health of one component of the system) and each specialist prescribes medication without considering the actions of the other specialists, then the interaction of the drugs can cause serious harm to the patient (i.e., the system) even though no single drug is prescribed in excess or would cause any ill effects alone.

In a similar manner, management of IT components in isolation, without consideration of the IT system as a whole and the interactions between all the components, is known to result in poor overall performance, more outages, and slower recovery times.

Let’s focus on an important fact: it’s very expensive to have an outage. “The most recent Enterprise Management Associates (EMA) research finds that for 25% of companies surveyed, an hour of downtime costs the business between $100,000 and $500,000. Another 29% report the cost of downtime to be between $75,000 and $100,000,” according to EMA. And that’s just the bottom-line cost. What about customer loyalty and brand reputation? Damage those too badly and the company may never recover.

A Third Wave of Analytics

There’s a new, third wave of smarter, more sophisticated analytics hitting the APM market; these solutions are designed to help shorten the duration of outages and possibly prevent them by giving application operators earlier warnings of problems brewing beneath the surface. A recent APM Digest Q&A with Netuitive’s Nicola Sanna touched on the importance of having machine-driven analytics.

Today’s advanced analytical engines allow the IT practitioner to rise above the level of component management and practice a more efficient and effective form of systems management. Such an engine does not require thresholding, baselining or configuring for any specific application. Instead, the engine consumes raw data and then learns metric, component, and system behavioral patterns on its own. This means the engine learns from observation the difference between normal and abnormal behavior, not at the metric level, not at the component level, but at the systems level.

Sophisticated analytic engines use multivariate anomaly detection to find intervals of time when groups of metrics or application components are interacting with each other in a manner not consistent with historical patterns. Visualization and analysis of the patterns from such groups of metrics during an abnormal interval reveal where impactful change occurred, when it occurred and the scope of its impact across multiple components. This provides a new type of insight not revealed by the other types of APM analysis. In most cases it can reveal root causes, or at least clues about root causes, including relationships the application operator would not otherwise have known about.
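
To make the idea concrete, here is a minimal sketch of one simple multivariate technique: the squared Mahalanobis distance over a pair of metrics. It is an illustration only and not the method used by any particular analytics engine. The metric names, the sample data and the 13.8 cutoff (roughly the 99.9% point of a chi-square distribution with two degrees of freedom, which assumes approximately Gaussian behavior) are assumptions made for the example.

```python
from statistics import mean

def fit(points):
    """Learn the means and 2x2 covariance of paired metric readings."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points) - 1  # sample covariance
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return mx, my, sxx, syy, sxy

def mahalanobis_sq(point, model):
    """Squared distance of a reading from the learned joint pattern."""
    mx, my, sxx, syy, sxy = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2  # assumed non-zero for this sketch
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def is_anomalous(point, model, cutoff=13.8):
    """Flag readings that break the learned relationship (assumed cutoff)."""
    return mahalanobis_sq(point, model) > cutoff

# Hypothetical history: (requests per second, open database connections).
history = [(100, 20), (200, 41), (300, 59), (400, 80),
           (500, 101), (600, 119), (700, 140), (800, 161)]
model = fit(history)

normal = (450, 90)   # follows the usual relationship
suspect = (300, 100) # each value is mid-range, but the pairing is unusual

print(is_anomalous(normal, model))
print(is_anomalous(suspect, model))
```

In the sample data, connections track request rate closely. The suspect reading has a request rate and a connection count that are each well within their historical ranges, so a per-metric threshold would stay green, yet the combination is flagged because it departs from the learned relationship. A real engine would work across many more metrics and handle seasonality and noise, which this sketch does not attempt.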

This achievement of systems management over component management does not work if configuration is required. Neither the operator nor the administrator can be expected to know in advance the interactions which occur in a complex system. They cannot possibly construct rules, thresholds, and dashboards sufficient for capturing relationships they don’t even know about. Nor could they possibly maintain proper configuration over time as change occurs throughout the system. Fortunately, analytics technology has advanced to the point that zero-configuration monitoring and analysis systems are feasible.

Having automated analytics built right into the APM workflow can help application operators discover the source of problems in complex applications more quickly, because they do not have to switch between various systems when problems arise. Making cutting-edge analytics part of the everyday APM environment can make IT operators more efficient, helping to reduce the time associated with outages and slowdowns.

This type of analysis harnesses the Big Data created by APM systems and delivers value. As APM monitors collect performance data from thousands of nodes every 15 seconds, the number of metrics being processed by an APM system quickly adds up. This data is already used for extreme alerting via thresholds, which color the traffic lights on dashboards, flow maps, and Top-N views. Now it’s possible to augment this component-centric, extreme-behavior-centric approach with machine-driven analytics that enable systems management by mining Big Data for potential problems, making those millions (or, in some cases, billions) of metrics even more valuable.
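
A quick back-of-the-envelope calculation shows how fast that volume grows. The 15-second interval comes from the paragraph above; the node count and the number of metrics per node are assumed figures chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def samples_per_day(nodes, metrics_per_node, interval_seconds=15):
    """Count metric samples collected per day at a fixed polling interval."""
    collections_per_day = SECONDS_PER_DAY // interval_seconds
    return nodes * metrics_per_node * collections_per_day

# Assumed environment: 5,000 nodes reporting 100 metrics each.
print(samples_per_day(5_000, 100))  # 2880000000
```

Each metric yields 5,760 samples a day at a 15-second interval, so under these assumptions the system handles about 2.9 billion samples daily.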

With IT staffs spread thin, application complexity growing and user demand and expectations increasing, application owners and operators need every insight possible into the performance of critical systems. Advanced, automated analytics are the must-have next step in delivering that insight. Add them to complement your existing alerts and give your team the critical edge it needs to deliver business service reliability.

ABOUT Jason Meserve

Jason Meserve has been working in high-tech for over 15 years, and is currently a Product Marketing Manager at CA Technologies, where he focuses on Service Assurance solutions such as Application Performance Management. He built his tech resume in the 10 years he spent as a journalist at Network World, where he created everything from articles and features to blogs, videos and podcasts. Meserve has also held marketing and editorial positions at Constant Contact and Application Development Trends.
