Chasing a Moving Target: APM in the Cloud - Part 2
Detection, Analysis and Action
February 21, 2013

Albert Mavashev
Nastel Technologies

Share this

In my last blog, I discussed strategies for dealing with the complexities of monitoring performance in the various stacks that make up a cloud implementation. Here, we will look at ways to detect trends, analyze your data, and act on it.

The first requirement for detecting trends in application performance in the cloud is to have good information delivered in a timely manner about each stack as well as the application.
  
We acquire this information via data collectors that harvest all relevant indicators within the same clock tick. For example: response time, GC activity, memory usage, CPU Usage. Doing this within the same clock tick is called serialization. It is of little use to know I have a failed transaction at time X, but only have CPU and memory data from X minus 10 minutes.

Next, we require a history for each metric. This can be maintained in memory for near real-time analysis, but we also need to use slower storage for longer-term views.

Finally, we apply pattern matching to the data. We might scan and match metrics such as “find all applications whose GC is above High Bollinger Band for 2+ samples.” Doing this in memory can enable very fast detection across a large number of indicators.

Here are three steps you can use to detect performance trends

1. Measure the relevant application performance indicators on the business side such as orders filled, failed or missed. And then, measure the ones on the IT side such as  JVM GC activity, memory, I/O rates.

2. Create a base line for each relevant indicator. This could a 1- to60-second sampling for near real-time monitoring. In addition set up a 1-, 10- and 15-minute sample or even daily, weekly or monthly for those longer in duration. You need both.

3. Apply analytics to determine trends and behavior

Keeping it Simple

Applying analytics can be easier than you expect. In fact, the more simple you keep it, the better.

The following three simple analytical techniques can be used in order to detect anomalies:

1. Bollinger Bands – 2 standard deviations off the mean – low and high. The normal is 2 standard deviations from the mean.

2. Percent of Change – This means comparing sample to sample, day to day or week to week, and calculating the percentage of change.

3. Velocity – Essentially this measures how fast indicators are changing. For example, you might be measuring response time and it drops from 10 to 20 seconds over a five-second interval or (20-10)/5 = 2 units/sec. With this technique, we are expecting a certain amount of change; however, when the amount of change is changing at an abnormal rate, we have most likely detected an anomaly.

Now That You Know ... Act On It

After the analysis, the next activity is to take action. This could be alerts, notification or system actions such as restarting processes or even resubmitting orders. Here, we are connecting the dots between IT and the business and alerting the appropriate owners. 

And In Conclusion

Elastic cloud-based applications can’t be monitored effectively using static models, as these models assume constancy. And the one thing constant about these applications is their volatility. In these environments, what was abnormal yesterday might likely be normal today. As a result, what static models indicate may be wrong. 

However, using a methodology comprised of gathering both business and IT metrics, creating automated base lines and applying analytics to them in real time can produce effective results and predict behavior. 

Albert Mavashev is Chief Technology Officer at Nastel Technologies.

Share this

The Latest

March 27, 2024

Nearly all (99%) globa IT decision makers, regardless of region or industry, recognize generative AI's (GenAI) transformative potential to influence change within their organizations, according to The Elastic Generative AI Report ...

March 27, 2024

Agent-based approaches to real user monitoring (RUM) simply do not work. If you are pitched to install an "agent" in your mobile or web environments, you should run for the hills ...

March 26, 2024

The world is now all about end-users. This paradigm of focusing on the end-user was simply not true a few years ago, as backend metrics generally revolved around uptime, SLAs, latency, and the like. DevOps teams always pitched and presented the metrics they thought were the most correlated to the end-user experience. But let's be blunt: Unless there was an egregious fire, the correlated metrics were super loose or entirely false ...

March 25, 2024

This year, New Relic published the State of Observability for Financial Services and Insurance Report to share insights derived from the 2023 Observability Forecast on the adoption and business value of observability across the financial services industry (FSI) and insurance sectors. Here are seven key takeaways from the report ...

March 22, 2024

In MEAN TIME TO INSIGHT Episode 4 - Part 2, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA) discusses artificial intelligence and AIOps ...

March 21, 2024

In the course of EMA research over the last twelve years, the message for IT organizations looking to pursue a forward path in AIOps adoption is overall a strongly positive one. The benefits achieved are growing in diversity and value ...

March 20, 2024

Today, as enterprises transcend into a new era of work, surpassing the revolution, they must shift their focus and strategies to thrive in this environment. Here are five key areas that organizations should prioritize to strengthen their foundation and steer themselves through the ever-changing digital world ...

March 19, 2024

If there's one thing we should tame in today's data-driven marketing landscape, this would be data debt, a silent menace threatening to undermine all the trust you've put in the data-driven decisions that guide your strategies. This blog aims to explore the true costs of data debt in marketing operations, offering four actionable strategies to mitigate them through enhanced marketing observability ...

March 18, 2024

Gartner has highlighted the top trends that will impact technology providers in 2024: Generative AI (GenAI) is dominating the technical and product agenda of nearly every tech provider ...

March 15, 2024

In MEAN TIME TO INSIGHT Episode 4 - Part 1, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA) discusses artificial intelligence and network management ...