Chasing a Moving Target: APM in the Cloud - Part 2
Detection, Analysis and Action
February 21, 2013

Albert Mavashev
Nastel Technologies


In my last blog, I discussed strategies for dealing with the complexities of monitoring performance in the various stacks that make up a cloud implementation. Here, we will look at ways to detect trends, analyze your data, and act on it.

The first requirement for detecting trends in application performance in the cloud is to have good information delivered in a timely manner about each stack as well as the application.
  
We acquire this information via data collectors that harvest all relevant indicators within the same clock tick: for example, response time, GC activity, memory usage and CPU usage. Capturing everything within the same clock tick is called serialization. It is of little use to know that a transaction failed at time X if the only CPU and memory data I have is from ten minutes earlier.
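As a rough illustration of what serialized collection might look like on a JVM (a sketch, not Nastel's collector), the standard java.lang.management MXBeans can stamp every indicator with a single shared timestamp; the SerializedSnapshot class and the responseTimeMs parameter below are hypothetical:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector: captures all indicators under one shared timestamp
// so response time, GC, memory and CPU can be correlated sample-for-sample.
public class SerializedSnapshot {

    public static Map<String, Number> collect(long responseTimeMs) {
        long tick = System.currentTimeMillis();          // the single "clock tick"
        Map<String, Number> sample = new LinkedHashMap<>();
        sample.put("timestamp", tick);
        sample.put("responseTimeMs", responseTimeMs);    // supplied by the transaction probe

        long gcCount = 0, gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcCount += gc.getCollectionCount();
            gcTimeMs += gc.getCollectionTime();
        }
        sample.put("gcCount", gcCount);
        sample.put("gcTimeMs", gcTimeMs);

        sample.put("heapUsedBytes",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        sample.put("systemLoadAvg",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return sample;
    }
}
```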

Next, we require a history for each metric. This can be maintained in memory for near real-time analysis, but we also need to use slower storage for longer-term views.

Finally, we apply pattern matching to the data. We might scan for matches such as "find all applications whose GC is above the high Bollinger Band for two or more samples." Doing this in memory enables very fast detection across a large number of indicators.
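A minimal sketch of such a scan, assuming each application's recent GC-time samples are already kept in an in-memory window (the GcBandScan class and its names are hypothetical):

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory scan: histories maps application name -> recent GC-time samples
// (oldest first). Flags every application whose two newest samples sit above the high
// Bollinger Band (window mean + 2 standard deviations).
public class GcBandScan {

    static final int MIN_SAMPLES = 10;

    public static List<String> appsAboveHighBand(Map<String, Deque<Double>> histories) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Deque<Double>> e : histories.entrySet()) {
            Double[] w = e.getValue().toArray(new Double[0]);
            if (w.length < MIN_SAMPLES) continue;

            double mean = 0;
            for (double v : w) mean += v;
            mean /= w.length;

            double var = 0;
            for (double v : w) var += (v - mean) * (v - mean);
            double highBand = mean + 2 * Math.sqrt(var / w.length);

            // "above the high Bollinger Band for 2+ samples" -> check the two newest samples
            if (w[w.length - 1] > highBand && w[w.length - 2] > highBand) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}
```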

Here are three steps you can use to detect performance trends:

1. Measure the relevant application performance indicators on the business side, such as orders filled, failed or missed, and then measure the ones on the IT side, such as JVM GC activity, memory and I/O rates.

2. Create a baseline for each relevant indicator. This could be a 1- to 60-second sampling interval for near real-time monitoring. In addition, set up 1-, 10- and 15-minute samples, or even daily, weekly or monthly ones for longer-term views. You need both (a minimal roll-up sketch follows this list).

3. Apply analytics to determine trends and behavior.
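As a hypothetical sketch of step 2, a baseline can be as simple as rolling fast samples up into coarser buckets, keeping the short window in memory and handing the longer-interval aggregates to slower storage (the BaselineRollup name and bucket sizes are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical baseline roll-up: aggregates fine-grained samples (e.g. 1-second)
// into coarser buckets (e.g. 1-minute averages) suitable for longer-term storage.
public class BaselineRollup {

    public static List<Double> rollUp(List<Double> fineSamples, int samplesPerBucket) {
        List<Double> coarse = new ArrayList<>();
        for (int i = 0; i + samplesPerBucket <= fineSamples.size(); i += samplesPerBucket) {
            double sum = 0;
            for (int j = i; j < i + samplesPerBucket; j++) sum += fineSamples.get(j);
            coarse.add(sum / samplesPerBucket);   // one baseline point per bucket
        }
        return coarse;
    }
}
```

The same routine can be reapplied to its own output to build the 10-minute, hourly or daily baselines mentioned above.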

Keeping it Simple

Applying analytics can be easier than you expect. In fact, the simpler you keep it, the better.

The following three simple analytical techniques can be used to detect anomalies (a combined sketch follows the list):

1. Bollinger Bands – a low band and a high band set two standard deviations below and above the mean. Values inside the bands are considered normal; values outside them are candidate anomalies.

2. Percent of Change – This means comparing sample to sample, day to day or week to week, and calculating the percentage of change.

3. Velocity – Essentially this measures how fast indicators are changing. For example, response time might rise from 10 to 20 seconds over a five-second interval, or (20 - 10)/5 = 2 units/sec. Some amount of change is expected; when the rate of change itself becomes abnormal, we have most likely detected an anomaly.
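Here is a minimal, hypothetical sketch of all three calculations over one metric's samples (class and method names are illustrative, not a specific product API):

```java
import java.util.List;

// Hypothetical anomaly checks over one metric's recent samples, illustrating the
// three techniques above: Bollinger Bands, percent of change, and velocity.
public class SimpleAnalytics {

    // Low and high bands: mean -/+ 2 standard deviations over the window.
    public static double[] bollingerBands(List<Double> window) {
        double mean = 0;
        for (double v : window) mean += v;
        mean /= window.size();
        double var = 0;
        for (double v : window) var += (v - mean) * (v - mean);
        double sd = Math.sqrt(var / window.size());
        return new double[] { mean - 2 * sd, mean + 2 * sd };
    }

    // Percent of change between two samples (or two days, two weeks, ...).
    public static double percentChange(double previous, double current) {
        return (current - previous) / previous * 100.0;
    }

    // Velocity: units of change per second, e.g. (20 - 10) / 5 = 2 units/sec.
    public static double velocity(double previous, double current, double elapsedSeconds) {
        return (current - previous) / elapsedSeconds;
    }
}
```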

Now That You Know ... Act On It

After the analysis, the next step is to take action. This could be alerts, notifications or system actions such as restarting processes or even resubmitting orders. Here, we are connecting the dots between IT and the business and alerting the appropriate owners.
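A hypothetical sketch of wiring detection to action, where the notification and restart handlers stand in for whatever your alerting and automation tooling provides:

```java
// Hypothetical action hook: when an anomaly is detected for an indicator,
// route it to the business owner and, if appropriate, to an automated system action.
public class AnomalyActions {

    public interface Action { void run(String application, String indicator); }

    public static void onAnomaly(String application, String indicator,
                                 boolean businessImpact,
                                 Action notifyOwner, Action systemAction) {
        if (businessImpact) {
            notifyOwner.run(application, indicator);   // e.g. alert the order-desk owner
        }
        systemAction.run(application, indicator);      // e.g. restart a process, resubmit orders
    }
}
```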

And In Conclusion

Elastic cloud-based applications can't be monitored effectively using static models, as these models assume constancy. And the one thing constant about these applications is their volatility. In these environments, what was abnormal yesterday may well be normal today. As a result, what static models indicate may be wrong.

However, a methodology that combines gathering both business and IT metrics, creating automated baselines and applying analytics to them in real time can produce effective results and predict behavior.

Albert Mavashev is Chief Technology Officer at Nastel Technologies.
