Chasing a Moving Target: APM in the Cloud - Part 2
Detection, Analysis and Action
February 21, 2013

Albert Mavashev
Nastel Technologies


In my last blog, I discussed strategies for dealing with the complexities of monitoring performance in the various stacks that make up a cloud implementation. Here, we will look at ways to detect trends, analyze your data, and act on it.

The first requirement for detecting trends in application performance in the cloud is to have good information delivered in a timely manner about each stack as well as the application.
  
We acquire this information via data collectors that harvest all relevant indicators, such as response time, GC activity, memory usage and CPU usage, within the same clock tick. Capturing everything within the same clock tick is called serialization. It is of little use to know that a transaction failed at time X if the only CPU and memory data available is from ten minutes earlier.
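
A minimal sketch of such a collector, assuming Python with the psutil package for the system-level readings; the response-time and GC probes are hypothetical placeholders standing in for application- and JVM-supplied data:

import random
import time

import psutil  # assumed available for CPU and memory readings


def read_response_time_ms():
    # Hypothetical placeholder: a real collector would get this from the application.
    return random.uniform(5, 50)


def read_gc_time_ms():
    # Hypothetical placeholder: a real collector would get this from the JVM.
    return random.uniform(0, 10)


def collect_sample():
    tick = time.time()  # one shared clock tick for every indicator
    return {
        "timestamp": tick,
        "response_time_ms": read_response_time_ms(),
        "gc_time_ms": read_gc_time_ms(),
        "cpu_pct": psutil.cpu_percent(interval=None),
        "mem_pct": psutil.virtual_memory().percent,
    }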

Next, we require a history for each metric. This can be maintained in memory for near real-time analysis, but we also need to use slower storage for longer-term views.
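
As a rough illustration, again assuming Python, the near real-time history could be a fixed-size ring buffer per metric, with each sample also appended to a file standing in for slower, longer-term storage (the window size and file name are arbitrary choices for this sketch):

import json
from collections import defaultdict, deque

WINDOW = 60  # keep the most recent 60 samples per metric in memory

history = defaultdict(lambda: deque(maxlen=WINDOW))  # metric name -> recent (time, value) pairs


def record(sample, archive_path="metrics.jsonl"):
    # In-memory ring buffer for near real-time analysis.
    for name, value in sample.items():
        if name != "timestamp":
            history[name].append((sample["timestamp"], value))
    # Append-only file as a stand-in for slower, longer-term storage.
    with open(archive_path, "a") as archive:
        archive.write(json.dumps(sample) + "\n")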

Finally, we apply pattern matching to the data. We might scan for conditions such as "find all applications whose GC activity has been above the high Bollinger Band for two or more samples." Doing this in memory enables very fast detection across a large number of indicators.
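
A sketch of that kind of in-memory scan, assuming Python; the per-application GC histories and the two-sample threshold simply mirror the example above:

from statistics import mean, stdev


def high_band(values, k=2):
    # Upper Bollinger Band: mean plus k standard deviations of the window.
    return mean(values) + k * stdev(values)


def apps_with_gc_above_band(gc_histories, samples=2):
    # gc_histories: application name -> list of recent GC-time readings.
    flagged = []
    for app, readings in gc_histories.items():
        if len(readings) < samples + 2:  # need enough data for a stable band
            continue
        band = high_band(readings[:-samples])
        if all(value > band for value in readings[-samples:]):
            flagged.append(app)
    return flagged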

Here are three steps you can use to detect performance trends:

1. Measure the relevant application performance indicators on the business side, such as orders filled, failed or missed. Then measure those on the IT side, such as JVM GC activity, memory usage and I/O rates.

2. Create a baseline for each relevant indicator. This could be a 1- to 60-second sampling interval for near real-time monitoring. In addition, set up 1-, 10- and 15-minute samples, or even daily, weekly or monthly ones, for longer-term views. You need both (see the roll-up sketch after this list).

3. Apply analytics to determine trends and behavior.
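
One way to maintain the longer-duration baselines is to roll the fine-grained samples up into coarser buckets. A minimal sketch in Python, with the bucket size as an assumption:

from statistics import mean


def roll_up(samples, bucket_seconds=60):
    # samples: list of (timestamp, value) pairs at the fine-grained resolution.
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(int(ts // bucket_seconds), []).append(value)
    # Return one averaged value per coarser bucket (e.g. per minute).
    return {bucket * bucket_seconds: mean(values) for bucket, values in buckets.items()}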

Keeping it Simple

Applying analytics can be easier than you expect. In fact, the simpler you keep it, the better.

The following three simple analytical techniques can be used to detect anomalies (a combined sketch follows the list):

1. Bollinger Bands – low and high bands set two standard deviations below and above the mean. Values that stay within the bands are considered normal; values that move outside them are candidates for anomalies.

2. Percent of Change – This means comparing sample to sample, day to day or week to week, and calculating the percentage of change.

3. Velocity – Essentially this measures how fast indicators are changing. For example, you might be measuring response time and see it climb from 10 to 20 seconds over a five-second interval, or (20 - 10) / 5 = 2 units/sec. With this technique we expect a certain amount of change; when the rate of change itself becomes abnormal, we have most likely detected an anomaly.
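
A combined sketch of the three techniques in Python; the sample values are only for illustration:

from statistics import mean, stdev


def bollinger_bands(values, k=2):
    # Low and high bands at k standard deviations below and above the mean.
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s


def percent_change(previous, current):
    # Sample-to-sample, day-to-day or week-to-week percentage change.
    return (current - previous) / previous * 100.0


def velocity(old_value, new_value, interval_seconds):
    # How fast an indicator is changing, e.g. (20 - 10) / 5 = 2 units/sec.
    return (new_value - old_value) / interval_seconds


print(velocity(10, 20, 5))                        # 2.0 units/sec
print(percent_change(10, 20))                     # 100.0 percent
print(bollinger_bands([10, 11, 9, 10, 12, 20]))   # (low band, high band)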

Now That You Know ... Act On It

After the analysis, the next step is to take action. This could mean alerts, notifications or system actions such as restarting processes or even resubmitting orders. Here, we are connecting the dots between IT and the business and alerting the appropriate owners.
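
As a rough sketch of that wiring, assuming Python, detected anomalies could be mapped to handlers that notify the right owner or trigger a system action; the anomaly names and handlers here are hypothetical:

import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical handlers: real ones might page an owner, restart a process
# through the orchestrator, or resubmit failed orders.
ACTIONS = {
    "gc_above_band": lambda app: logging.warning("IT alert: %s shows abnormal GC activity", app),
    "orders_missed": lambda app: logging.warning("Business alert: %s is missing orders", app),
}


def act_on(anomaly, app):
    handler = ACTIONS.get(anomaly)
    if handler:
        handler(app)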

And In Conclusion

Elastic cloud-based applications can't be monitored effectively using static models, as these models assume constancy. And the one thing constant about these applications is their volatility. In these environments, what was abnormal yesterday may well be normal today. As a result, what static models indicate may be wrong.

However, a methodology that gathers both business and IT metrics, creates automated baselines and applies analytics to them in real time can produce effective results and predict behavior.

Albert Mavashev is Chief Technology Officer at Nastel Technologies.
