4 Key Resources to Monitor in the Cloud
October 16, 2011
Roger Ruttiman

Good application performance monitoring in the cloud involves repeatedly testing a few key areas that behave differently in most cloud environments than they do in traditional environments. Tracking the resulting values over time reveals normal usage patterns and trends, and establishes a baseline of normal behavior for your provider's resources.

Valuable resources to monitor in the cloud include:

1. Network Latency

If your application depends on access to a network resource, such as DNS for reverse lookups, then the application should regularly test this resource, and your monitoring system should record the results in an easily visualized format. Access time to the hosted application should also be checked and tracked from both cloud and non-cloud locations. This allows differential latency comparisons that help reduce uncertainty about the root cause of slow response times. For instance, if the application is fast from within the cloud and slow from outside it, there may be a network issue on the cloud provider's Internet-facing systems.
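The differential comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring product: the helper names (`time_probe`, `reverse_dns_probe`, `differential_latency`) are hypothetical, and it assumes you collect one list of timings from a host inside the cloud and another from a host outside it.

```python
import socket
import statistics
import time
from typing import Callable, List


def time_probe(probe: Callable[[], object], samples: int = 5) -> List[float]:
    """Run the probe several times and return each elapsed time in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        timings.append(time.perf_counter() - start)
    return timings


def reverse_dns_probe(ip_address: str) -> Callable[[], object]:
    """Build a probe that performs a reverse DNS lookup for ip_address."""
    return lambda: socket.gethostbyaddr(ip_address)


def differential_latency(inside: List[float], outside: List[float]) -> float:
    """Median latency seen from outside the cloud minus the median seen inside."""
    return statistics.median(outside) - statistics.median(inside)
```

A large positive difference suggests the delay sits between the outside world and the provider rather than in the application itself. Medians are used here so that a single slow sample does not dominate the comparison; that choice is an assumption of this sketch.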

2. Cloud API Feature Availability

If your application is dynamic and needs features of the cloud vendor's API to function, you should script and test those functions to ensure they are available and that they perform fast enough to meet your needs. Functions like launching an instance, taking a volume snapshot, or adding a new volume to a running instance are good things to test periodically.
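One way to script such a test is to wrap each API operation in a timed check. The sketch below is provider-neutral and makes no assumption about any vendor's SDK: `check_api_call` and `ApiCheckResult` are hypothetical names, and the `call` argument stands in for whatever function in your own code launches an instance or takes a snapshot.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ApiCheckResult:
    name: str
    available: bool      # the call completed without raising
    seconds: float       # how long the call took
    fast_enough: bool    # completed within the allowed time
    error: Optional[str] = None


def check_api_call(name: str, call: Callable[[], object],
                   max_seconds: float) -> ApiCheckResult:
    """Time one API operation and report availability and speed."""
    start = time.perf_counter()
    try:
        call()
    except Exception as exc:  # any failure counts as "not available"
        elapsed = time.perf_counter() - start
        return ApiCheckResult(name, False, elapsed, False, repr(exc))
    elapsed = time.perf_counter() - start
    return ApiCheckResult(name, True, elapsed, elapsed <= max_seconds)
```

Running such checks on a schedule and recording each `ApiCheckResult` gives the monitoring system a history of both availability and response time for every API function the application relies on.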

3. Virtualization Overhead

Differential monitoring of instances in the cloud versus instances on actual hardware can help you determine overall virtualization overhead for your application. Knowing the relative performance will help you size the instances you launch, and let you calculate the cost of operation on cloud infrastructure versus in-house. This makes cost-benefit analysis and cost-based justification for using cloud systems possible.
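The arithmetic behind this comparison is simple enough to sketch. The functions below are illustrative only: the names are hypothetical, and the sketch assumes you have timed the same workload on a cloud instance and on physical hardware and know an hourly cost for each.

```python
def virtualization_overhead(hardware_seconds: float, cloud_seconds: float) -> float:
    """Fractional slowdown of the cloud run relative to hardware.

    A result of 0.25 means the cloud run took 25% longer.
    """
    if hardware_seconds <= 0:
        raise ValueError("hardware_seconds must be positive")
    return cloud_seconds / hardware_seconds - 1.0


def cost_per_job(hourly_cost: float, job_seconds: float) -> float:
    """Cost of one run of the workload at the given hourly rate."""
    return hourly_cost * job_seconds / 3600.0
```

Comparing `cost_per_job` for the cloud instance against the same figure for in-house hardware, using measured rather than assumed run times, is the kind of input a cost-benefit analysis needs.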

4. Configuration Tracking

Many of the failures experienced by computing infrastructures result from improperly managed configuration changes. Knowing when a configuration was last changed is therefore a critical piece of information in root cause analysis. At a minimum, the monitoring system should keep a record of boot time (often associated with updates or other configuration changes), and ideally it will also have some indication of the nature of the change.
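Beyond boot time, one simple way to record when a configuration last changed is to fingerprint its contents and store a timestamp whenever the fingerprint differs. The sketch below assumes an in-memory `history` dictionary and hypothetical function names; a real system would persist the history and might also store a diff to capture the nature of the change.

```python
import hashlib
import time
from typing import Dict, Optional


def fingerprint(config_bytes: bytes) -> str:
    """Return a SHA-256 digest identifying this exact configuration content."""
    return hashlib.sha256(config_bytes).hexdigest()


def record_change(history: Dict[str, dict], name: str, config_bytes: bytes,
                  now: Optional[float] = None) -> bool:
    """Update history for one config; return True if it is new or changed.

    The first time a config is seen counts as a change, so that every
    tracked config has a 'changed_at' timestamp.
    """
    digest = fingerprint(config_bytes)
    previous = history.get(name)
    if previous is not None and previous["digest"] == digest:
        return False  # unchanged: keep the earlier timestamp
    history[name] = {
        "digest": digest,
        "changed_at": time.time() if now is None else now,
    }
    return True
```

During root cause analysis, the `changed_at` value for each tracked configuration can then be lined up against the time a problem first appeared.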

While moving to the cloud can be cost-effective in the abstract, as with any technology project it's important to validate the assumptions you make when determining what to move, and what the cost savings actually turn out to be.

About Roger Ruttiman

Roger Ruttiman, VP of Engineering & Quality at GroundWork, has 18 years of software development and leadership experience. Ruttiman is the lead architect responsible for product architecture, building and managing local and offshore teams. Before joining GroundWork, Ruttiman was a lead engineer at Advent Software in San Francisco, and at Autodesk in the US and Europe.
