Good application performance monitoring in the cloud involves repeatedly testing a few key areas that behave differently in most cloud environments than they do in traditional environments. Recording the results over time lets you identify normal usage patterns and trends, and establish baseline behavior for your provider's resources.
Valuable resources to monitor in the cloud include:
1. Network Latency
If your application depends on access to a network resource, such as DNS for reverse lookups that map IP addresses to domain names, then the application should regularly test this resource and your monitoring system should record the results in an easily visualized format. The access time to the hosted application should also be checked and tracked from both cloud and non-cloud locations. This allows differential latency comparisons that help reduce uncertainty about the root cause of slow response times. For instance, if the application is fast from within the cloud and slow from outside it, is there a network issue on the cloud provider's Internet-facing systems?
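The differential comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a finished probe: the helper names (`time_call`, `reverse_lookup_latency`, `compare_vantage_points`) and the `ratio_threshold` cutoff are assumptions made for this example, and the only real library call is `socket.gethostbyaddr` from the standard library.

```python
import socket
import time
from statistics import median


def time_call(fn, *args):
    """Run fn(*args) once and return (elapsed_seconds, result, error)."""
    start = time.perf_counter()
    try:
        result, error = fn(*args), None
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        result, error = None, exc
    return time.perf_counter() - start, result, error


def reverse_lookup_latency(ip_address):
    """Time one reverse DNS lookup through the system resolver."""
    return time_call(socket.gethostbyaddr, ip_address)


def compare_vantage_points(inside_seconds, outside_seconds, ratio_threshold=2.0):
    """Compare median response times measured inside and outside the cloud.

    ratio_threshold is an illustrative cutoff, not a recommended value.
    """
    inside = median(inside_seconds)
    outside = median(outside_seconds)
    ratio = outside / inside if inside > 0 else float("inf")
    return {
        "inside_median": inside,
        "outside_median": outside,
        "ratio": ratio,
        # True when outside access is much slower than inside access,
        # which hints at the path between the Internet and the provider.
        "suspect_external_path": ratio >= ratio_threshold,
    }
```

In practice a scheduler would call `reverse_lookup_latency` at a fixed interval from each vantage point, store the elapsed times, and pass the two sample lists to `compare_vantage_points`. Medians are used here so that a single slow sample does not trigger a false alarm.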
2. Cloud API Feature Availability
If your application is dynamic and needs features of the cloud vendor's API to function, you should script and test those functions to ensure they are available and that they perform fast enough to meet your needs. Operations like launching an instance, taking a volume snapshot, or adding a new volume to a running instance are good candidates for periodic testing.
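One way to script such checks is to wrap each API operation in a callable and record both whether it succeeded and whether it finished within a time budget. The sketch below assumes nothing about any particular vendor: `CheckResult`, `run_api_check`, and `run_all` are hypothetical names, and each `operation` stands in for whatever SDK call you would really make.

```python
import time
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    available: bool
    elapsed_seconds: float
    within_budget: bool
    detail: str = ""


def run_api_check(name, operation, budget_seconds):
    """Call one scripted API operation and record availability and speed.

    `operation` is any zero-argument callable wrapping a real vendor SDK
    call; it is a placeholder here, not part of any vendor's API.
    """
    start = time.perf_counter()
    try:
        operation()
        available, detail = True, ""
    except Exception as exc:  # any failure counts as "feature unavailable"
        available, detail = False, f"{type(exc).__name__}: {exc}"
    elapsed = time.perf_counter() - start
    within_budget = available and elapsed <= budget_seconds
    return CheckResult(name, available, elapsed, within_budget, detail)


def run_all(checks):
    """checks: iterable of (name, operation, budget_seconds) tuples."""
    return [run_api_check(name, op, budget) for name, op, budget in checks]
```

Keeping availability and speed as separate fields matters for diagnosis: an operation that fails outright and one that succeeds too slowly usually call for different responses.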
3. Virtualization Overhead
Differential monitoring of instances in the cloud versus instances on actual hardware can help you determine overall virtualization overhead for your application. Knowing the relative performance will help you size the instances you launch, and let you calculate the cost of operation on cloud infrastructure versus in-house. This makes cost-benefit analysis and cost-based justification for using cloud systems possible.
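The arithmetic behind this comparison is simple enough to sketch. The functions below assume you have timed the same benchmark job on a cloud instance and on physical hardware, and that you can express each platform's cost as an hourly rate; the function names and the hourly-rate model are assumptions for illustration, not a complete cost model.

```python
def virtualization_overhead(hardware_seconds, cloud_seconds):
    """Fractional slowdown of the cloud run relative to the hardware run."""
    if hardware_seconds <= 0:
        raise ValueError("hardware_seconds must be positive")
    return (cloud_seconds - hardware_seconds) / hardware_seconds


def cost_per_job(hourly_rate, seconds_per_job):
    """Cost of one benchmark job at a given hourly price."""
    return hourly_rate * seconds_per_job / 3600.0


def compare_costs(hardware_seconds, cloud_seconds, hardware_hourly, cloud_hourly):
    """Summarize overhead and per-job cost for both platforms."""
    hardware_cost = cost_per_job(hardware_hourly, hardware_seconds)
    cloud_cost = cost_per_job(cloud_hourly, cloud_seconds)
    return {
        "overhead": virtualization_overhead(hardware_seconds, cloud_seconds),
        "hardware_cost_per_job": hardware_cost,
        "cloud_cost_per_job": cloud_cost,
        "cloud_is_cheaper": cloud_cost < hardware_cost,
    }
```

For example, a job that takes 10.0 seconds on hardware and 12.5 seconds in the cloud has an overhead of 0.25, or 25 percent. Whether the cloud is still cheaper per job then depends on the two hourly rates, which is exactly the comparison `compare_costs` makes.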
4. Configuration Tracking
Many of the failures experienced by computing infrastructures are the result of improperly managed configuration changes. Knowing when a configuration last changed is therefore a critical piece of information in root cause analysis. At a minimum, the monitoring system should have a record of boot time (often associated with updates or other configuration changes), and ideally it will also have some indication of the nature of the change.
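A minimal sketch of both pieces follows: recording boot time, and detecting which configuration files changed between two checks. It makes two assumptions worth stating plainly. Reading `/proc/uptime` only works on Linux, and comparing SHA-256 digests of files is just one possible way to get "some indication of the nature of the change". The function names are hypothetical.

```python
import hashlib
import time


def boot_time_epoch(uptime_path="/proc/uptime"):
    """Estimate boot time as a Unix timestamp (Linux only; None elsewhere)."""
    try:
        with open(uptime_path) as fh:
            uptime_seconds = float(fh.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None
    return time.time() - uptime_seconds


def fingerprint(paths):
    """Map each path to the SHA-256 hex digest of its contents.

    Unreadable or missing files map to None so that a file's
    disappearance also shows up as a change.
    """
    digests = {}
    for path in paths:
        try:
            with open(path, "rb") as fh:
                digests[path] = hashlib.sha256(fh.read()).hexdigest()
        except OSError:
            digests[path] = None
    return digests


def changed_paths(previous, current):
    """Return sorted paths whose digest differs between two fingerprints."""
    all_paths = set(previous) | set(current)
    return sorted(p for p in all_paths if previous.get(p) != current.get(p))
```

A monitoring agent could store each fingerprint with a timestamp, then report the output of `changed_paths` alongside the boot time whenever either value changes. That gives root cause analysis both the "when" and a first hint at the "what".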
While moving to the cloud can be cost-effective in the abstract, as with any technology project it’s important to validate the assumptions you make when determining what to move, and to verify what the cost savings actually turn out to be.
About Roger Ruttiman
Roger Ruttiman, VP of Engineering & Quality at GroundWork, has 18 years of software development and leadership experience. Ruttiman is the lead architect responsible for product architecture, building and managing local and offshore teams. Before joining GroundWork, Ruttiman was a lead engineer at Advent Software in San Francisco, and at Autodesk in the US and Europe.