Good application performance monitoring in the cloud involves repeatedly testing a few key areas that behave differently in most cloud environments than they do in traditional infrastructure. Tracking the resulting values over time reveals usage patterns and trends, and establishes a baseline of normal behavior for your provider's resources.
Valuable resources to monitor in the cloud include:
1. Network Latency
If your application depends on access to a network resource, such as reverse DNS lookups, the application should regularly test that resource and your monitoring system should record the results in an easily visualized format. The access time to the hosted application should also be checked and tracked from both cloud and non-cloud locations. This allows differential latency comparisons that help narrow down the root cause of slow response times. For instance, if the application is fast from within the cloud but slow from outside it, the cloud provider's Internet-facing network is a likely suspect.
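A minimal sketch of the kind of probe described above, timing DNS resolution with only the Python standard library (the hostname and attempt count are illustrative; a real monitoring system would feed the returned value into its time-series store):

```python
import socket
import time

def measure_dns_latency(hostname: str, attempts: int = 3) -> float:
    """Return the average time, in seconds, to resolve a hostname.

    Averaging several attempts smooths out one-off spikes; the
    resulting number is what a monitoring system would record and
    graph over time to establish a baseline.
    """
    total = 0.0
    for _ in range(attempts):
        start = time.monotonic()
        socket.gethostbyname(hostname)  # raises socket.gaierror on failure
        total += time.monotonic() - start
    return total / attempts
```

Running the same probe from inside and outside the cloud, and graphing both series, gives the differential comparison the article recommends.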
2. Cloud API Feature Availability
If your application is dynamic and needs features of the cloud vendor's API to function, you should script and test those functions to ensure they are available and that they perform quickly enough to meet your needs. Functions like launching an instance, taking a volume snapshot, or attaching a new volume to a running instance are good candidates for periodic testing.
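One way to structure such periodic checks is a small harness that times each API operation against an availability and latency threshold. The sketch below is a generic pattern, not any vendor's SDK; `fake_snapshot` is a hypothetical stand-in you would replace with a real call (e.g., your provider's snapshot-creation function):

```python
import time
from typing import Callable

def check_api_function(name: str, fn: Callable[[], object],
                       threshold_s: float) -> dict:
    """Run one cloud-API operation, time it, and report the result.

    `available` records whether the call succeeded at all;
    `within_sla` records whether it also finished fast enough.
    """
    start = time.monotonic()
    try:
        fn()
        ok = True
    except Exception:
        ok = False
    elapsed = time.monotonic() - start
    return {
        "check": name,
        "available": ok,
        "seconds": round(elapsed, 3),
        "within_sla": ok and elapsed <= threshold_s,
    }

# Hypothetical probe -- substitute a real SDK call such as
# launching an instance or snapshotting a volume.
def fake_snapshot():
    time.sleep(0.01)

result = check_api_function("volume-snapshot", fake_snapshot, threshold_s=5.0)
```

The returned dictionary can be emitted to your monitoring system on each run, so both availability and API latency accumulate as trend data.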
3. Virtualization Overhead
Differential monitoring of instances in the cloud versus instances on actual hardware can help you determine overall virtualization overhead for your application. Knowing the relative performance will help you size the instances you launch, and let you calculate the cost of operation on cloud infrastructure versus in-house. This makes cost-benefit analysis and cost-based justification for using cloud systems possible.
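The differential measurement described above can be as simple as running an identical, fixed workload in both environments and comparing wall-clock times. This is a rough sketch (the workload and iteration count are arbitrary placeholders; a real comparison would use a benchmark representative of your application):

```python
import time

def cpu_benchmark(iterations: int = 200_000) -> float:
    """Time a fixed CPU-bound workload.

    Run this unchanged on a cloud instance and on physical hardware,
    then compare the two elapsed times.
    """
    start = time.monotonic()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.monotonic() - start

def overhead_percent(cloud_s: float, hardware_s: float) -> float:
    """Virtualization overhead as a percentage relative to hardware."""
    return (cloud_s - hardware_s) / hardware_s * 100.0
```

For example, if the workload takes 1.2 s in the cloud and 1.0 s on hardware, the overhead is 20%, a number you can plug directly into the cost-benefit analysis the article mentions.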
4. Configuration Tracking
Many of the failures experienced by computing infrastructures are the result of improperly managed configuration changes. Knowing when a configuration was last changed is therefore a critical piece of information in root cause analysis. At a minimum, the monitoring system should have a record of boot time (often associated with updates or other configuration changes), and ideally it will also have some indication of the nature of the change.
While moving to the cloud can be cost-effective in the abstract, as with any technology project it's important to validate the assumptions you make when determining what to move, and what the cost savings actually turn out to be.
About Roger Ruttiman
Roger Ruttiman, VP of Engineering & Quality at GroundWork, has 18 years of software development and leadership experience. Ruttiman is the lead architect responsible for product architecture, building and managing local and offshore teams. Before joining GroundWork, Ruttiman was a lead engineer at Advent Software in San Francisco, and at Autodesk in the US and Europe.