4 Best Practices for APM in the Cloud
January 30, 2014
Josh Stephens

Moving your applications to the cloud has undeniable benefits. The cloud offers dynamic environments where you can spin up instances quickly, only consume what you need, and eliminate the costs of renting or purchasing expensive servers.

But the cloud also means developers are building software on a platform they don't "own", where things can change in an instant. There is no built-in, real-time insight into the performance and health of the infrastructure, to say nothing of the applications that run on it.

When things go wrong, it's hard to know where the bottlenecks are. Is the application consuming too much CPU? Is an unresponsive API causing it to time out? Is network latency degrading application performance? How do you know what you don't know?

These issues eat into your bottom line and negatively impact customer satisfaction. And while you can't anticipate every potential problem, you can be prepared to avoid many of them.

Here are four best practices to optimize your application performance monitoring:

1. Collaboration

Invite other team members into your APM tool. Role-based access controls let you grant edit privileges to admins, or restrict read-only users to systems with a certain tag. This gives each user access to the features and data they need to do their job, without distracting them with superfluous information.

Some products offer deep URLs to facilitate information sharing across teams, and also allow you to annotate important events that correlate to subsequent performance changes.
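As a rough sketch, here is what inviting a teammate with tag-scoped, read-only access might look like against a hypothetical APM REST API. The endpoint, payload fields, and token below are illustrative assumptions, not any specific vendor's schema; consult your tool's API documentation for the real one.

```python
# Hypothetical example: grant a teammate read-only access scoped to a tag.
# The base URL, token, and field names are placeholders.
import requests

APM_API = "https://apm.example.com/api/v1"            # placeholder base URL
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder token

role = {
    "email": "dev@example.com",
    "role": "read-only",           # admins would get "edit" instead
    "visible_tags": ["frontend"],  # user sees only systems tagged "frontend"
}

resp = requests.post(f"{APM_API}/users", json=role, headers=HEADERS)
resp.raise_for_status()
print("Invited:", role["email"])
```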

2. Tagging

Alerts defined by tags allow you to customize how, when and why you're being notified about the performance of your applications. For instance, if you have a set of front-end application servers behind a load balancer, you may want to tag them "frontend" and create a unique alert for each set of performance metrics that you want to monitor.

When used correctly, include and exclude tags, process alerts, and threshold alerts help users react quickly and efficiently to performance issues.
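As a loose illustration, a tag-scoped threshold alert for the "frontend" servers above might be defined like this against the same hypothetical API. The metric name, field names, and endpoint are assumptions for the sketch:

```python
# Hypothetical example: a CPU threshold alert scoped to hosts tagged
# "frontend", excluding experimental hosts tagged "canary".
import requests

APM_API = "https://apm.example.com/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

alert = {
    "name": "Frontend high CPU",
    "metric": "system.cpu.utilization",
    "threshold": 90,             # percent
    "duration_minutes": 5,       # sustained breach required before notifying
    "include_tags": ["frontend"],
    "exclude_tags": ["canary"],  # don't page on experimental hosts
    "notify": ["ops-team@example.com"],
}

requests.post(f"{APM_API}/alerts", json=alert, headers=HEADERS).raise_for_status()
```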

3. Automation

The self-service, automated provisioning of IT resources has come of age and is rapidly becoming ubiquitous. Chef and Puppet are two popular orchestration engines that allow you to spin servers up and down in response to evolving business needs. Your APM solution should integrate tightly with existing automation tools, adjusting automatically as the operating environment changes, and providing real-time visibility into all your Chef- and Puppet-deployed applications and services.
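One minimal way to wire this up, sketched below under the assumption that your APM tool exposes host registration endpoints (the URLs and function names here are hypothetical), is a small hook that Chef or Puppet invokes after provisioning or teardown, so monitoring always tracks the current environment:

```python
# Hypothetical post-provision hook: when orchestration (e.g. a Chef handler
# or Puppet report processor) brings a node up or tears it down, register or
# deregister it with the APM tool. Endpoints below are placeholders.
import requests

APM_API = "https://apm.example.com/api/v1"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

def register_node(hostname, tags):
    """Tell the APM tool to start monitoring a freshly provisioned node."""
    payload = {"hostname": hostname, "tags": tags}
    requests.post(f"{APM_API}/hosts", json=payload, headers=HEADERS).raise_for_status()

def deregister_node(hostname):
    """Stop monitoring a node that orchestration has torn down."""
    requests.delete(f"{APM_API}/hosts/{hostname}", headers=HEADERS).raise_for_status()

if __name__ == "__main__":
    register_node("web-03.example.com", ["frontend", "chef-managed"])
```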

4. Custom metrics

No two businesses have the same goals and objectives. The same can be said for application performance monitoring metrics. Custom metrics provide enhanced visibility into specific areas of an application where you want to collect, view or analyze additional information – such as page load time, web transaction response time or database query execution time. Setting these up should be as easy as modifying a simple script and creating a custom dashboard to display your data.
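For instance, a "simple script" for the database case might look like the following sketch: time a query, then submit the duration to the APM tool. The submission endpoint and metric name are assumptions; most APM agents expose a similar one-call submission API, but the real call varies by product.

```python
# Hypothetical custom-metric script: time a database query and report its
# duration to the APM tool. Uses an in-memory SQLite table for the demo.
import time
import sqlite3
import requests

APM_API = "https://apm.example.com/api/v1"            # placeholder
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder

def timed_query(conn, sql):
    """Run a query and return its execution time in milliseconds."""
    start = time.perf_counter()
    conn.execute(sql).fetchall()
    return (time.perf_counter() - start) * 1000.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
elapsed_ms = timed_query(conn, "SELECT COUNT(*) FROM orders")

# Submit as a custom metric, then chart it on a custom dashboard.
metric = {"name": "db.query.execution_time_ms", "value": elapsed_ms}
requests.post(f"{APM_API}/metrics", json=metric, headers=HEADERS).raise_for_status()
```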

About Josh Stephens

As Vice President of Product Strategy at Idera, Josh Stephens brings nearly 20 years of experience in the technology industry. Prior to Idera, he founded a consulting and technology company focused on helping companies adapt their product and go-to-market strategies to take advantage of the high-velocity, inside-sales model built around inbound marketing and social media. Previously, he was VP of Technology at SolarWinds, where he spent more than a dozen years helping to define and innovate their product and go-to-market strategies. Earlier in his career, Stephens spent time at Greenwich Technology Partners, International Network Services, and the United States Air Force.
