10 APM Capabilities Every IT Manager Should Have
April 19, 2012
Irad Deutsch

One of the common questions that every IT manager asks on a regular basis is, “Why is my application so slow today when everything was fine yesterday?” Application Performance Management (APM) is the only way to truly answer that question, and it is one of the must-have tools for every IT manager.

With this APM imperative in mind, the following are 10 capabilities every IT manager should look for when choosing an APM solution:

1. Real-time monitoring

Real-time monitoring is a must. When digging into a problem, tracking events as they occur is far more effective than “post-mortem” analysis. Many APM vendors claim to provide real-time monitoring, but sometimes they really mean “near real-time”, typically with delays of 30 seconds to five minutes. This restricts your ability to analyze and react to events as they happen. Make sure real-time is truly real-time. Real-time monitoring should answer questions such as: Who is doing what, how many resources are being consumed, and who is affecting whom right now?

2. Rich data repository

Sometimes you get lucky and witness a problem in real-time. In most cases, however, you don’t. This is why a good APM solution must be able to collect all transaction activity and performance metrics into a rich but lightweight repository.

3. “Single anomaly” granularity

Some APM vendors store the statistics they gather but aggregate them, either to save disk space or because they cannot process that much data in a reasonable amount of time. Analyzing performance incidents based on aggregated data is like assessing a book by reading only its back cover: you get the general idea, but you cannot understand what really happened. That’s why a good APM solution must give you all of the granular information, including individual transactions and their characteristics, resource consumption, traffic order (chain of events), etc.
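To make the point concrete, here is a minimal Python sketch. The response times and the 2-second "slow" threshold are invented for illustration and are not taken from any real APM tool. It shows how a one-minute average can look healthy while a single slow transaction hides inside it.

```python
from statistics import mean

# Hypothetical response times (seconds) for ten transactions in one minute.
# Nine are fast; one is a single slow outlier.
response_times = [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 8.2]

SLOW_THRESHOLD = 2.0  # assumed cutoff for a "slow" transaction

# Aggregated view: one number stands in for the whole minute.
minute_average = mean(response_times)

# Granular view: every transaction is still available for inspection.
slow_transactions = [
    (index, seconds)
    for index, seconds in enumerate(response_times)
    if seconds > SLOW_THRESHOLD
]

print(f"average: {minute_average:.2f}s")
print(f"slow transactions: {slow_transactions}")
```

The average stays well under the threshold, so a repository that keeps only the aggregate would never reveal the one transaction that took more than eight seconds.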

4. Measuring Quality of Service (QoS) and Service Level Agreements (SLAs)

APM solutions are designed to improve the end user experience. Improving user experience starts by measuring it and identifying QoS and SLA anomalies. Only then can you make informed decisions and take action. You should also have the ability to compare user experience before and after a change is applied to your systems.
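As a rough illustration of measuring before and after a change, the sketch below computes the share of transactions that met a response-time target. The 1-second target, the sample data and the function name `sla_compliance` are all assumptions made for this example.

```python
def sla_compliance(response_times, target_seconds):
    """Return the fraction of transactions that finished within the target."""
    if not response_times:
        raise ValueError("no transactions to measure")
    within_target = sum(1 for t in response_times if t <= target_seconds)
    return within_target / len(response_times)


SLA_TARGET_SECONDS = 1.0  # assumed SLA: each transaction completes within 1 second

# Hypothetical response times (seconds) sampled before and after a system change.
before_change = [0.4, 0.6, 1.5, 0.8, 2.1, 0.7, 0.9, 1.2]
after_change = [0.4, 0.5, 0.9, 0.7, 1.1, 0.6, 0.8, 0.9]

before = sla_compliance(before_change, SLA_TARGET_SECONDS)
after = sla_compliance(after_change, SLA_TARGET_SECONDS)

print(f"before: {before:.1%}, after: {after:.1%}")
```

A real tool would track many more dimensions (per user, per transaction type, per time window), but the comparison itself comes down to the same kind of before-and-after ratio.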

5. Performance proactivity – enforcing QoS and SLA

Some APM solutions enable users to analyze performance data and identify root problems retroactively, but do nothing to enable real-time resolution of performance issues. Because these solutions are fundamentally passive, you have no choice but to wait for application performance to nosedive before corrective action can be taken. In these cases, the wait from issue identification to resolution can be hours or even days. QoS problems can be avoided only if you take proactive steps. A proactive APM solution can turn this: “I got a text message at 2:00 AM from our APM tool that indicated that we had a QoS problem so I logged into the system and solved it,” into: “I got a text message at 8:00 AM from our APM tool letting me know that at 1:50 AM a QoS problem was about to occur and it took care of it automatically.” Being proactive can be achieved in many ways: by activating automatic scripts, managing system resources, triggering third-party tools, etc.
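One simple way to picture “a QoS problem was about to occur” is a trend projection. The sketch below is only an illustration under stated assumptions: it draws a straight line through the first and last samples, and the CPU limit, the five-minute window and the action name `throttle_low_priority_jobs` are hypothetical. Real products may use very different prediction methods.

```python
def seconds_until_breach(samples, threshold):
    """Estimate seconds until `threshold` is crossed.

    `samples` is a list of (timestamp_seconds, value) pairs in time order.
    Uses a straight line through the first and last samples.
    Returns 0.0 if the latest value is already at or over the threshold,
    and None if there is too little data or the metric is not rising.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if v1 >= threshold:
        return 0.0
    if t1 <= t0 or v1 <= v0:
        return None
    slope = (v1 - v0) / (t1 - t0)
    return (threshold - v1) / slope


# Hypothetical CPU utilization samples: (seconds since start, percent busy).
cpu_samples = [(0, 60.0), (60, 70.0), (120, 80.0)]
CPU_LIMIT = 90.0          # assumed QoS limit
ACT_WITHIN_SECONDS = 300  # assumed: act if a breach is under five minutes away

triggered_actions = []
eta = seconds_until_breach(cpu_samples, CPU_LIMIT)
if eta is not None and eta <= ACT_WITHIN_SECONDS:
    # Placeholder for a real corrective step (script, resource cap, third-party tool).
    triggered_actions.append("throttle_low_priority_jobs")

print(eta, triggered_actions)
```

The point of the sketch is the ordering: the corrective step fires while the metric is still below the limit, instead of after users have already felt the slowdown.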

6. Detecting bottlenecks and root cause analysis

If an APM tool only notifies you that you ran out of system resources because of job X, then you don’t really have root cause analysis capabilities. Root cause analysis is when your APM tool tells you that this job usually runs at 8:00 PM, but because of a problem on a secondary system it started one hour late and collided with another job that was scheduled to run at that time. APM tools must do the hard work of correlating many little pieces of data so that you can get to the source of the problem. Otherwise you will find yourself trying to assemble a 1,000-piece puzzle while your CEO knocks on your door every five minutes looking for answers.
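The late-job scenario above can be sketched as a small correlation check. Everything here is hypothetical: the `JobRun` record, the job names, the dates and the five-minute lateness tolerance are invented to show the idea of linking “started late” with “overlapped another job”.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class JobRun:
    name: str
    scheduled_start: datetime
    actual_start: datetime
    actual_end: datetime


def find_collisions(runs, max_delay=timedelta(minutes=5)):
    """Return (late_job, other_job) name pairs where a late job overlapped another job."""
    collisions = []
    for late in runs:
        if late.actual_start - late.scheduled_start <= max_delay:
            continue  # this job started on time, so it is not the suspect
        for other in runs:
            if other is late:
                continue
            # Two time intervals overlap when each one starts before the other ends.
            if late.actual_start < other.actual_end and other.actual_start < late.actual_end:
                collisions.append((late.name, other.name))
    return collisions


# Hypothetical run history for one evening.
runs = [
    JobRun("job_x", datetime(2012, 4, 18, 20, 0), datetime(2012, 4, 18, 21, 0), datetime(2012, 4, 18, 22, 0)),
    JobRun("job_y", datetime(2012, 4, 18, 21, 0), datetime(2012, 4, 18, 21, 0), datetime(2012, 4, 18, 21, 45)),
]

collisions = find_collisions(runs)
print(collisions)
```

A report that says “job_x started an hour late and overlapped job_y” points at the cause; a report that says only “job_x used too many resources” points at the symptom.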

7. Chain reaction analysis

Analyzing a problem can take many shapes. The conventional way is by digging into the top-10 hit lists. But those top-10 lists always miss something - the chain of events. Who came first, who came after, “it was all fine until this transaction came in”, etc. Analyzing the chain of events before the system crashed is crucial if you wish to avoid this problem in the future. An APM tool should give you the ability to travel back in time and look into the granular metrics second by second as if you were watching a movie in slow motion. This is possible only if the APM tool collects data at a very high level of granularity and does not lose it over time (i.e. it retains the raw collected metrics).

8. Performance comparisons

There are two main performance troubleshooting approaches that an APM tool should support: performance drill-downs into a specific period of time, and performance comparisons. If you have a performance problem now, but all was fine yesterday, you must assume that something has changed. Hunting for those changes will lead you to the root cause much more quickly than a conventional drill-down into the current problem's performance metrics. You should be able to answer questions like these in seconds: “Is this new storage system I just implemented faster than the old one we had?” and “Why is it working very well in QA but not in production?” If your APM tool collects and stores raw performance metrics, comparing those metrics lets you answer such questions easily and dramatically shorten your mean time to recovery.
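A comparison of this kind can be as simple as ranking metrics by how much they moved between a good period and a bad one. The sketch below assumes two hypothetical snapshots of per-metric averages. The metric names and values are made up, and `rank_changes` is an illustrative helper, not part of any APM product.

```python
def rank_changes(baseline, current):
    """Return (metric, relative_change) pairs, largest absolute change first.

    Metrics missing from `current` or with a zero baseline are skipped.
    """
    changes = []
    for metric, old in baseline.items():
        new = current.get(metric)
        if new is None or old == 0:
            continue
        changes.append((metric, (new - old) / old))
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical averages for the same hour yesterday (fine) and today (slow).
yesterday = {"cpu_percent": 40.0, "disk_read_ms": 5.0, "active_sessions": 200.0}
today = {"cpu_percent": 44.0, "disk_read_ms": 20.0, "active_sessions": 210.0}

ranked = rank_changes(yesterday, today)
for metric, change in ranked:
    print(f"{metric}: {change:+.0%}")
```

In this made-up data, disk read latency quadrupled while CPU and session counts barely moved, so the comparison points at storage before any drill-down has started.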

9. Business Intelligence-like dashboard

When an APM tool stores millions of pieces of raw (and aggregated) data, it should also deliver a convenient way to slice and dice this data. Some APM tools decide for you how this data should be processed by providing a pre-defined set of graph and report templates. A good APM tool lets you decide how you want to slice and dice the data by giving you a flexible, easy-to-use BI-like dashboard where you can drag and drop dimensions and drill down by double-clicking in order to answer questions like, “Which user consumed most of my CPU, and which program that he/she has been using caused the most impact?”

10. Chargeback capability

Bad performance usually starts with bad design or bad coding and very rarely stems from hardware faults. If a developer writes a poor piece of code, the IT division needs to spend more money on hardware or software licenses to deal with it. This is why it’s becoming popular in many organizations to turn this dynamic upside down: annual budgets are distributed among the application development divisions, which use this money to buy IT services from the IT division. If they write poor code, they ultimately need to pay more. This is workable only if the IT department has an APM tool that can measure and enforce resource usage by ‘tenant’. This approach has proven to be effective in helping companies reduce their IT budget quite significantly.
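The measuring half of chargeback amounts to totaling resource usage per tenant and pricing it. The sketch below is a simplified illustration: it bills on CPU seconds only, and the tenant names, usage figures and rate are invented. A real chargeback model would likely cover more resources (memory, I/O, licenses).

```python
from collections import defaultdict


def chargeback(usage_records, rate_per_cpu_second):
    """Sum CPU seconds per tenant and convert the totals to a charge."""
    totals = defaultdict(float)
    for tenant, cpu_seconds in usage_records:
        totals[tenant] += cpu_seconds
    return {tenant: seconds * rate_per_cpu_second for tenant, seconds in totals.items()}


# Hypothetical usage records: (tenant, CPU seconds consumed).
usage = [
    ("billing_app", 1200.0),
    ("crm_app", 300.0),
    ("billing_app", 800.0),
]

bill = chargeback(usage, rate_per_cpu_second=0.5)  # assumed internal rate
print(bill)
```

With per-tenant totals like these, the development team whose application consumes the most resources sees that directly in what it pays.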

ABOUT Irad Deutsch

Irad Deutsch is a CTO at Veracity group, an international software infrastructure integrator. Irad is also the CTO of MORE IT Resources - MoreVRP, a provider of application and database performance optimization solutions.
