APM and ITOA: Clearing Up the Confusion
April 11, 2016

Guy Warren
ITRS Group

I was reading a discussion on a social media site about Application Performance Management, and realized that there is a lot of confusion about what Application Performance Monitoring, Application Performance Management (APM) and IT Operational Analytics (ITOA) actually are.

Going by the words alone, you would expect Application Performance Monitoring to focus on watching data for a particular condition or state. Application Performance Management suggests a wider field, one that certainly includes monitoring the application but also covers techniques for managing other aspects of the IT estate. The degree to which complex analytics are used is unclear. IT Operational Analytics could potentially be seen as a subset of Application Performance Management, although APM's focus on the application might make it more limited in scope than ITOA.

To help clarify this rather muddy set of terms, we use two models which we find clearer, more logical and less ambiguous than the APM and ITOA definitions.

The Monitoring Maturity Model

The first model we call the Monitoring Maturity Model, because it is a layered model where generally the higher levels are based on data collected from the lower levels. The model is:

1. Infrastructure Monitoring: Collecting data on the servers, operating systems, network and storage, and setting rule-based alerts to catch potential problems.

2. Basic Application Monitoring: Interrogating the operating system to capture and alert on data about the processes running on the servers. This would include CPU and memory utilization, disk I/O, network I/O etc.

3. Advanced Application Monitoring: Installing a tailored agent on the server which captures data specific to the application it is monitoring. This can be "inside the app" data or "outside the app" data, the latter being useful for off-the-shelf software products and middleware.

4. Flow Monitoring: This is capturing data about the information passing between applications and monitoring/reporting on data flows. This would include volumes/second, volumes per counterparty, latency etc.

5. Business and IT Analysis: This is the analysis of both business data and "machine" data from levels 1 and 2 to understand the business activity and the behavior of the IT estate.
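To make the higher levels more concrete, here is a minimal sketch of the kind of figures Flow Monitoring (level 4) produces: volume per second, volume per counterparty and latency. The function name `summarize_flow`, the `(counterparty, latency_ms)` record shape and the sample counterparty names are assumptions made purely for illustration; a real flow monitor would read these values from the messages passing between applications.

```python
from collections import Counter
from statistics import mean


def summarize_flow(messages, window_seconds):
    """Summarize one observation window of inter-application messages.

    messages: list of (counterparty, latency_ms) tuples (hypothetical shape).
    window_seconds: length of the observation window in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")

    # Volume per counterparty: how many messages each party sent.
    per_counterparty = Counter(name for name, _ in messages)

    # Mean latency; report 0.0 for an empty window rather than failing.
    latencies = [latency for _, latency in messages]
    mean_latency = mean(latencies) if latencies else 0.0

    return {
        "per_second": len(messages) / window_seconds,
        "per_counterparty": dict(per_counterparty),
        "mean_latency_ms": mean_latency,
    }


# Illustrative data only: four messages seen in a two-second window.
sample = [("BankA", 12.0), ("BankB", 20.0), ("BankA", 16.0), ("BankA", 8.0)]
summary = summarize_flow(sample, window_seconds=2.0)
print(summary)
```

Figures like these can then feed rule-based alerts, or be passed up to level 5 for analysis alongside business data.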

Monitoring vs Analytics

The second model separates monitoring from analytics. There is no hard definition that divides the two, so we break the analysis into three types:

1. Detect: This is rule-based detection of an alert condition. This is generally what people mean when they talk about monitoring.

2. Analyze: This is the collection of large amounts of data, including data which did not trigger a rule in Detect, and the analysis of it to discover more insight. This may be as simple as trends, or as complex as machine learning and time-series, pattern-based anomaly detection. It would also include techniques like Bayesian network causal analysis.

3. Predict: This uses current and historic data to try to predict future or “what if” scenarios. Again, this can be as simple as extrapolation, or as complex as comparing the current state to empirically derived behavioral data, the likes of which you might have gathered in a performance lab when stress testing an application.
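The three types can be illustrated with a small sketch that uses the simplest technique in each category: a fixed threshold for Detect, a deviation-from-the-mean check for Analyze, and straight-line extrapolation for Predict. The function names, thresholds and sample readings below are assumptions chosen for illustration, not a description of any particular product, and the Analyze step here is a basic statistical check standing in for the more sophisticated anomaly detection mentioned above.

```python
from statistics import mean, stdev


def detect(value, threshold):
    """Detect: rule-based check, True when a sample breaches a fixed threshold."""
    return value > threshold


def analyze(samples, z_limit=2.0):
    """Analyze: return indices of samples more than z_limit sample standard
    deviations away from the mean of the whole series."""
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # a flat series has no outliers
    return [i for i, s in enumerate(samples) if abs(s - mu) / sigma > z_limit]


def predict(samples, steps_ahead):
    """Predict: fit a least-squares straight line through the samples and
    extrapolate it steps_ahead intervals past the last sample."""
    n = len(samples)
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(samples)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Illustrative CPU utilization readings (percent), one per interval.
cpu = [40, 42, 41, 43, 42, 44, 90, 45]
print(detect(cpu[6], threshold=85))   # did the rule fire for the seventh reading?
print(analyze(cpu))                   # which readings look unusual against the rest?

# Illustrative disk usage (GB) growing steadily, projected two intervals ahead.
disk = [10, 20, 30, 40]
print(predict(disk, steps_ahead=2))
```

Note that Detect only looks at one value against one rule, whereas Analyze and Predict both need the history, including readings that never triggered an alert.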

Whichever way you model your IT estate and the behavior of your applications, it is necessary to have a clear language so that people are talking about the same thing.

Guy Warren is CEO of ITRS Group.
