Real-Time Monitoring Metrics - The Magical Mundane

September 12, 2012

Larry Dragich

Application Performance Management (APM) has many benefits when implemented with the right support structure and sponsorship. It's the key for managing action, going red to green, and trending on performance.

As you strive to achieve new levels of sophistication when creating performance baselines, it is important to consider how you will navigate the oscillating winds of application behavior as the numbers come in from all directions. The behavioral context of the user will highlight key threshold settings to consider as you build a framework for real-time alerting into your APM solution.

This will take an understanding of the application and an analysis of the numbers as you begin looking at user patterns. Metrics play a key role in providing this value through different views across multiple comparisons. Even without the behavioral learning engines that are now emerging in the APM space, you can begin a high-level analysis on your own to come to a common understanding of each business application's performance.

Just as water seeks its own level, an application performance baseline will eventually emerge as you track the real-time performance metrics outlining the high and low watermarks of the application. This will include the occasional anomalous wave that comes crashing through affecting the user experience as the numbers fluctuate.

Depending on transaction volume and performance characteristics there will be a certain level of noise that you will need to squelch to a volume level that can be analyzed. When crunching the numbers and distilling patterns, it will be essential to create three baseline comparisons that you will use like a compass for navigation into what is real and what is an exception.

Real-Time vs. Yesterday

As the real-time performance metrics come in, it is important to watch the application performance at five-minute intervals (or finer) and compare each interval with the same interval from the day before to see if there are any obvious changes in performance.
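One way to picture this comparison is the minimal Python sketch below. It is an illustration under stated assumptions, not a feature of any particular APM product: the sample format (timestamp, response time in milliseconds) and the function names `bucket_start`, `bucket_averages`, and `change_vs_yesterday` are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean


def bucket_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute bucket."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


def bucket_averages(samples):
    """Group (timestamp, response_ms) samples into five-minute averages."""
    grouped = {}
    for ts, value in samples:
        grouped.setdefault(bucket_start(ts), []).append(value)
    return {start: mean(values) for start, values in grouped.items()}


def change_vs_yesterday(averages, now):
    """Percent change of the current bucket vs. the same bucket 24 hours earlier.

    Returns None when either bucket has no data (or yesterday's average is zero).
    """
    current = averages.get(bucket_start(now))
    previous = averages.get(bucket_start(now - timedelta(days=1)))
    if current is None or not previous:
        return None
    return (current - previous) / previous * 100.0


# Hypothetical sample data: two readings yesterday, two readings today.
samples = [
    (datetime(2012, 9, 11, 10, 1), 200.0),
    (datetime(2012, 9, 11, 10, 3), 200.0),
    (datetime(2012, 9, 12, 10, 2), 250.0),
    (datetime(2012, 9, 12, 10, 4), 250.0),
]
averages = bucket_averages(samples)
print(change_vs_yesterday(averages, datetime(2012, 9, 12, 10, 4, 30)))
```

A positive result means the current interval is slower than the same interval yesterday. How large a change counts as "obvious" is a judgment call that depends on the application.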

Real-Time vs. 7 Days Ago

Comparing Monday to Sunday may not be relevant if your core business hours are Monday through Friday. Comparing the real-time view with the same day of the previous week will be more useful, especially if a new release of the application was rolled out over the weekend and you want to know how it compares with the previous week.
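As a rough sketch of the same-weekday comparison, the Python snippet below looks up the interval from exactly seven days earlier. It assumes the five-minute averages are already available in a dictionary keyed by interval start time. The function name `week_ago_delta` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def week_ago_delta(bucket_avg, bucket):
    """Difference in ms between an interval and the same interval 7 days earlier.

    Returns None when either interval is missing from the data.
    """
    current = bucket_avg.get(bucket)
    baseline = bucket_avg.get(bucket - timedelta(days=7))
    if current is None or baseline is None:
        return None
    return current - baseline


# Hypothetical averages (ms) for the 09:00 interval.
monday = datetime(2012, 9, 10, 9, 0)                 # a Monday, after a weekend release
bucket_avg = {
    monday: 310.0,
    monday - timedelta(days=1): 120.0,               # Sunday: quiet, not a fair baseline
    monday - timedelta(days=7): 250.0,               # previous Monday
}
print(week_ago_delta(bucket_avg, monday))            # 60.0 ms slower than last Monday
```

Subtracting seven days keeps the weekday aligned, so a quiet Sunday never becomes the baseline for a busy Monday.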

Real-Time vs. 10 Day Rolling Average

Using a 10-, 15-, or 30-day rolling average is helpful in reviewing overall application performance with the business, because everyone can easily understand averages and what they mean when compared against a real-time view.
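A rolling average can be sketched in a few lines of Python. The example below assumes one daily average per interval is fed in, oldest first. The class name `RollingBaseline` and its methods are hypothetical, and the window length is simply a parameter (10, 15, or 30 days).

```python
from collections import deque


class RollingBaseline:
    """Keeps the last `window_days` daily averages for one time interval."""

    def __init__(self, window_days=10):
        # A deque with maxlen drops the oldest day automatically.
        self.window = deque(maxlen=window_days)

    def add_day(self, daily_avg):
        self.window.append(daily_avg)

    def average(self):
        """Rolling average over the stored days, or None if no data yet."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def deviation_pct(self, current):
        """Percent deviation of a real-time value from the rolling average."""
        avg = self.average()
        if not avg:
            return None
        return (current - avg) / avg * 100.0


# Hypothetical data: twelve days of averages, only the last ten are kept.
baseline = RollingBaseline(window_days=10)
for daily_avg in [100.0, 100.0] + [200.0] * 10:
    baseline.add_day(daily_avg)

print(baseline.average())             # 200.0
print(baseline.deviation_pct(250.0))  # 25.0
```

A longer window smooths out more noise but reacts more slowly to a genuine shift in performance, which is why trying 10, 15, and 30 days side by side can be informative.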

Capturing real-time performance metrics in five-minute intervals is a good place to start. Once you get a better understanding of the application behavior you may increase or decrease the interval as needed. For real-time performance alerting, averages will give you a good picture of when something is out of pattern. For Service Level Management reporting, percentiles (90th, 95th, etc.) will help create an accurate view for the business. To make it simple to remember, alert on the averages and profile with percentiles.
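The "alert on the averages, profile with percentiles" rule might look like the following Python sketch. The nearest-rank percentile method, the 20% tolerance, and the function names `percentile`, `should_alert`, and `sla_profile` are assumptions chosen for illustration. A given APM tool may calculate percentiles and thresholds differently.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(interval_values, baseline_avg, tolerance_pct=20.0):
    """Alert when the interval average exceeds the baseline average by more
    than tolerance_pct (a hypothetical threshold; tune per application)."""
    return mean(interval_values) > baseline_avg * (1 + tolerance_pct / 100)


def sla_profile(values):
    """Percentile summary for Service Level Management reporting."""
    return {"p90": percentile(values, 90), "p95": percentile(values, 95)}


# Hypothetical response times (ms).
print(should_alert([300.0, 320.0, 340.0], baseline_avg=250.0))  # True
print(sla_profile(list(range(1, 101))))                         # {'p90': 90, 'p95': 95}
```

The average reacts quickly when a whole interval drifts out of pattern, which suits alerting. The percentiles describe what most users actually experienced, which suits reporting to the business.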

Conclusion

Operationally there are things you may not want to think about all of the time (e.g. standard deviations, averages, percentiles, etc.), but you have to think about them long enough to create the most accurate picture possible as you begin to distill performance patterns with each business application. This can be accomplished by building meaningful performance baselines that will help feed your Service Level Management processes well into the future.

You can contact Larry on LinkedIn.

Hot Topics

APM

Alerting

Analytics
