Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group


Can event management help foster curiosity about innovative ways to improve application performance? Blue-sky thinkers may not want to deal with the myriad details of managing the events generated operationally, but they could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and proactively keep them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” of repeat system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach to implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning,” wouldn’t the most prudent action be to extinguish the fire first, and then develop a proactive approach to prevention? Managing the operational noise calms the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialogue and outline options for alerting and prevention. This will direct your thinking toward continual improvement and help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert before it goes down (e.g., a dual power supply failure in a server)?

3. Can we trend on the event to create a predictive alert before it escalates (e.g., disk space utilization triggering a minor alert at 90%, major at 95%, and critical at 98%)?

The preceding questions map directly to three categories, respectively: Reactive, Proactive, and Predictive. The sketch below illustrates the escalating thresholds from question 3.
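As a minimal illustration of question 3, a short Python sketch can map a disk utilization reading to an escalating severity. The 90%/95%/98% thresholds come from the example above; the function name and sample values are hypothetical.

def disk_alert_severity(used_pct):
    # Map a disk utilization percentage to an escalating alert severity.
    if used_pct >= 98:
        return "critical"
    if used_pct >= 95:
        return "major"
    if used_pct >= 90:
        return "minor"
    return None  # below every threshold: no alert

for sample in (72.0, 91.3, 96.0, 98.5):
    severity = disk_alert_severity(sample)
    print(f"{sample:.1f}% used -> {severity or 'no alert'}")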

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated alert corresponding to the application failure.
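To make the system-generated side concrete, here is a minimal Python sketch of a reactive check: it probes an application endpoint and raises an alert only after the application has already failed. The URL and the notify_service_desk stub are hypothetical placeholders for whatever health endpoint and ticketing integration you actually use.

import urllib.error
import urllib.request

APP_URL = "https://app.example.com/health"  # hypothetical health endpoint

def notify_service_desk(message):
    # Stand-in for a real ticketing or paging integration.
    print(f"ALERT: {message}")

def check_application():
    try:
        urllib.request.urlopen(APP_URL, timeout=10)
    except urllib.error.URLError as exc:
        # The application is already down; by definition this alert is reactive.
        notify_service_desk(f"{APP_URL} failed its availability check: {exc}")

check_application()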

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you there are component failures that need attention but have not yet affected overall application availability (e.g., a dual power supply failure in a server).
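A proactive check might look something like the following Python sketch, which alerts when one unit of a redundant pair has failed while the application itself is still up. The get_power_supply_status function is a hypothetical stand-in for a real hardware-management query (IPMI, SNMP, or a vendor agent), not an actual library call.

def get_power_supply_status(server):
    # Hypothetical: in practice this would query the server's management interface.
    return {"PSU1": "ok", "PSU2": "failed"}

def check_redundancy(server):
    status = get_power_supply_status(server)
    failed = [unit for unit, state in status.items() if state != "ok"]
    if failed and len(failed) < len(status):
        # Nothing is down yet, but redundancy is lost -- alert proactively.
        print(f"WARNING: {server} lost redundancy on {', '.join(failed)}")
    elif failed:
        print(f"CRITICAL: {server} has no working power supplies")

check_redundancy("app-server-01")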

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending on memory usage or disk utilization before resources run out).
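One simple way to turn a trend into a predictive alert is to fit a line to recent utilization samples and extrapolate when the resource will be exhausted. The Python sketch below assumes daily disk utilization samples and a 14-day warning horizon, both purely illustrative; it uses statistics.linear_regression, available in Python 3.10+.

from statistics import linear_regression

days = [0, 1, 2, 3, 4, 5, 6]                           # one sample per day
used_pct = [81.0, 82.1, 83.4, 84.2, 85.6, 86.5, 87.7]  # illustrative disk usage

slope, intercept = linear_regression(days, used_pct)
if slope > 0:
    days_until_full = (100.0 - used_pct[-1]) / slope
    if days_until_full <= 14:
        print(f"PREDICTIVE ALERT: disk projected to fill in ~{days_until_full:.0f} days")
    else:
        print(f"Trend OK: ~{days_until_full:.0f} days of headroom at the current growth rate")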


Conclusion

Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and can monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know you are watching, taking samples, and trending on overall health and stability, leaving you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders in sharing knowledge of APM technologies, from best practices and technical workflows to resource allocation and approaches to implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, “The Anatomy of APM” webcast, which provides more context.

For more information on the critical success factors in APM adoption and how they center on the End-User Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets

