Event Management: Reactive, Proactive or Predictive?
August 01, 2012
Larry Dragich

Can event management spark curiosity about innovative ways to improve application performance? Blue-sky thinkers may not want to wade through the myriad details of managing the events an environment generates operationally, but they could learn something from the exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and proactively prevent them from recurring? How much better off would you be if you could avoid the proverbial "Groundhog Day" of system outages? The argument that system monitoring is merely a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the event management and incident management processes may seem like a reactive approach to implementing an Application Performance Management (APM) solution, but is it really? If "Rome is burning," wouldn't the most prudent action be to extinguish the fire first, and then develop a proactive approach to prevention? Managing the operational noise calms the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will generate dialogue and outline options for alerting and prevention. This directs your thinking toward continual improvement and helps galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., a dual power supply failure in a server)?

3. Can we trend on the event, creating a predictive alert before it escalates (e.g., disk space utilization triggering minor@90%, major@95%, critical@98%)?

The preceding questions map directly to three categories: Reactive, Proactive, and Predictive.
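The disk-space thresholds in question 3 can be sketched as a simple severity ladder. This is a minimal, hypothetical example; the percentages come from the question above, but the function name and severity labels are illustrative, not from any particular monitoring product:

```python
def disk_alert_severity(used_percent: float) -> str:
    """Map disk utilization to an alert severity using the example
    thresholds above: minor@90%, major@95%, critical@98%."""
    if used_percent >= 98:
        return "critical"
    if used_percent >= 95:
        return "major"
    if used_percent >= 90:
        return "minor"
    return "ok"

# A disk at 96% utilization has already passed the major threshold.
print(disk_alert_severity(96))  # major
```

The point of the laddering is that the minor and major alerts arrive while there is still time to act, before the critical one forces a reactive response.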

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will arrive notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated alert corresponding to the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that component failures need attention but have not yet affected overall application availability (e.g., a dual power supply failure in a server).
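The dual power supply example generalizes to any redundant component: the proactive condition is "some, but not all, units have failed." A minimal sketch, with hypothetical component names and states:

```python
def proactive_alerts(component_status: dict) -> list:
    """component_status maps a component name to a list of unit states.
    Raise an alert when some, but not all, units of a component have
    failed: the application is still up, but redundancy is lost."""
    alerts = []
    for name, units in component_status.items():
        failed = [u for u in units if u != "ok"]
        if failed and len(failed) < len(units):
            alerts.append(f"{name}: {len(failed)}/{len(units)} units failed")
    return alerts

# One of two power supplies has failed: the server still runs,
# but a second failure would take the application down.
status = {"power_supply": ["ok", "failed"], "nic": ["ok", "ok"]}
print(proactive_alerts(status))  # ['power_supply: 1/2 units failed']
```

Catching the loss of redundancy, rather than the final outage, is what moves the alert from the reactive column to the proactive one.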

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending on memory or disk utilization before resources run out).
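One simple way to turn a trend into a predictive alert is to fit a linear trend to recent utilization samples and estimate when a threshold will be crossed. This is an illustrative sketch; the sample data and the 98% critical threshold are assumptions, and a production system would use more robust forecasting:

```python
def hours_until_threshold(samples: list, threshold: float = 98.0):
    """Least-squares linear fit over (hour, percent_used) samples.
    Returns the estimated hours from the last sample until utilization
    crosses the threshold, or None if usage is flat or falling."""
    n = len(samples)
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope_num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope_den = sum((x - mean_x) ** 2 for x in xs)
    slope = slope_num / slope_den          # percent per hour
    if slope <= 0:
        return None                        # no upward trend to alert on
    intercept = mean_y - slope * mean_x
    crossing_hour = (threshold - intercept) / slope
    return crossing_hour - xs[-1]

# Disk usage grew 0.5%/hour over the last six hours and sits at 92.5%,
# so the critical threshold is roughly eleven hours away.
samples = [(0, 90.0), (1, 90.5), (2, 91.0), (3, 91.5), (4, 92.0), (5, 92.5)]
print(round(hours_until_threshold(samples), 1))  # 11.0
```

An alert raised eleven hours before the critical threshold is the predictive ideal: the event is handled before it ever becomes an incident.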


Conclusion

Once the organization knows you have a bird's-eye view of the technical landscape and can monitor each application's ecosystem (like an ecologist), people become more meticulous about introducing new elements into the environment. They know you are watching, taking samples, and trending on overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies from best practices and technical workflows to resource allocation and implementation approaches for APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to "The Anatomy of APM" webcast slide show on BrightTALK.com, which presents the topic in more context.

For more information on the critical success factors in APM adoption, and how they center on the End-User Experience (EUE), read The Anatomy of APM and the corresponding blog APM's DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
