Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group


Can event management help foster a curiosity for innovative ways to make application performance better? Blue-sky thinkers may not want to deal with the myriad details of managing the events generated in day-to-day operations, but they could learn something from the exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and proactively prevent them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning”, wouldn’t the most prudent action be to extinguish the fire first, and then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., the failure of one power supply in a dual-supply server)?

3. Can we trend on the event to create a predictive alert before it escalates (e.g., disk space utilization triggering a minor alert at 90%, major at 95%, and critical at 98%)?

These three questions map, in order, to the following categories: Reactive, Proactive, and Predictive.

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated event that corresponds to the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that there are component failures that need attention but have not yet affected overall application availability (e.g., the failure of one power supply in a dual-supply server).
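One way to picture the difference between a proactive and a reactive alert is to model each set of redundant components as a group and compare how many members are still healthy. The following is only a minimal sketch: the `RedundancyGroup` class, its field names, and the `classify` function are hypothetical names chosen for illustration and do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class RedundancyGroup:
    """Hypothetical model of a set of redundant components."""
    name: str
    healthy: int  # members currently working
    total: int    # members installed


def classify(group: RedundancyGroup) -> str:
    """Return 'ok', 'proactive', or 'reactive' for one redundancy group."""
    if group.healthy <= 0:
        # Nothing is left to carry the load: the failure has already happened.
        return "reactive"
    if group.healthy < group.total:
        # Degraded but still serving: there is time to act before an outage.
        return "proactive"
    return "ok"


if __name__ == "__main__":
    power = RedundancyGroup("server power supplies", healthy=1, total=2)
    print(power.name, "->", classify(power))
```

A server that has lost one of its two power supplies falls into the proactive category here, because the application is still available while the component needs attention.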

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending on memory usage or disk utilization before running out of resources).
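The disk utilization thresholds mentioned earlier (minor at 90%, major at 95%, critical at 98%) can be combined with a simple trend to estimate how soon a threshold will be crossed. The sketch below is illustrative only. It assumes one utilization sample per day and a straight-line (least-squares) trend, neither of which is prescribed by the article, and the function names `severity` and `days_until` are hypothetical.

```python
from typing import Optional, Sequence

# Threshold values taken from the disk utilization example; highest first.
THRESHOLDS = [(98.0, "critical"), (95.0, "major"), (90.0, "minor")]


def severity(percent_used: float) -> Optional[str]:
    """Map a utilization percentage to an alert severity, or None if below all thresholds."""
    for limit, label in THRESHOLDS:
        if percent_used >= limit:
            return label
    return None


def days_until(samples: Sequence[float], limit: float) -> Optional[float]:
    """Estimate days until `limit` is reached, assuming one sample per day
    and a linear trend. Returns None if usage is flat or falling, or if
    there are fewer than two samples."""
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    slope = num / den
    if slope <= 0:
        return None
    if samples[-1] >= limit:
        return 0.0
    intercept = mean_y - slope * mean_x
    # Day index where the fitted line meets the limit, measured from the last sample.
    return max(0.0, (limit - intercept) / slope - (n - 1))


if __name__ == "__main__":
    history = [80.0, 82.0, 84.0, 86.0, 88.0]  # made-up daily readings
    print("current severity:", severity(history[-1]))
    print("days until minor threshold:", days_until(history, 90.0))
```

A predictive alert could then be raised when the projected number of days drops below whatever lead time the operations team needs to add capacity or clean up the volume.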


Conclusion

Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and the ability to monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies ranging from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, “The Anatomy of APM - webcast”, which provides more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
