Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group

Can event management help foster a curiosity for innovative possibilities to make application performance better? Blue-sky thinkers may not want to deal with the myriad of details on how to manage the events being generated operationally, but could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning,” wouldn’t the most prudent action be to extinguish the fire and then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., a failed power supply in a server with dual power supplies)?

3. Can we trend on the event to create a predictive alert before it is escalated (e.g., disk space utilization triggering a minor alert at 90%, major at 95%, and critical at 98%)?

The preceding questions are directly related to the following categories respectively: Reactive, Proactive, and Predictive.
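As a rough illustration of the disk space example in question 3, the tiered thresholds can be expressed as a small lookup. This is only a sketch: the function name `disk_severity` and the `THRESHOLDS` table are made up for this example, and a real monitoring tool would define these levels in its own configuration.

```python
from typing import Optional

# Thresholds from the disk space example (minor@90%, major@95%, critical@98%).
# Ordered from most to least severe so the first match wins.
# The names here are illustrative, not taken from any monitoring product.
THRESHOLDS = [(98.0, "critical"), (95.0, "major"), (90.0, "minor")]


def disk_severity(used_percent: float) -> Optional[str]:
    """Return the alert severity for a disk utilization reading.

    Returns None when utilization is below every threshold.
    """
    for limit, severity in THRESHOLDS:
        if used_percent >= limit:
            return severity
    return None


if __name__ == "__main__":
    for reading in (72.5, 91.0, 96.3, 99.1):
        print(reading, disk_severity(reading))
```

Checking the most severe threshold first keeps the logic to a single loop and makes it easy to add or adjust levels later.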

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated event that corresponds with the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that there are component failures that need attention but have not yet affected overall application availability (e.g., a failed power supply in a server with dual power supplies).
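The power supply example might be sketched as a redundancy check: the server is still up, but it has lost its safety margin. Everything here is hypothetical (the `PowerSupply` class, the `check_power` function, and the message text), and how component health is actually collected depends on the monitoring tool in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PowerSupply:
    """Hypothetical health record for one power supply unit."""
    name: str
    healthy: bool


def check_power(server: str, supplies: List[PowerSupply]) -> Optional[str]:
    """Classify the power state of a server with redundant supplies.

    Proactive: at least one supply has failed but another still works,
    so the application is up and there is time to act.
    Reactive: every supply has failed, so the server is already down.
    Returns None when all supplies are healthy.
    """
    failed = [p.name for p in supplies if not p.healthy]
    if not failed:
        return None
    if len(failed) < len(supplies):
        return f"PROACTIVE: {server} lost power redundancy ({', '.join(failed)} failed)"
    return f"REACTIVE: {server} has no working power supply"


if __name__ == "__main__":
    units = [PowerSupply("PSU1", True), PowerSupply("PSU2", False)]
    print(check_power("app-server-01", units))
```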

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending on memory usage or disk utilization before running out of resources).
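One simple way to trend on disk utilization is to fit a straight line through recent daily readings and estimate how long until a threshold is crossed. The sketch below assumes one reading per day and roughly linear growth; the function name `days_until_threshold` and the sample data are invented for illustration, and commercial tools may use more sophisticated forecasting.

```python
from typing import Optional, Sequence


def days_until_threshold(daily_used_percent: Sequence[float],
                         threshold: float = 90.0) -> Optional[float]:
    """Estimate the days remaining before utilization reaches the threshold.

    Fits a least-squares line through the readings (one per day, oldest
    first) and projects forward from the latest reading. Returns 0.0 if
    the threshold is already reached, and None if there are too few
    readings or utilization is not growing.
    """
    n = len(daily_used_percent)
    if n < 2:
        return None
    latest = daily_used_percent[-1]
    if latest >= threshold:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_percent) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_used_percent))
    slope = sxy / sxx  # percentage points gained per day
    if slope <= 0:
        return None
    return (threshold - latest) / slope


if __name__ == "__main__":
    readings = [70.0, 72.0, 74.0, 76.0, 78.0]  # made-up sample data
    print(days_until_threshold(readings, threshold=90.0))
```

A predictive alert could then fire when the estimate drops below some lead time, such as the number of days needed to provision more storage.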


Conclusion

Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and the ability to monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on the overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability to optimize the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies ranging from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, which presents “The Anatomy of APM - webcast” in more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
