Event Management: Reactive, Proactive or Predictive?
August 01, 2012
Larry Dragich

Can event management help foster a curiosity for innovative ways to improve application performance? Blue-sky thinkers may not want to deal with the myriad details of managing the events generated operationally, but they could learn something from the exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning”, wouldn’t the most prudent action be to extinguish the fire first, then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., a failed power supply in a server with dual power supplies)?

3. Can we trend on the event to create a predictive alert before it is escalated (e.g., disk space utilization triggering a minor alert at 90%, major at 95%, and critical at 98%)?

The preceding questions are directly related to the following categories respectively: Reactive, Proactive, and Predictive.
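
The disk space thresholds in question 3 can be sketched in a few lines of Python. This is only an illustration of the idea, not the configuration of any particular monitoring product; the names `THRESHOLDS` and `disk_alert_severity` are hypothetical, and only the 90/95/98 percent values come from the example above.

```python
from typing import Optional

# Severity thresholds from the example: minor at 90%, major at 95%, critical at 98%.
# Ordered from highest to lowest so the first match is the most severe one.
THRESHOLDS = [(98.0, "critical"), (95.0, "major"), (90.0, "minor")]


def disk_alert_severity(utilization_pct: float) -> Optional[str]:
    """Return the alert severity for a disk utilization reading, or None if no alert applies."""
    for threshold, severity in THRESHOLDS:
        if utilization_pct >= threshold:
            return severity
    return None
```

Checking the highest threshold first means a reading of 96% raises a single major alert rather than both a minor and a major one.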

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated event that corresponds with the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring and tell you that there are component failures that need attention but have not yet affected overall application availability (e.g., a failed power supply in a server with dual power supplies).

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help predict subtle changes in the environment (e.g., trending on memory usage or disk utilization before running out of resources).
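
One simple way to trend on disk utilization is to fit a straight line through recent readings and estimate when it will cross a threshold. The sketch below assumes one reading per day and linear growth, which real systems may not follow; the function name `days_until_threshold` and its parameters are hypothetical and do not come from any specific APM tool.

```python
from typing import Optional, Sequence


def days_until_threshold(daily_pct: Sequence[float], threshold_pct: float) -> Optional[float]:
    """Estimate days from the last reading until utilization reaches threshold_pct.

    Fits a least-squares line through the daily readings (assumed one per day).
    Returns 0.0 if the last reading is already at or above the threshold, and
    None if there are fewer than two readings or the trend is flat or falling.
    """
    n = len(daily_pct)
    if n < 2:
        return None
    if daily_pct[-1] >= threshold_pct:
        return 0.0

    # Least-squares slope and intercept, with day indices 0..n-1 as x values.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_pct))
    slope = sxy / sxx
    if slope <= 0:
        return None  # no upward trend, so no projected crossing

    intercept = mean_y - slope * mean_x
    crossing_day = (threshold_pct - intercept) / slope
    # Express the result relative to the most recent reading (day n-1).
    return max(0.0, crossing_day - (n - 1))
```

For example, readings of 80, 81, 82, 83, and 84 percent grow by one point per day, so the projection to a 90% minor threshold is six days after the last reading. A predictive alert could be raised whenever that estimate drops below a lead time the operations team is comfortable with.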


Conclusion

Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and the ability to monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience, and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group with overall accountability to optimize the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies that ranges from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, which describes “The Anatomy of APM - webcast” in more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
