Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group

Can event management help foster a curiosity for innovative possibilities to make application performance better? Blue-sky thinkers may not want to deal with the myriad details of managing the events generated in day-to-day operations, but they could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning”, wouldn’t the most prudent action be to extinguish the fire first, then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., a dual power supply failure in a server)?

3. Can we trend on the event to create a predictive alert before it is escalated (e.g., disk space utilization triggering a minor alert at 90%, a major alert at 95%, and a critical alert at 98%)?

These three questions correspond, respectively, to the following categories: Reactive, Proactive, and Predictive.
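The disk space example in the third question can be sketched in a few lines of Python. This is only an illustration, not a description of any particular monitoring product: the thresholds (90%, 95%, 98%) and severity names come from the question above, while the function name `classify_disk_usage` and the `DISK_THRESHOLDS` table are hypothetical.

```python
from typing import Optional

# Thresholds from the example in question 3, checked from most to least severe.
# The table and function names are hypothetical, chosen for this illustration.
DISK_THRESHOLDS = [
    (98.0, "critical"),
    (95.0, "major"),
    (90.0, "minor"),
]


def classify_disk_usage(percent_used: float) -> Optional[str]:
    """Return the alert severity for a disk utilization reading, or None."""
    for threshold, severity in DISK_THRESHOLDS:
        if percent_used >= threshold:
            return severity
    # Below the lowest threshold: no alert is raised.
    return None


if __name__ == "__main__":
    for reading in (85.0, 91.5, 96.0, 99.2):
        print(reading, classify_disk_usage(reading))
```

Checking the highest threshold first means a reading of 99% raises a single critical alert rather than three separate ones, which helps keep operational noise down.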

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. This alert will either come from users calling the Service Desk to report an issue, or it will be system generated in response to the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring and tell you there are component failures that need attention but have not yet affected overall application availability (e.g., a dual power supply failure in a server).
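One way to picture the difference between a proactive and a reactive alert is a server with redundant power supplies. The sketch below assumes the power supply example means that one of two redundant supplies has failed while the server is still running; the function name `classify_power_state` and its return labels are hypothetical.

```python
from typing import Sequence


def classify_power_state(supplies_ok: Sequence[bool]) -> str:
    """Classify a server's power state from the health of each supply.

    Assumption for this sketch: the server stays up as long as at least
    one supply is healthy.
    """
    if not supplies_ok:
        raise ValueError("at least one power supply must be reported")
    healthy = sum(1 for ok in supplies_ok if ok)
    if healthy == len(supplies_ok):
        return "ok"          # nothing to report
    if healthy > 0:
        return "proactive"   # redundancy lost, service still available
    return "reactive"        # all supplies failed, the server is down


if __name__ == "__main__":
    print(classify_power_state([True, True]))
    print(classify_power_state([True, False]))
    print(classify_power_state([False, False]))
```

The middle case is the one worth alerting on early: the application is still available, but the next component failure becomes an outage.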

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help reveal subtle changes in the environment (e.g., trending on memory usage or disk utilization before running out of resources).
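As a rough illustration of trending, the following sketch fits a straight line (ordinary least squares) to recent disk utilization samples and estimates how long it will take to reach a threshold. It assumes growth is roughly linear, which real workloads often are not; the function name `hours_until_threshold` and the sample data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def hours_until_threshold(
    samples: Sequence[Tuple[float, float]], threshold_pct: float
) -> Optional[float]:
    """Estimate hours from the latest sample until usage hits threshold_pct.

    samples holds (hour, percent_used) pairs. Returns None when the fitted
    trend is flat or falling, and 0.0 when the trend has already crossed
    the threshold.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one point in time")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    slope = sxy / sxx  # percentage points per hour
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_t
    crossing_time = (threshold_pct - intercept) / slope
    last_time = max(t for t, _ in samples)
    return max(0.0, crossing_time - last_time)


if __name__ == "__main__":
    readings = [(0, 80.0), (1, 81.0), (2, 82.0), (3, 83.0)]
    print(hours_until_threshold(readings, 90.0))
```

With the sample readings above, usage grows by one percentage point per hour, so the estimate is 7 hours until the 90% mark. A predictive alert could fire when that estimate drops below whatever lead time the team needs to respond.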


Conclusion

Once you build awareness in the organization that you have a bird’s eye view of the technical landscape and can monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on overall health and stability. That, in turn, leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience, and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies ranging from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, “The Anatomy of APM - webcast”, which describes the topic in more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
