Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group

Can event management help foster a curiosity for innovative possibilities to make application performance better? Blue-sky thinkers may not want to deal with the myriad of details on how to manage the events being generated operationally, but could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning”, wouldn’t the most prudent action be to extinguish the fire, then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g., a dual power supply failure in a server)?

3. Can we trend on the event to create a predictive alert before it escalates (e.g., disk space utilization triggering minor@90%, major@95%, critical@98%)?

The preceding questions are directly related to the following categories respectively: Reactive, Proactive, and Predictive.
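
The disk space example in question 3 can be sketched as a simple threshold lookup. This is only an illustration, assuming utilization arrives as a percentage; the function name `classify_disk_utilization` and the `THRESHOLDS` table are hypothetical and not part of any particular monitoring product.

```python
from typing import Optional

# Severity thresholds taken from question 3 (minor@90%, major@95%, critical@98%),
# listed from most to least severe so the first match wins.
THRESHOLDS = [(98.0, "critical"), (95.0, "major"), (90.0, "minor")]


def classify_disk_utilization(percent_used: float) -> Optional[str]:
    """Return the alert severity for a disk utilization reading, or None if below all thresholds."""
    for threshold, severity in THRESHOLDS:
        if percent_used >= threshold:
            return severity
    return None


if __name__ == "__main__":
    for reading in (72.0, 91.5, 96.0, 99.2):
        print(reading, classify_disk_utilization(reading))
```

Checking the most severe threshold first keeps the logic to a single pass and makes it easy to adjust the levels in one place.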

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. This will come from either the users calling the Service Desk to report an issue or it will be system generated corresponding with an application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that there are component failures that need attention but have not yet affected overall application availability (e.g., a dual power supply failure in a server).

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help detect subtle changes in the environment and predict where they are heading (e.g., trending on memory usage or disk utilization before running out of resources).
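
One way to turn a trend into a predictive alert is to fit a straight line to recent utilization samples and estimate how long until a threshold is crossed. The sketch below assumes evenly spaced samples and roughly linear growth, which real workloads may not follow; the function name `hours_until_threshold` and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    threshold: float = 90.0,
    interval_hours: float = 1.0,
) -> Optional[float]:
    """Estimate hours until utilization reaches `threshold`.

    Fits a least-squares line to evenly spaced samples (oldest first).
    Returns None if there are too few samples or usage is not growing,
    and 0.0 if the fitted value is already at or above the threshold.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    slope = numerator / denominator  # percentage points per sample interval
    if slope <= 0:
        return None  # flat or shrinking usage: nothing to predict
    intercept = mean_y - slope * mean_x
    current = intercept + slope * (n - 1)  # fitted value at the latest sample
    if current >= threshold:
        return 0.0
    return (threshold - current) / slope * interval_hours


if __name__ == "__main__":
    hourly_disk_percent = [80.0, 81.0, 82.0, 83.0, 84.0]
    print(hours_until_threshold(hourly_disk_percent, threshold=90.0))
```

An estimate like this could drive a predictive alert well before the first fixed threshold is reached, for example by raising a warning when the projected time falls below an agreed lead time.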


Conclusion

Once you build awareness in the organization that you have a bird’s eye view of the technical landscape and the ability to monitor the ecosystem of each application (as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies ranging from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, which describes “The Anatomy of APM - webcast” in more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
