Event Management: Reactive, Proactive or Predictive?
August 01, 2012

Larry Dragich
Auto Club Group


Can event management help foster a curiosity for innovative possibilities to make application performance better? Blue-sky thinkers may not want to deal with the myriad details of managing the events generated operationally, but they could learn something from this exercise.

Consider the major system failures in your organization over the last 12 to 18 months. What if you had a system or process in place to capture those failures and mitigate them proactively, preventing them from recurring? How much better off would you be if you could avoid the proverbial “Groundhog Day” with system outages? The argument that system monitoring is just a nice-to-have, and not really a core requirement for operational readiness, dissipates quickly when a critical application goes down with no warning.

Starting with the Event Management and Incident Management processes may seem like a reactive approach when implementing an Application Performance Management (APM) solution, but is it really? If “Rome is burning”, wouldn’t the most prudent action be to extinguish the fire first, then come up with a proactive approach for prevention? Managing the operational noise can calm the environment, allowing you to focus on APM strategy more effectively.

Asking the right questions during a post-mortem review will help generate dialog, outlining options for alerting and prevention. This will direct your thinking towards a new horizon of continual improvement that will help galvanize proactive monitoring as an operational requirement.

Here are three questions that build on each other as you work to mature your solution:

1. Did we alert on it when it went down, or did the user community call us?

2. Can we get a proactive alert on it before it goes down (e.g. a dual power supply failure in a server)?

3. Can we trend on the event, creating a predictive alert before it is escalated (e.g. disk space utilization triggering a minor alert at 90%, a major alert at 95%, and a critical alert at 98%)?

These questions map, respectively, to the following categories: Reactive, Proactive, and Predictive.
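The disk space thresholds from question 3 can be sketched as a small severity lookup. This is a minimal illustration, not any particular monitoring tool's rule syntax: the function name `disk_severity` and the `THRESHOLDS` table are hypothetical, and only the 90/95/98 percent values come from the example above.

```python
from typing import Optional

# Percent-used thresholds taken from the example in question 3.
# Ordered from most to least severe so the first match wins.
THRESHOLDS = [(98.0, "critical"), (95.0, "major"), (90.0, "minor")]


def disk_severity(used_percent: float) -> Optional[str]:
    """Return the alert severity for a disk utilization reading.

    Returns None when utilization is below every threshold,
    meaning no alert should be raised.
    """
    for limit, severity in THRESHOLDS:
        if used_percent >= limit:
            return severity
    return None


if __name__ == "__main__":
    for reading in (85.0, 91.5, 96.0, 99.2):
        print(reading, disk_severity(reading))
```

Keeping the thresholds in one ordered table makes it easy to review and adjust them per system during a post-mortem, rather than hunting for values scattered across individual checks.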

Reactive – Alerts that Occur at Failure

Multiple events can occur before a system failure; eventually an alert will come in notifying you that an application is down. It will come either from users calling the Service Desk to report an issue or from a system-generated event that corresponds with the application failure.

Proactive – Alerts that Occur Before Failure

These alerts will most likely come from proactive monitoring, telling you that there are component failures that need attention but have not yet affected overall application availability (e.g. a dual power supply failure in a server).

Predictive – Alerts that Trend on a Possible Failure

These alerts are usually set up in parallel with trending reports that help you spot subtle changes in the environment and predict where they lead (e.g. trending on memory usage or disk utilization before running out of resources).
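One simple way to trend on a resource is to fit a straight line through recent utilization samples and extrapolate to the point where the resource runs out. The sketch below assumes a linear growth pattern and a hypothetical helper, `hours_until_full`; real workloads are often burstier, so treat the result as a rough estimate rather than a forecast.

```python
from typing import List, Optional, Tuple


def hours_until_full(
    samples: List[Tuple[float, float]], limit: float = 100.0
) -> Optional[float]:
    """Estimate hours from the last sample until utilization hits `limit`.

    `samples` is a list of (hours_elapsed, percent_used) pairs.
    Uses an ordinary least-squares line fit. Returns None when there
    are too few samples or utilization is not trending upward.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp

    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage: nothing to predict

    intercept = mean_u - slope * mean_t
    t_full = (limit - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, t_full - last_t)


if __name__ == "__main__":
    # Daily readings: disk usage growing 2 points per day from 80%.
    readings = [(0.0, 80.0), (24.0, 82.0), (48.0, 84.0), (72.0, 86.0)]
    print(hours_until_full(readings))
```

With the sample readings shown, the estimate comes out to about 168 hours (seven days) of headroom, which is the kind of lead time that lets a predictive alert be handled as routine work instead of an incident.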


Conclusion

Once you build awareness in the organization that you have a bird’s-eye view of the technical landscape and can monitor the ecosystem of each application (much as an ecologist would), people become more meticulous when introducing new elements into the environment. They know that you are watching, taking samples, and trending on overall health and stability, which leaves you free to focus on the strategic side of APM without distraction.

ABOUT Larry Dragich

Larry Dragich, a regular blogger and contributor on APMdigest, has 23 years of IT experience and has been in an IT leadership role at the Auto Club Group (ACG) for the past ten years. He serves as Director of Enterprise Application Services (EAS) at the Auto Club Group, with overall accountability for optimizing the capability of the IT infrastructure to deliver high availability and optimal performance. Dragich is actively involved with industry leaders, sharing knowledge of APM technologies that ranges from best practices and technical workflows to resource allocation and approaches for implementing APM strategies.

You can contact Larry on LinkedIn

Related Links:

For a high-level view of a much broader technology space, refer to the slide show on BrightTALK.com, “The Anatomy of APM - webcast”, which provides more context.

For more information on the critical success factors in APM adoption and how this centers around the End-User-Experience (EUE), read The Anatomy of APM and the corresponding blog APM’s DNA – Event to Incident Flow.

Prioritizing Gartner's APM Model

APM and MoM – Symbiotic Solution Sets
