
4 Tips for Dealing with All Those Event Alerts

Ariel Gordon

IT operations handles hundreds, or even thousands, of console messages day in and day out – including weekends. It’s an ongoing 24x7 battle. Data centers keep expanding and increasing in complexity, yet operations is still expected to manage the flood of event alerts pouring in.

Compounding the sheer volume of events, these alert notifications typically use technical language that only domain experts can understand, and they arrive entirely without context.

So, let’s have a look at some tips that will help IT operations personnel deal with all of this by focusing on important events, while understanding their impact on delivery of business services.

1. Add meaning with enrichment rules

Turn cryptic technical messages into meaningful information by adding text that describes the event, including its severity, its owner, and, if known, the service(s) impacted. The illustration below provides an example. Enrichment clarifies the impact of the event alert and provides guidance about the next steps to take.

Image removed.
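As a rough sketch of what an enrichment rule could look like in code, the Python below matches a raw console message against a small rule table and attaches a description, severity, owner, and impacted services. Everything here is an assumption made for illustration: the `EnrichmentRule` type, the `enrich` function, the sample message patterns, and the team and service names are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichmentRule:
    """One hypothetical rule: if `pattern` appears in the raw message, add context."""
    pattern: str
    description: str
    severity: str
    owner: str
    services: list = field(default_factory=list)


# Illustrative rules only; real patterns, owners, and services depend on your environment.
RULES = [
    EnrichmentRule(
        pattern="LINK-3-UPDOWN",
        description="Network interface changed state",
        severity="major",
        owner="network-team",
        services=["customer-portal"],
    ),
    EnrichmentRule(
        pattern="disk usage",
        description="Disk is filling up on a server",
        severity="minor",
        owner="server-team",
        services=["supply-ordering"],
    ),
]


def enrich(raw_message, rules=RULES):
    """Return the raw message plus context from the first matching rule."""
    for rule in rules:
        if rule.pattern in raw_message:
            return {
                "raw": raw_message,
                "description": rule.description,
                "severity": rule.severity,
                "owner": rule.owner,
                "services": list(rule.services),
            }
    # No rule matched: keep the event, but mark it as needing triage.
    return {
        "raw": raw_message,
        "description": "Unrecognized event",
        "severity": "unknown",
        "owner": "unassigned",
        "services": [],
    }


event = enrich("%LINK-3-UPDOWN: Interface Gi0/7, changed state to down")
print(event["severity"], event["owner"], event["services"])
```

Keeping the rules in a plain table means operations staff can add context for a new message type without touching the matching logic.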

2. Apply correlation rules

Apply correlation rules to help reduce redundant events displayed on the console. Use filtering rules to remove events below a specific impact level – or events that impact less important components such as test servers. It’s also possible to use de-duplication rules to reduce noise related to the same event.
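The filtering and de-duplication rules described above might be sketched as follows. This is a minimal illustration under stated assumptions: the event fields (`host`, `message`, `severity`), the severity scale, and the `reduce_noise` function are hypothetical, and the sketch does not attempt full cross-event correlation.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "minor": 1, "major": 2, "critical": 3}


def reduce_noise(events, min_severity="minor", ignored_hosts=frozenset()):
    """Filter out low-impact events, then collapse repeats into one entry with a count."""
    threshold = SEVERITY_RANK[min_severity]
    collapsed = {}
    for event in events:
        # Filtering rule 1: drop events below the chosen impact level.
        if SEVERITY_RANK[event["severity"]] < threshold:
            continue
        # Filtering rule 2: drop events from less important components, such as test servers.
        if event["host"] in ignored_hosts:
            continue
        # De-duplication rule: the same message from the same host is one event.
        key = (event["host"], event["message"])
        if key in collapsed:
            collapsed[key]["count"] += 1
        else:
            collapsed[key] = {**event, "count": 1}
    return list(collapsed.values())


sample = [
    {"host": "web-1", "message": "CPU above 95%", "severity": "major"},
    {"host": "web-1", "message": "CPU above 95%", "severity": "major"},
    {"host": "web-1", "message": "Heartbeat OK", "severity": "info"},
    {"host": "test-9", "message": "Disk full", "severity": "critical"},
]

for row in reduce_noise(sample, ignored_hosts={"test-9"}):
    print(row["host"], row["message"], row["count"])
```

In this sample, four incoming events shrink to a single console entry with a repeat count of two.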

3. Apply tools that define all business service infrastructure components and their interrelationships

With the components and their dependencies mapped, you’ll be able to understand the links between IT events, their context, and their impact on business services.
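One way to picture such a map is as a dependency graph that can be walked from a failing component up to the business services that rely on it. The sketch below assumes a tiny hand-written topology; the component names, the `DEPENDENTS` table, and the `impacted_services` function are hypothetical, and a real tool would typically discover this information rather than hard-code it.

```python
from collections import deque

# Hypothetical topology: each component maps to the things that depend on it.
DEPENDENTS = {
    "switch-port-7": ["web-server-1"],
    "web-server-1": ["customer-portal"],
    "db-server-2": ["customer-portal", "supply-ordering"],
}

# Hypothetical set of nodes that count as business services.
BUSINESS_SERVICES = {"customer-portal", "supply-ordering"}


def impacted_services(component, dependents=DEPENDENTS, services=BUSINESS_SERVICES):
    """Walk the graph breadth-first and return every business service reachable from `component`."""
    seen = {component}
    queue = deque([component])
    hit = set()
    while queue:
        node = queue.popleft()
        if node in services:
            hit.add(node)
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(hit)


print(impacted_services("switch-port-7"))
print(impacted_services("db-server-2"))
```

An event on `switch-port-7` can then be shown on the console together with the service it puts at risk, rather than as an isolated network message.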

4. Be proactive to understand the impact of changes in the IT infrastructure

It’s a truism in IT that 80 percent of problems originate from changes. Get in front of the event alerts caused by change by asking, before the change is made, questions such as: “Will an upgrade to that problematic switch port take down the customer portal, or does it only affect ordering supplies?” Safer changes can eliminate many event alerts.
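The switch port question above can be framed as a pre-change check: before touching a component, list the critical services that depend on it. The following is only a sketch under assumptions; the topology, the service names, and the `assess_change` function are hypothetical, and it reports what a change could reach, not what it will break.

```python
def assess_change(targets, dependents, critical_services):
    """Return the critical services reachable from the components a change will touch."""
    reached = set(targets)
    stack = list(targets)
    while stack:
        node = stack.pop()
        for dependent in dependents.get(node, ()):
            if dependent not in reached:
                reached.add(dependent)
                stack.append(dependent)
    at_risk = sorted(reached & set(critical_services))
    return {"at_risk": at_risk, "needs_review": bool(at_risk)}


# Hypothetical topology: each component maps to the things that depend on it.
topology = {
    "switch-port-7": ["supply-server"],
    "supply-server": ["supply-ordering"],
    "switch-port-8": ["web-server-1"],
    "web-server-1": ["customer-portal"],
}

# Hypothetical choice: only the customer portal is treated as critical here.
critical = {"customer-portal"}

print(assess_change(["switch-port-7"], topology, critical))
print(assess_change(["switch-port-8"], topology, critical))
```

In this made-up topology, upgrading `switch-port-7` only reaches supply ordering, so it passes without review, while the same upgrade on `switch-port-8` is flagged because the customer portal sits behind it.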

Ariel Gordon is Chief Technology Officer and Co-Founder of Neebula.

