Universal Monitoring Crimes and What to Do About Them - Part 2
May 23, 2018

Leon Adato
SolarWinds

Share this

To help your organization increase data center efficiency and get the most benefit out of your monitoring solutions, here are the remaining universal monitoring crimes and what you can do about them:

Start with Universal Monitoring Crimes and What to Do About Them - Part 1

4. Flapping or sawtoothing alerts

When an alert repeatedly triggers (a device that keeps rebooting itself or processes keep deleting/creating temporary page files so that one moment it's over threshold, the next it's below, for example), that condition is known as flapping or sawtoothing.

What to do about it: These types of alerts have several possible resolutions based on what is supported by your monitoring solution and which best fits the specific situation:

■ GOOD: Suppress events within a window. Ignoring duplicated events within a certain period of time is often all you need to avoid meaningless duplicates.

■ ALSO GOOD: As mentioned previously, add a time delay to allow for self-resolution, avoid false-positives, and eliminate other potential issues that don't necessarily require a remediation response.

■ BETTER: Leverage "Reset" logic. Wait for a reset event before triggering a new alert of the same kind. Avoid making the reset logic merely the reverse of the trigger (if the alert is > 90%, the reset might be 90% for 15 minutes, but it won't reset until it's

■ BEST: Two-way communication with a ticket or alert management system. This is where the monitoring system communicates with the ticket and/or alert tracking system, so you can never cut the same alert for the same device until a human has actively corrected the original problem and closed the ticket.

5. No lab, test, or QA environments for your monitoring system

If your monitoring system is watching and alerting on mission-critical systems within the enterprise, then it is mission critical itself. But despite the fact that many organizations set up a proof-of-concept environment when evaluating monitoring solutions, once the production system is selected and rolled out, they fail to have any type of lab, test, or QA system that is maintained on an ongoing basis to help ensure the system is maintained.

What to do about it: Duh. Implement test, dev, and/or QA installations that serve to ensure your monitoring system has the oversight necessary for a mission-critical application.

■ TEST: An (often temporary) environment where patches and upgrades can be tested before attempting them in production.

■ DEV: An environment that mirrors production in terms of software, but where monitors for new equipment, applications, reports, or alerts can be set up and tested before rolling those solutions to production. And as mentioned earlier, this is the perfect place to also monitor your production monitoring environment.

■ QA: An environment that mirrors the previous version of production, so that if issues are found in production, they can be double-checked to confirm whether the problem was introduced in the last revision.

Note that I'm not implying you necessarily must have all three, but it's worth considering the value of at least one. Because "none" is a really bad choice.

Final thoughts

The rate of technical change in the data center today is rapidly accelerating and traditional data center systems have undergone considerable evolution in a very short period of time. As complexity continues to grow alongside the expectation that an organization's IT department should become ever-more "agile" and continue to deliver a quality end-user experience 24/7 (meaning no glitches, outages, application performance problems, etc.), it's important that IT professionals give monitoring the priority it deserves as a foundational IT discipline.

By understanding and addressing these top universal monitoring crimes, you can ensure your organization receives the benefit of sophisticated, tuned monitoring systems while also enabling a more proactive data center strategy now and in the future.

Leon Adato is a Head Geek at SolarWinds
Share this

The Latest

June 21, 2018

There’s no doubt that digital innovations are transforming industries, and business leaders are left with little or no choice – either embrace digital processes or suffer the consequences and get left behind ...

June 20, 2018

Looking ahead to the rest of 2018 and beyond, it seems like many of the trends that shaped 2017 are set to continue, with the key difference being in how they evolve and shift as they become mainstream. Five key factors defining the progression of the digital transformation movement are ...

June 19, 2018

Companies using cloud technologies to automate their legacy applications and IT operations processes are gaining a significant competitive advantage over those behind the curve, according to a new report from Capgemini and Sogeti, The automation advantage: Making legacy IT keep pace with the cloud ...

June 18, 2018

It's every system administrator's worse nightmare. An attempt to restore a database results in empty files, and there is no way to get the data back, ever. Here are five simple tips for keeping things running smoothly and minimizing risk ...

June 15, 2018

When it comes to their own companies, 50% of IT stakeholders think they are leaders and will disrupt, while 50% feel they are behind and will be disrupted by the competition in 2018, according to a new survey of IT stakeholders from Alfresco Software and Dimensional Research. The report, Digital Disruption: Disrupt or Be Disrupted, is a wake-up call for the C-suite ...

June 14, 2018

If you are like most IT professionals, which I am sure you are, you are dealing with a lot issues. Typical issues include ...

June 13, 2018

The importance of artificial intelligence and machine learning for customer insight, product support, operational efficiency, and capacity planning are well-established, however, the benefits of monitoring data in those use cases is still evolving. Three main factors obscuring the benefits of data monitoring are the infinite volume of data, its diversity, and inconsistency ...

June 11, 2018

Imagine this: after a fantastic night's sleep, you walk into the office ready to attack the day. You sit down at your desk ready to go, and your computer starts acting up. You call the help desk, but all IT can do is create a ticket for you and transfer it to another team to help you as soon as possible ...

June 08, 2018

As many IT workers develop greater technology skills and apply them to advance their careers, many digital workers in non-IT departments believe their CIO is out of touch with their technology needs. A Gartner, Inc. survey found that less than 50 percent of workers (both IT and non-IT) believe their CIOs are aware of digital technology problems that affect them ...

June 07, 2018

CIOs of 73% of organizations say the need for speed in digital innovation is putting customer experience at risk, according to an independent global survey of 800 CIOs commissioned by Dynatrace ...