Manage the Performance of Virtual Environments Using Dynamic Alerts
June 16, 2014

Karthik Ramachandran
SolarWinds

As we know, virtual environments consist of many moving pieces and are generally complex to set up. Depending on the size of the organization, an IT environment can have anywhere from a handful of VMs to several hundred. For such virtual infrastructure deployments, it helps to monitor VM performance and usage. It's equally important to ensure that the health of your virtual appliances is always in check and to know immediately when something goes wrong.

What you really don't want is to have alerts paging you 24/7, especially when the situations aren't critical. Alert management can be a subtle but dangerous activity. Manually setting alert thresholds can be extremely time-consuming. Alternatively, using static thresholds that don't reflect real performance problems often results in alert storms, which lead administrators to stop watching alerts carefully. This is where the "dangerous" part comes in: truly critical alerts can be lost in the noise and missed. As a result, intelligent, dynamic alerting can be critical for both staff efficiency and system reliability.

False Alerts: Reasons Why You Get Them and How to Avoid Them

Here are a few reasons why your virtual environment may trigger alerts more frequently than normal:

■ Events that occur frequently, such as changes in resource consumption, can trigger alerts more often than other events in the virtual environment.

■ You can get "spam" alerts from VMs or hosts that are no longer in use or that have been decommissioned.

■ Not properly tuning threshold levels can lead to a sudden spike in alerts.

Having intelligent alerting processes helps ensure that irrelevant alerts are not generated. This gives virtual admins time to look at "real" alerts and fix the problems behind them. Here's what you can do to avoid alerting errors:

■ Set up alerts for specific VMs that you think are really going to impact your users or your business.

■ Use dynamic thresholds based on historical baseline trends whenever possible to set more realistic thresholds for your clusters, hosts, VMs, and datastores.

■ Establish well-defined threshold settings. This way, you can optimize the kind of alerts you receive during the day and ensure that you're not bothered after work hours.

■ Set the right dependencies to significantly lower the number of alerts you receive.

■ Forward specific alerts to the teams responsible for them, since they understand the severity of each alert and can fix the problem right away.
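
Two of the steps above, ignoring alerts from retired VMs and using dependencies to suppress alerts from the children of a failed host, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Alert` class, the `filter_alerts` function, and all entity names are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert record; the field names are assumptions."""
    entity: str    # VM or host that raised the alert
    severity: str  # for example "warning" or "critical"
    message: str


def filter_alerts(alerts, parent_of, down_entities, retired):
    """Return only the alerts worth forwarding.

    alerts        -- iterable of Alert objects
    parent_of     -- dict mapping a child entity (VM) to its parent (host)
    down_entities -- set of entities already known to be down
    retired       -- set of entities that are no longer in use
    """
    kept = []
    for alert in alerts:
        # Skip "spam" from VMs or hosts that are no longer in use.
        if alert.entity in retired:
            continue
        # Skip child alerts when the parent is already known to be down;
        # the parent's own alert is enough to start troubleshooting.
        if parent_of.get(alert.entity) in down_entities:
            continue
        kept.append(alert)
    return kept


# Example data (all names are made up for illustration).
alerts = [
    Alert("host1", "critical", "host unreachable"),
    Alert("vm1", "critical", "VM unreachable"),
    Alert("vm2", "warning", "high CPU usage"),
    Alert("old-vm", "warning", "low disk space"),
]
parent_of = {"vm1": "host1", "vm2": "host2"}

for alert in filter_alerts(alerts, parent_of, {"host1"}, {"old-vm"}):
    print(alert.entity, alert.severity, alert.message)
```

In this example only the `host1` and `vm2` alerts are forwarded: `vm1` is suppressed because its host is down, and `old-vm` is dropped because it is retired.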

Determine What to Monitor and Why

Most admins have to monitor hundreds of virtual appliances, which means you're probably dealing with plenty of alerts. Under these circumstances you'll have to determine a few things:

■ Go over each host to see if all VMs under the host must be monitored or if only a few critical VMs need to be monitored for alerts.

■ Talk to your business groups or users to understand what the impact of a problem would be. This will give you a sense of how many VMs and datastores have to be set up for alerts. They may have mission-critical applications running inside them, which may affect business performance.

Statistical Thresholds: A Better Way to Set Baseline Values for your Virtual Environment

Normally, you would have to monitor the performance of hosts, VMs, and datastores for several weeks in order to know what the ideal or optimum baseline is to set warning and critical thresholds. However, integrated virtualization management tools can automatically calculate performance of clusters, hosts, VMs, and datastores and determine the baseline values.

Statistical thresholds allow you to look at the following processes:

■ Applying thresholds to clusters, hosts, VMs, and datastores.

■ Understanding baseline statistics using standard deviation calculation for day and night system performance.

■ Gaining statistical insights into performance metrics and how they vary over time. Look at how stats are collected for higher and lower threshold values for individual VMs and hosts.

■ Calculating thresholds from historical performance data, which saves time in adjusting thresholds and provides more intelligent alerts.

■ Setting the right threshold values using the built-in baseline calculator. This calculates and applies the recommended threshold values for warning and critical stages for clusters, hosts, VMs, and datastores.

While statistical thresholds won't completely eliminate "spam" alerts, they will quickly reduce them to a much smaller set for administrators to deal with. In turn, administrators can spend more time and attention on striking the balance between monitoring VM usage and hypervisor performance and setting the right threshold values.
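
To make the idea of a statistical baseline concrete, here is a minimal sketch of how warning and critical thresholds could be derived from historical samples using the mean and standard deviation, with separate baselines for day and night. It is not the calculation used by any specific tool: the function names, the multipliers `warn_k` and `crit_k`, and the 08:00 to 20:00 "day" window are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def baseline_thresholds(samples, warn_k=2.0, crit_k=3.0):
    """Derive thresholds from historical samples of one metric.

    warn_k and crit_k are assumed multipliers: the warning threshold sits
    warn_k standard deviations above the mean, the critical one crit_k.
    stdev() needs at least two samples.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    return {
        "baseline": mu,
        "warning": mu + warn_k * sigma,
        "critical": mu + crit_k * sigma,
    }


def day_night_thresholds(readings, day_start=8, day_end=20):
    """Compute separate baselines for day and night.

    readings -- iterable of (hour_of_day, value) pairs
    The day window [day_start, day_end) is an assumption for this sketch.
    """
    readings = list(readings)
    day = [v for h, v in readings if day_start <= h < day_end]
    night = [v for h, v in readings if not (day_start <= h < day_end)]
    return {
        "day": baseline_thresholds(day),
        "night": baseline_thresholds(night),
    }


# Made-up CPU usage percentages for one VM, as (hour, value) pairs.
history = [(9, 40), (10, 50), (11, 60), (2, 10), (3, 20), (4, 30)]
for period, values in day_night_thresholds(history).items():
    print(period, values)
```

With a static threshold, the same value would apply around the clock. Here the night thresholds come out lower than the day thresholds because the VM is normally quieter at night, so an unusual overnight spike would still raise an alert.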

Karthik Ramachandran is Product Marketing Specialist at SolarWinds.
