Manage the Performance of Virtual Environments Using Dynamic Alerts
June 16, 2014

Karthik Ramachandran
SolarWinds

As we know, virtual environments consist of many moving pieces and are generally complex to set up. Depending on the size of the organization, an IT environment can have anywhere from a handful of VMs to several hundred. For such virtual infrastructure deployments, it helps to monitor VM performance and VM usage. It's equally important to ensure the health of your virtual appliances is always in check and to know immediately when something goes wrong.

What you really don't want is to have alerts paging you 24/7, especially when the situations aren't critical. Alert management can be a subtle but dangerous activity. Manually setting alert thresholds can be extremely time-consuming. On the other hand, static thresholds that don't reflect real performance problems often result in alert storms, after which administrators stop watching alerts carefully. This is where the "dangerous" part comes in: truly critical alerts can be lost in the noise and missed. As a result, intelligent, dynamic alerting can be critical for both staff efficiency and system reliability.

False Alerts: Reasons Why You Get Them and How to Avoid Them

Here are a few examples of why your virtual environment may trigger alerts more frequently than normal:

■ Events that occur frequently, such as changes in resource consumption, can trigger alerts more often than other conditions in the virtual environment.

■ You can get "spam" alerts from VMs or hosts that are no longer in use or that have been decommissioned.

■ Poorly tuned threshold levels can lead to a sudden spike in alerts.

Having intelligent alerting processes helps ensure that irrelevant alerts are not generated. This gives virtual admins time to look at "real" alerts and fix the underlying problems. Here's what you can do to avoid alerting errors:

■ Set up alerts for specific VMs that you think are really going to impact your users or your business.

■ Use dynamic thresholds based on historical baseline trends whenever possible to set more realistic thresholds for your clusters, hosts, VMs, and datastores.

■ Establish well-defined threshold settings. This way you can optimize the kind of alerts you receive during the day and ensure that you're not bothered after work hours.

■ Set the right dependencies to significantly lower the number of alerts you receive.

■ Forward specific alerts to the teams responsible, since they understand the severity of the alert and can fix the problem right away.
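
The dependency and forwarding ideas in the list above can be illustrated with a minimal Python sketch. It is not taken from any particular monitoring product: the `Alert` type, the `PARENT` inventory, the `ROUTES` table, and the team names are all hypothetical, made up for illustration. The sketch drops alerts from VMs whose host already has a critical alert, then groups the remaining alerts by the team assumed to own each source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # name of the VM or host that raised the alert
    severity: str  # "warning" or "critical"
    message: str


# Hypothetical inventory: each VM mapped to the host it depends on.
PARENT = {
    "vm-web-01": "host-a",
    "vm-db-01": "host-a",
    "vm-app-01": "host-b",
}

# Hypothetical routing table: which team owns alerts from which source.
ROUTES = {
    "host-a": "infrastructure",
    "host-b": "infrastructure",
    "vm-db-01": "database",
}


def suppress_dependents(alerts):
    """Drop alerts from any source whose parent already has a critical alert."""
    critical_sources = {a.source for a in alerts if a.severity == "critical"}
    # Hosts have no parent, so PARENT.get() returns None and they are kept.
    return [a for a in alerts if PARENT.get(a.source) not in critical_sources]


def route(alerts, default_team="virtualization"):
    """Group alerts by owning team; unknown sources go to default_team."""
    routed = {}
    for alert in alerts:
        team = ROUTES.get(alert.source, default_team)
        routed.setdefault(team, []).append(alert)
    return routed


if __name__ == "__main__":
    incoming = [
        Alert("host-a", "critical", "host unreachable"),
        Alert("vm-web-01", "critical", "VM not responding"),
        Alert("vm-db-01", "warning", "high disk latency"),
        Alert("vm-app-01", "warning", "CPU usage high"),
    ]
    for team, items in route(suppress_dependents(incoming)).items():
        print(team, [a.source for a in items])
```

In this sketch, one host outage produces a single alert for the infrastructure team instead of one alert per affected VM, which is the effect that setting dependencies is meant to achieve.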

Determine What to Monitor and Why

Most admins have to monitor hundreds of virtual appliances, which means you're probably dealing with plenty of alerts. Under these circumstances, you'll have to determine a few things:

■ Go over each host to see if all VMs under the host must be monitored or if only a few critical VMs need to be monitored for alerts.

■ Talk to your business groups or users to understand what the impact of a performance problem will be. This will give you a sense of how many VMs and datastores have to be set up for alerts. Some of them may run mission-critical applications that affect business performance.

Statistical Thresholds: A Better Way to Set Baseline Values for your Virtual Environment

Normally, you would have to monitor the performance of hosts, VMs, and datastores for several weeks to learn which baseline to use when setting warning and critical thresholds. However, integrated virtualization management tools can automatically calculate the performance of clusters, hosts, VMs, and datastores and determine the baseline values.

Statistical thresholds allow you to do the following:

■ Applying thresholds to clusters, hosts, VMs, and datastores.

■ Understanding baseline statistics using standard deviation calculations for daytime and nighttime system performance.

■ Gaining statistical insights into performance metrics and how they vary over time. Look at how stats are collected for higher and lower threshold values for individual VMs and hosts.

■ Calculating thresholds from historical performance data, which saves time spent adjusting thresholds and produces more intelligent alerts.

■ Setting the right threshold values using the built-in baseline calculator. This calculates and applies the recommended threshold values for warning and critical stages for clusters, hosts, VMs, and datastores.
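
As a rough illustration of how a baseline calculation of this kind might work, the Python sketch below derives warning and critical thresholds from historical samples using the mean and standard deviation, with separate baselines for day and night. It is a sketch under stated assumptions, not the method used by any specific tool: the function names, the 2-sigma and 3-sigma multipliers, the 08:00 to 18:00 definition of "day", and the 100 percent ceiling are all choices made here for illustration.

```python
from statistics import mean, stdev


def baseline_thresholds(samples, warning_sigmas=2.0, critical_sigmas=3.0,
                        ceiling=100.0):
    """Return (warning, critical) thresholds from historical samples.

    Thresholds are set at the mean plus a multiple of the standard
    deviation and capped at `ceiling` (for example, 100 for a percentage).
    The sigma multipliers are illustrative assumptions.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    avg = mean(samples)
    sd = stdev(samples)
    warning = min(avg + warning_sigmas * sd, ceiling)
    critical = min(avg + critical_sigmas * sd, ceiling)
    return warning, critical


def day_night_thresholds(history, day_start=8, day_end=18):
    """Compute separate baselines for day and night.

    `history` is an iterable of (hour_of_day, value) pairs; hours in
    [day_start, day_end) count as day, everything else as night.
    """
    history = list(history)
    day = [v for h, v in history if day_start <= h < day_end]
    night = [v for h, v in history if not (day_start <= h < day_end)]
    return {
        "day": baseline_thresholds(day),
        "night": baseline_thresholds(night),
    }


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (hour of day, percent used).
    cpu_history = [(9, 60.0), (10, 70.0), (14, 80.0),
                   (2, 10.0), (3, 20.0), (23, 30.0)]
    for period, (warn, crit) in day_night_thresholds(cpu_history).items():
        print(f"{period}: warning at {warn:.1f}%, critical at {crit:.1f}%")
```

Because the night baseline is computed separately, a load level that is normal at midday can still raise an alert at 2 a.m., when it would be unusual for that VM or host.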

While statistical thresholds won't completely eliminate "spam" alerts, they quickly reduce them to a much smaller set for administrators to deal with. In turn, administrators can spend more time and attention on striking the balance between monitoring VM usage and hypervisor performance and setting the right threshold values.

Karthik Ramachandran is Product Marketing Specialist at SolarWinds.
