As enterprises grow more dependent on application-based business services, the availability and performance of those applications matter more and more. Enterprises are virtualizing more of their IT infrastructure than ever, and their applications are being virtualized along with it. Virtualization is bringing application performance to the fore. According to an industry survey by Gridstore, improving application performance (51%) was the top business priority for virtualized environments in mid-market organizations.
This was followed by the need to reduce I/O bottlenecks between virtual machines (VMs) and storage (34%), the need for increased VM density (34%), the need to decrease storage costs (27%), and the need for improved manageability of virtualized systems (24%).
It is hard for IT teams to find out why an application on the network is slow or unresponsive. If the issue is with the application itself, application performance monitoring (APM) tools will help you monitor it. But when the problem traces back to the virtual infrastructure hosting the app, pinpointing the source becomes more complex, and troubleshooting becomes even more difficult.
Combining APM with virtualization monitoring is key to gaining visibility across the application stack, from the app to the VM to the underlying host resources, including storage. This tight integration between the two technologies is what IT systems, virtualization, storage and data center admins are looking for.
The following covers some important use cases of application performance in a virtualized environment, and what you need to monitor to avoid performance issues and application downtime.
Why is the Application Slow?
It could be slow because of an issue in the application itself, because of a slow VM, or because the host does not have enough storage resources for its VMs to use.
Application Issue: APM tools will help you monitor the application metrics and diagnose the cause for application slowdown. Whether it is application downtime or latency, you will be able to drill down into the various performance components and identify the problem.
VM Issue: If all is well with the application, the VM workload may be what is slowing down the application running on it. You need access to VM performance metrics to identify any resource contention on the host so that you can troubleshoot application and workload issues in the context of discovered virtual dependencies. Typically, depletion of the CPU, memory or disk associated with the host causes bottlenecks and results in degraded VM performance.
Datastore Issue: If the VM seems to be in good shape, it could be a datastore issue. You will want to take a closer look at datastore IOPS and latency, and at whether any datastore is running low on disk space.
This path runs from the app to the VM and then to the datastore. The reverse direction should also be possible: starting from the host, you need to be able to identify the VM, then look at the datastore data, which may also show the applications that depend on that datastore, along with their status.
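The top-down check described above can be sketched in a few lines of Python. This is only an illustration, not the logic of any particular monitoring product: the `StackSample` fields, the `apm_fault` flag, and every threshold value are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class StackSample:
    """One snapshot of health signals for an app, its VM, and its datastore."""
    apm_fault: bool        # hypothetical flag: the APM tool found an app-level cause
    vm_cpu_pct: float      # CPU utilization of the VM
    vm_mem_pct: float      # memory utilization of the VM
    ds_latency_ms: float   # datastore latency
    ds_free_pct: float     # free space remaining on the datastore


# Assumed thresholds for illustration only; tune them to your own baselines.
VM_CPU_LIMIT = 90.0
VM_MEM_LIMIT = 90.0
DS_LATENCY_LIMIT_MS = 20.0
DS_MIN_FREE_PCT = 10.0


def triage(sample: StackSample) -> str:
    """Walk the stack top-down and name the first layer that looks unhealthy."""
    if sample.apm_fault:
        return "application"
    if sample.vm_cpu_pct > VM_CPU_LIMIT or sample.vm_mem_pct > VM_MEM_LIMIT:
        return "vm"
    if (sample.ds_latency_ms > DS_LATENCY_LIMIT_MS
            or sample.ds_free_pct < DS_MIN_FREE_PCT):
        return "datastore"
    return "no obvious bottleneck"
```

The order of the checks mirrors the order of the three cases: application first, then VM, then datastore. A real tool would correlate these layers over time rather than judge a single snapshot.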
What is Causing the VM to Slow Down?
VM Performance Issue: If a VM running your business apps is slow, you need visibility into VM performance metrics such as CPU, memory, IOPS and network throughput. If resource contention is causing a workload to choke the performance of the VM, you should get alerts pinpointing what happened and when. This will help you identify the cause of a VM slowdown.
Datastore Bottleneck: If the VM metrics all look good, you can explore the storage data for any bottleneck. There may be more VMs on the datastore than it can support, and too many VMs on one datastore leads to resource contention that makes all of them sluggish. You need to be able to find out which VMs are using the most storage and which are about to run out of storage.
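Alerts that say what happened and when can be produced by a simple threshold scan over collected samples. The sketch below assumes each sample is a dictionary with a `time` key plus metric readings; the metric names and limits are hypothetical and stand in for whatever your monitoring tool actually collects.

```python
from datetime import datetime
from typing import NamedTuple


class Alert(NamedTuple):
    when: datetime
    vm: str
    metric: str
    value: float


# Hypothetical limits for illustration; real values depend on your workloads.
THRESHOLDS = {"cpu_pct": 90.0, "mem_pct": 90.0, "disk_latency_ms": 25.0}


def find_alerts(vm: str, samples: list[dict]) -> list[Alert]:
    """Return one alert per metric reading that exceeds its threshold."""
    alerts = []
    for sample in samples:
        for metric, limit in THRESHOLDS.items():
            value = sample.get(metric)  # a metric may be missing from a sample
            if value is not None and value > limit:
                alerts.append(Alert(sample["time"], vm, metric, value))
    return alerts


history = [
    {"time": datetime(2021, 5, 1, 9, 0), "cpu_pct": 45.0, "mem_pct": 60.0},
    {"time": datetime(2021, 5, 1, 9, 5), "cpu_pct": 97.0, "mem_pct": 62.0},
]
for alert in find_alerts("app-vm-01", history):
    print(alert.when, alert.vm, alert.metric, alert.value)
```

Static thresholds are the simplest option; they will be noisy for bursty workloads, where a baseline-relative rule would be a better fit.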
Storage Capacity, Usage and Performance
A datastore performance analysis is a useful resource for drilling down into VM storage resources to see what the storage usage has been, how much more load your VM environment can sustain, and where performance issues occur. By exploring the datastore data for your VMware datastores, and the local storage and cluster shared volumes (CSV) for your Hyper-V hosts, you can see storage capacity and performance across the entire virtual environment and diagnose where you are having issues. You can easily identify:
Usage: Based on busiest datastores – drill down to identify which VM or application is causing the load.
Performance: Based on busiest VMs – drill down to the datastore and see what other VMs are affected.
Capacity: Which datastores and VMs are low on storage and which ones will run out first.
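The capacity question, which datastore will run out first, can be approximated with a linear projection. The sketch below assumes you already know each datastore's capacity, current usage, and average daily growth; the function names and sample figures are made up for illustration, and real growth is rarely this steady.

```python
def days_until_full(capacity_gb: float, used_gb: float,
                    growth_gb_per_day: float) -> float:
    """Linear estimate of the days left before a datastore fills up."""
    if growth_gb_per_day <= 0:
        return float("inf")  # flat or shrinking usage never fills the datastore
    return max(capacity_gb - used_gb, 0.0) / growth_gb_per_day


def rank_by_exhaustion(
    datastores: dict[str, tuple[float, float, float]]
) -> list[tuple[str, float]]:
    """Sort datastores so the one expected to fill up first comes first."""
    ranked = [(name, days_until_full(*stats)) for name, stats in datastores.items()]
    return sorted(ranked, key=lambda item: item[1])


# name -> (capacity GB, used GB, growth GB per day); sample values only
inventory = {
    "ds-a": (1000.0, 900.0, 10.0),
    "ds-b": (2000.0, 1000.0, 50.0),
    "ds-c": (500.0, 100.0, 0.0),
}
for name, days in rank_by_exhaustion(inventory):
    print(name, days)
```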
Storage doesn't stop with the VM datastores. It extends to external storage arrays if you maintain a SAN environment for storage and backup. Performance issues in the storage LUNs can lead to VM slowdown, which in turn affects application performance. You should be able to identify which VM is mapped to which storage array so that you can resolve VM issues by troubleshooting and fixing the storage environment.
Monitor your entire application stack from the app to the VM to the host to the datastore and external storage. Don't let an unnoticed bottleneck throttle your application performance and availability.
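Mapping VMs to storage arrays amounts to joining two lookups: VM to datastore, and datastore to array. The sketch below assumes both mappings are already available as plain dictionaries; the names in it are hypothetical, and in practice the data would come from your virtualization and storage inventory.

```python
from collections import defaultdict

# Hypothetical inventory data for illustration only.
VM_TO_DATASTORE = {"web-01": "ds-a", "db-01": "ds-b", "db-02": "ds-b"}
DATASTORE_TO_ARRAY = {"ds-a": "array-1", "ds-b": "array-2"}


def vms_by_array(vm_to_ds: dict[str, str],
                 ds_to_array: dict[str, str]) -> dict[str, list[str]]:
    """Group VMs under the storage array that backs their datastore."""
    grouped = defaultdict(list)
    for vm, datastore in vm_to_ds.items():
        array = ds_to_array.get(datastore)
        if array is not None:  # skip datastores with no known backing array
            grouped[array].append(vm)
    return {array: sorted(vms) for array, vms in grouped.items()}


# Which VMs are exposed if array-2 has a slow LUN?
print(vms_by_array(VM_TO_DATASTORE, DATASTORE_TO_ARRAY)["array-2"])
```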