The Impact of Storage on Application Performance
December 14, 2011
Jonathan Reeve

Infrastructures have come a long way in the last five years, but one component that still lags behind is the storage array.

Sure, arrays are faster and easier to configure, but with applications riding on top of virtualization using shared storage, arrays are often the hidden cause of application performance issues.

These issues are difficult to pinpoint because the symptoms are often transient. The true problem can sit several layers away from the symptoms, and most monitoring tools see only part of the picture, which makes diagnosis very difficult.

How do IT professionals tell if they have a storage performance issue?

Generally, users should watch several key performance indicators (KPIs) for both their applications and their systems:

Server and Application

At this level, users need to monitor latency to determine how long the application waits for storage to return data. For example, Microsoft recommends that storage for Exchange return data in 20 milliseconds or less; slower responses could negatively affect the application.
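As a rough illustration of the latency check described above, the following Python sketch summarizes a set of latency samples against the 20 millisecond guideline. The function name, the sample values and the report fields are hypothetical; real numbers would come from whatever monitoring tool is in use.

```python
from statistics import mean

# Guideline cited above for Exchange storage; adjust per application.
LATENCY_LIMIT_MS = 20.0


def summarize_latency(samples_ms, limit_ms=LATENCY_LIMIT_MS):
    """Return the average, the worst case, and the share of samples over the limit."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    over_limit = [s for s in samples_ms if s > limit_ms]
    return {
        "avg_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "pct_over_limit": 100.0 * len(over_limit) / len(samples_ms),
    }


# Hypothetical samples: mostly fast, with two transient spikes.
samples = [4.0, 6.0, 35.0, 5.0, 50.0]
report = summarize_latency(samples)
print(report)
```

With these made-up samples the average sits right at the limit while 40 percent of the reads exceed it, which is one reason transient problems are easy to miss when only averages are tracked.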

Virtualization

For VMware and other hypervisors, the big issue is that storage is often a shared resource. Contention can arise as VMs compete for storage I/O, so users need to watch latency and total I/O for both VMs and datastores, while also taking CPU and network load into account: high latency with low I/O could point to a host issue rather than a storage one. If contention is suspected at the hypervisor level, it can typically be solved by moving VMs or moving to faster storage.

Note that VMware vSphere 5 includes Storage vMotion to help smooth some of these issues, but there is only so much it can do before the user will need to step in.
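The triage rule for the virtualization layer (high latency with high I/O suggests contention, high latency with low I/O may point to the host) can be sketched in a few lines of Python. This is only an illustration: the function name and every threshold below are assumptions chosen for the example, not VMware recommendations.

```python
def triage_datastore(latency_ms, iops, host_cpu_pct,
                     latency_limit_ms=20.0, busy_iops=1000, busy_cpu_pct=85.0):
    """Classify one VM or datastore sample. All thresholds are illustrative."""
    if latency_ms <= latency_limit_ms:
        return "ok"
    if iops >= busy_iops:
        # Slow and busy: VMs are probably competing for storage I/O.
        return "storage contention suspected"
    if host_cpu_pct >= busy_cpu_pct:
        # Slow but little I/O, and the host itself is busy.
        return "host issue suspected"
    # Slow, little I/O, idle host: look at the network and the array next.
    return "inconclusive: check network and array"


print(triage_datastore(latency_ms=35.0, iops=4000, host_cpu_pct=40.0))
print(triage_datastore(latency_ms=35.0, iops=120, host_cpu_pct=95.0))
```

A real check would also consider network load and would look at trends over time, not at single samples.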

Storage

Arrays vary in their architectures and capabilities, but in general, users need to monitor LUNs and RAID groups for contention within the array, and controllers and ports for overloading.

If a user is experiencing high application latency, but doesn't see any problem at the server or hypervisor level, there may be contention in the array as LUNs vie for storage I/O. If more than one LUN shares a set of disks, then a completely unrelated application could be affecting performance, an issue that would be clearly visible at the array level.
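To make the shared-disk scenario concrete, here is a minimal Python sketch that groups LUNs by the RAID group they live on and reports the groups that host more than one LUN. The mapping and all names in it are hypothetical; in practice the inventory would be exported from the array's own management tools.

```python
from collections import defaultdict


def shared_raid_groups(lun_to_group):
    """Return RAID groups that host more than one LUN, with their LUNs sorted."""
    groups = defaultdict(list)
    for lun, group in lun_to_group.items():
        groups[group].append(lun)
    return {g: sorted(luns) for g, luns in groups.items() if len(luns) > 1}


# Hypothetical inventory: two unrelated applications share the disks in rg1.
inventory = {
    "lun_exchange": "rg1",
    "lun_backup": "rg1",
    "lun_sql": "rg2",
}
shared = shared_raid_groups(inventory)
print(shared)
```

Any LUN that appears in the result has neighbors on the same disks, so those neighbors are the first place to look when its latency rises for no apparent reason.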

Overloaded controllers or ports will generally slow down every application whose LUNs they serve, making part of the infrastructure seem sluggish. The usual remedy is to redistribute the load across different disks, ports or controllers.

Storage performance is one of the big challenges for application administrators today, especially since diagnosis is not always simple. It's important to have tools that can look across the different domains (server, application, virtualization, storage) and drill deep enough to get to the heart of the issue.

As a final note, planning goes a long way in avoiding storage issues (as with anything else). It is critical to measure or estimate average and peak I/O loads of applications and then place them on the appropriate "tier" of storage. Spending time up front to account for the expected loads (and growth) will help everyone sleep better at night.  
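The planning step can be sketched as a small calculation: take measured I/O samples, derive the average and the peak, project the peak forward for growth, and pick the lowest tier that still covers it. The tier names, their IOPS ceilings, the growth rate and the planning horizon below are all assumptions made up for this example.

```python
from statistics import mean

# Hypothetical tiers, ordered from slowest to fastest, with the peak IOPS
# each is assumed to sustain comfortably.
TIERS = [
    ("tier 3 (capacity)", 500),
    ("tier 2 (general purpose)", 5_000),
    ("tier 1 (performance)", 50_000),
]


def place_application(iops_samples, annual_growth=0.2, years=3, tiers=TIERS):
    """Pick the lowest tier whose ceiling covers the projected peak load."""
    if not iops_samples:
        raise ValueError("need at least one IOPS sample")
    peak = max(iops_samples)
    projected_peak = peak * (1 + annual_growth) ** years
    chosen = None  # stays None if no tier is large enough
    for name, max_iops in tiers:
        if projected_peak <= max_iops:
            chosen = name
            break
    return {
        "avg_iops": mean(iops_samples),
        "peak_iops": peak,
        "projected_peak_iops": projected_peak,
        "tier": chosen,
    }


plan = place_application([100, 150, 400])
print(plan)
```

In this example the measured peak of 400 IOPS would fit the lowest tier today, but with the assumed 20 percent annual growth over three years it no longer does, so the sketch places the application one tier up.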

About Jonathan Reeve

Jonathan Reeve, Senior Director of Product Management at SolarWinds, has built a career integrating hands-on technical development with senior-level strategic management. Having previously served as the VP of Product Strategy for Hyper9, Reeve was responsible for the company's flagship product, Virtual Environment Optimization suite. His experience spans computer networking, systems management and virtualization technologies, helping numerous start-ups and established companies generate market traction. Prior to joining Hyper9, Reeve drove product management for the network management product line at Smarts, which was acquired by EMC in 2005. He has a degree in Electrical Engineering and a PhD in Computer Networking from the University of Durham (UK).
