Infrastructure has come a long way in the last five years, but one component that still lags behind is the storage array.
Sure, arrays are faster and easier to configure, but with applications riding on top of virtualization using shared storage, arrays are often the hidden cause of application performance issues.
These issues are difficult to pinpoint because the symptoms are often transient. The true problem may sit several levels away from the symptoms, and most monitoring tools see only part of the picture, which makes diagnosis very difficult.
How do IT professionals tell if they have a storage performance issue?
Generally, users should watch several key performance indicators (KPIs) on both their applications and systems:
Server and Application
At this level, users need to monitor latency and determine how long their application is waiting on storage to return data. For example, Microsoft recommends that storage for Exchange return data in 20 milliseconds or less, or it could negatively affect the application.
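As a rough illustration of that check, the sketch below summarizes a set of latency samples against the 20 millisecond figure cited above. The function name, the sample values and the choice of summary fields are assumptions made for the example; real samples would come from whatever monitoring tool is in use.

```python
from statistics import mean

# 20 ms is the Exchange figure cited in the article; adjust per application.
EXCHANGE_LATENCY_LIMIT_MS = 20.0


def latency_report(samples_ms, limit_ms=EXCHANGE_LATENCY_LIMIT_MS):
    """Summarize storage latency samples (in milliseconds) against a limit."""
    if not samples_ms:
        raise ValueError("no latency samples supplied")
    over_limit = [s for s in samples_ms if s > limit_ms]
    return {
        "average_ms": mean(samples_ms),
        "worst_ms": max(samples_ms),
        "fraction_over_limit": len(over_limit) / len(samples_ms),
    }


# Hypothetical samples: one transient spike hidden behind a healthy average.
report = latency_report([8.0, 12.0, 35.0, 9.0])
print(report)
```

Reporting the worst sample and the fraction over the limit alongside the average matters here because transient spikes can disappear in an average.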
Virtualization
For VMware and other hypervisors, the big issue is that storage is often a shared resource. Contention can arise as VMs compete for storage I/O, so users need to watch latency and total I/O for both VMs and datastores, while also taking CPU and network load into account: high latency with low I/O could point to a host issue rather than a storage issue. If contention is suspected at the hypervisor level, it can typically be resolved by moving VMs or moving to faster storage.
Note that VMware vSphere 5 includes Storage vMotion to help smooth some of these issues, but there is only so much it can do before the user will need to step in.
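The rule of thumb in this section (high latency with high I/O suggests contention, high latency with low I/O suggests a host issue) can be sketched as a simple triage function. The thresholds and the function name below are hypothetical and would need tuning for a real environment; this is not a vendor-recommended formula.

```python
def triage_datastore(latency_ms, iops,
                     latency_limit_ms=20.0,   # assumed threshold
                     busy_iops=5000):         # assumed threshold
    """Give a first guess at where to look, based on latency and total I/O."""
    if latency_ms <= latency_limit_ms:
        return "healthy"
    if iops >= busy_iops:
        # Slow and busy: VMs are probably competing for storage I/O.
        return "possible storage contention"
    # Slow but not busy: check host CPU and network load next.
    return "possible host issue"


print(triage_datastore(latency_ms=45.0, iops=9000))
print(triage_datastore(latency_ms=45.0, iops=300))
```

The output is only a starting point for investigation; host CPU and network metrics still need to be checked before drawing a conclusion.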
Storage
Arrays vary in their architectures and capabilities, but in general, users need to monitor LUNs and RAID groups for contention within the array, and controllers and ports for overloading.
If a user is experiencing high application latency, but doesn't see any problem at the server or hypervisor level, there may be contention in the array as LUNs vie for storage I/O. If more than one LUN shares a set of disks, then a completely unrelated application could be affecting performance, an issue that would be clearly visible at the array level.
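To make the shared-disk case concrete, the following sketch groups LUNs by the RAID group they sit on and lists the groups that host more than one LUN, busiest LUN first. The LUN names, RAID group labels and I/O figures are invented for illustration; an actual inventory would come from the array's own management tools.

```python
from collections import defaultdict


def shared_raid_groups(luns):
    """Return RAID groups hosting more than one LUN, busiest LUN first."""
    groups = defaultdict(list)
    for lun in luns:
        groups[lun["raid_group"]].append(lun)
    return {
        rg: sorted(members, key=lambda lun: lun["iops"], reverse=True)
        for rg, members in groups.items()
        if len(members) > 1
    }


# Hypothetical inventory: an unrelated backup LUN shares disks with Exchange.
luns = [
    {"name": "exchange_db", "raid_group": "RG1", "iops": 400},
    {"name": "backup_stage", "raid_group": "RG1", "iops": 2600},
    {"name": "file_share", "raid_group": "RG2", "iops": 300},
]
shared = shared_raid_groups(luns)
for rg, members in shared.items():
    print(rg, [(lun["name"], lun["iops"]) for lun in members])
```

In this made-up example, the busy backup LUN on RG1 would be the first candidate to investigate if the Exchange LUN showed high latency.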
Overloaded controllers or ports will generally slow down all the applications on those LUNs, making part of the infrastructure seem sluggish. The remedy is generally reconfiguring the loads to different disks, ports or controllers.
Storage performance is one of the big challenges for application administrators today, especially since diagnosis is not always simple. It's important to have tools that can look across the different domains (server, application, virtualization, storage) and drill deep enough to get to the heart of the issue.
As a final note, planning goes a long way in avoiding storage issues (as with anything else). It is critical to measure or estimate average and peak I/O loads of applications and then place them on the appropriate "tier" of storage. Spending time up front to account for the expected loads (and growth) will help everyone sleep better at night.
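A minimal sketch of that planning step is shown below: it projects an application's peak I/O forward by an assumed growth rate and picks the first tier whose capacity covers the projection. The tier names, capacity limits, growth rate and planning horizon are all hypothetical placeholders, not figures from any vendor.

```python
# Hypothetical tiers, ordered from slowest to fastest, with the peak IOPS
# each is assumed to sustain comfortably.
TIERS = [("tier 3", 500), ("tier 2", 5000), ("tier 1", float("inf"))]


def choose_tier(peak_iops, annual_growth=0.2, years=3):
    """Pick the slowest tier that still covers projected peak I/O."""
    projected = peak_iops * (1 + annual_growth) ** years
    for name, max_iops in TIERS:
        if projected <= max_iops:
            return name, projected


tier, projected = choose_tier(peak_iops=400)
print(tier, round(projected))
```

Note that an application peaking at 400 IOPS today fits the slowest tier, but lands on the next tier up once the assumed growth is included, which is the point of accounting for growth up front.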
About Jonathan Reeve
Jonathan Reeve, Senior Director of Product Management at SolarWinds, has built a career integrating hands-on technical development with senior-level strategic management. Having previously served as the VP of Product Strategy for Hyper9, Reeve was responsible for the company's flagship product, Virtual Environment Optimization suite. His experience spans computer networking, systems management and virtualization technologies, helping numerous start-ups and established companies generate market traction. Prior to joining Hyper9, Reeve drove product management for the network management product line at Smarts, which was acquired by EMC in 2005. He has a degree in Electrical Engineering and a PhD in Computer Networking from the University of Durham (UK).