Infrastructures have come a long way in the last five years, but one component that is lagging behind is the storage array.
Sure, arrays are faster and easier to configure, but with applications running on virtualized hosts that share storage, arrays are often the hidden cause of application performance issues.
These issues are difficult to pinpoint because the symptoms are often transient. The true problem can sit several layers away from the symptoms, and most monitoring tools see only part of the picture, which makes diagnosis very difficult.
How do IT professionals tell if they have a storage performance issue?
Generally, users should watch several key performance indicators (KPIs) across both their applications and their systems:
Server and Application
At this level, users need to monitor latency to determine how long their application is waiting on storage to return data. For example, Microsoft recommends that storage for Exchange return data in 20 milliseconds or less; slower storage can degrade the application's performance.
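As a rough illustration of the latency check, the sketch below summarizes a series of storage latency samples against the 20 ms Exchange figure. It assumes the samples (in milliseconds, one per polling interval) have already been collected from a performance counter; the helper and field names are hypothetical. It reports the peak and the number of intervals over the threshold alongside the average, because transient spikes can hide inside a healthy-looking mean.

```python
from statistics import mean
from typing import NamedTuple, Sequence

# Threshold from the Exchange guidance cited above (20 ms or less).
EXCHANGE_LATENCY_MS = 20.0


class LatencyReport(NamedTuple):
    average_ms: float
    peak_ms: float
    intervals_over: int  # how many samples exceeded the threshold


def check_latency(samples_ms: Sequence[float],
                  threshold_ms: float = EXCHANGE_LATENCY_MS) -> LatencyReport:
    """Summarize per-interval storage latency samples (hypothetical helper).

    The average alone can hide transient spikes, so the peak and the count
    of intervals above the threshold are reported as well.
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return LatencyReport(
        average_ms=mean(samples_ms),
        peak_ms=max(samples_ms),
        intervals_over=sum(1 for s in samples_ms if s > threshold_ms),
    )


if __name__ == "__main__":
    report = check_latency([12.0, 18.5, 35.0, 22.5])
    print(report)
    if report.average_ms > EXCHANGE_LATENCY_MS:
        print("average latency is above the Exchange guideline")
```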
Hypervisor
For VMware and other hypervisors, the big issue is that storage can be a shared resource. Contention can arise as VMs compete for storage I/O, so users need to watch latency and total I/O for VMs and datastores while also considering CPU and network load: high latency with low I/O could point to a host issue rather than a storage one. If contention is suspected at the hypervisor level, it can typically be resolved by moving VMs or by moving to faster storage.
Note that VMware vSphere 5 includes Storage vMotion to help smooth some of these issues, but there is only so much it can do before the user will need to step in.
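The rule of thumb for the hypervisor level (read latency together with I/O volume and host load) can be expressed as a small triage function. This is a sketch only: the `DatastoreSample` fields, the `triage` name, and every threshold are hypothetical placeholders, not VMware-defined values, and would need tuning for a real environment.

```python
from dataclasses import dataclass


@dataclass
class DatastoreSample:
    latency_ms: float    # average latency seen by VMs on the datastore
    iops: float          # total I/O operations per second on the datastore
    host_cpu_pct: float  # host CPU utilization, 0-100
    host_net_pct: float  # host network (storage path) utilization, 0-100


# Hypothetical thresholds for illustration only.
HIGH_LATENCY_MS = 20.0
HIGH_IOPS = 5000.0
BUSY_HOST_PCT = 85.0


def triage(s: DatastoreSample) -> str:
    """Return a coarse label for where to look next."""
    if s.latency_ms <= HIGH_LATENCY_MS:
        return "ok"
    if s.iops >= HIGH_IOPS:
        # Slow and busy: VMs are likely contending for the same storage.
        # Remedy per the text: move VMs or move to faster storage.
        return "contention"
    if s.host_cpu_pct >= BUSY_HOST_PCT or s.host_net_pct >= BUSY_HOST_PCT:
        # High latency with low I/O on a busy host: suspect the host.
        return "host"
    # Slow and quiet while the host looks healthy: look at the array.
    return "array"


if __name__ == "__main__":
    sample = DatastoreSample(latency_ms=45, iops=300, host_cpu_pct=95, host_net_pct=30)
    print(triage(sample))
```

The final "array" branch covers the case where nothing looks wrong at the server or hypervisor level, which is when the array itself deserves a closer look.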
Storage Array
Arrays vary in their architectures and capabilities, but in general, users need to monitor LUNs and RAID groups for contention within the array, and monitor controllers and ports for overload.
If a user is experiencing high application latency but doesn't see any problem at the server or hypervisor level, there may be contention in the array as LUNs vie for storage I/O. If more than one LUN shares a set of disks, a completely unrelated application could be affecting performance, an issue that would be clearly visible at the array level.
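To illustrate why shared disks matter, the sketch below takes a hypothetical inventory mapping each LUN to the RAID group it lives on, plus per-LUN IOPS measured at the array, and lists the other LUNs sharing a given LUN's disks, busiest first. The LUN names, RAID group names, and numbers are made up; in practice this data would come from the array's own management tools.

```python
from typing import Dict, List

# Hypothetical inventory: LUN -> RAID group (the set of disks it lives on).
LUN_TO_RAID_GROUP: Dict[str, str] = {
    "exchange_db": "RG1",
    "file_share": "RG1",
    "backup_staging": "RG1",
    "sql_logs": "RG2",
}

# Hypothetical IOPS per LUN, measured over the same interval.
LUN_IOPS: Dict[str, float] = {
    "exchange_db": 900,
    "file_share": 300,
    "backup_staging": 4200,
    "sql_logs": 1500,
}


def noisy_neighbors(lun: str,
                    lun_to_rg: Dict[str, str] = LUN_TO_RAID_GROUP,
                    iops: Dict[str, float] = LUN_IOPS) -> List[str]:
    """Return the other LUNs on the same RAID group, busiest first."""
    group = lun_to_rg[lun]
    peers = [other for other, g in lun_to_rg.items() if g == group and other != lun]
    return sorted(peers, key=lambda name: iops.get(name, 0), reverse=True)


if __name__ == "__main__":
    # A slow Exchange LUN may simply share disks with a busy backup job.
    print(noisy_neighbors("exchange_db"))
```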
Overloaded controllers or ports will generally slow down all the applications on the LUNs they serve, making part of the infrastructure seem sluggish. The remedy is generally to rebalance the load across different disks, ports or controllers.
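Spotting an overloaded port or controller comes down to comparing observed load with what the component can sustain. The sketch below does that and also picks the least-utilized component as a candidate target for rebalancing. The component names, the MB/s figures, and the 80% limit are assumptions chosen for illustration, not vendor specifications.

```python
from typing import Dict, List


def utilization(observed: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, float]:
    """Fraction of rated throughput in use for each port or controller."""
    return {name: observed[name] / capacity[name] for name in observed}


def overloaded(observed: Dict[str, float], capacity: Dict[str, float],
               limit: float = 0.8) -> List[str]:
    """Components whose utilization exceeds `limit` (0-1), sorted by name."""
    util = utilization(observed, capacity)
    return sorted(name for name, u in util.items() if u > limit)


def least_loaded(observed: Dict[str, float], capacity: Dict[str, float]) -> str:
    """Component with the most headroom: a candidate for moving load onto."""
    util = utilization(observed, capacity)
    return min(util, key=util.get)


if __name__ == "__main__":
    # Hypothetical front-end ports on two storage processors, in MB/s.
    capacity = {"SPA-0": 800.0, "SPA-1": 800.0, "SPB-0": 800.0}
    observed = {"SPA-0": 700.0, "SPA-1": 200.0, "SPB-0": 350.0}
    print("overloaded:", overloaded(observed, capacity))
    print("rebalance toward:", least_loaded(observed, capacity))
```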
Storage performance is one of the big challenges for application administrators today, especially since diagnosis is not always simple. It's important to have tools that can span different domains (server, application, virtualization, storage) and dig deep enough to get to the heart of the issue.
As a final note, planning goes a long way in avoiding storage issues (as with anything else). It is critical to measure or estimate average and peak I/O loads of applications and then place them on the appropriate "tier" of storage. Spending time up front to account for the expected loads (and growth) will help everyone sleep better at night.
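The planning step can be sketched the same way: take measured (or estimated) I/O samples for an application, compute the average and the peak, add headroom for growth, and choose the cheapest tier that can absorb the grown peak. The tier names, their IOPS capacities, and the 25% growth allowance are hypothetical; real figures depend on the array and the workload.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Hypothetical tiers, cheapest first, with the IOPS each can sustain
# for a single application. Real numbers depend on your array.
TIERS: List[Tuple[str, float]] = [
    ("tier3", 1_000),
    ("tier2", 5_000),
    ("tier1", 20_000),
]


def plan_placement(iops_samples: Sequence[float],
                   growth: float = 0.25,
                   tiers: Sequence[Tuple[str, float]] = TIERS) -> Tuple[str, float, float]:
    """Return (tier, average_iops, peak_iops) for an application's I/O load.

    The tier is sized for the peak plus a growth allowance, not the average,
    so that busy periods do not push the application onto slow storage.
    """
    if not iops_samples:
        raise ValueError("need at least one I/O sample")
    average = mean(iops_samples)
    peak = max(iops_samples)
    required = peak * (1 + growth)
    for name, capacity in tiers:
        if required <= capacity:
            return name, average, peak
    raise ValueError(f"no tier can sustain {required:.0f} IOPS")


if __name__ == "__main__":
    print(plan_placement([400, 600, 900, 3000]))
```

Sizing on the grown peak rather than the average is the conservative choice; an application whose average fits a cheap tier may still need a faster one if its busy periods are much heavier.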
About Jonathan Reeve
Jonathan Reeve, Senior Director of Product Management at SolarWinds, has built a career integrating hands-on technical development with senior-level strategic management. Having previously served as the VP of Product Strategy for Hyper9, Reeve was responsible for the company's flagship product, the Virtual Environment Optimization suite. His experience spans computer networking, systems management and virtualization technologies, helping numerous start-ups and established companies generate market traction. Prior to joining Hyper9, Reeve drove product management for the network management product line at Smarts, which was acquired by EMC in 2005. He has a degree in Electrical Engineering and a PhD in Computer Networking from the University of Durham (UK).