The data deluge has brought renewed focus on an old problem: the enormous input/output (I/O) performance gap between a server’s memory and its storage. An I/O served from the server’s memory typically takes a mere 100 nanoseconds, while an I/O to a hard disk drive (HDD) takes about 10 milliseconds — a difference of five orders of magnitude that profoundly degrades application performance.
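The gap is easy to quantify. A minimal sketch, using the article's round-number latencies (illustrative figures, not measurements):

```python
# Round-number latencies from the article (illustrative, not measured values)
RAM_LATENCY_S = 100e-9   # ~100 ns for an I/O served from main memory
HDD_LATENCY_S = 10e-3    # ~10 ms for an I/O served from a hard disk drive

gap = HDD_LATENCY_S / RAM_LATENCY_S
print(f"HDD I/O is roughly {gap:,.0f}x slower than memory I/O")  # five orders of magnitude
```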
This bottleneck exists for both dedicated and virtualized servers, but it can be far worse in the latter because virtualization creates the potential for much greater resource contention. Virtualization affords numerous benefits by dramatically improving server utilization (from around 10 percent in dedicated servers to 50 percent or more), but the increased per-server application load inevitably exacerbates the I/O bottleneck. Multiple applications, all competing for the same finite I/O, can turn what would have been orderly, sequential access for each into completely random reads and writes at the server, creating a worst-case scenario for HDD performance.
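To see why, consider a toy sketch of two VMs that each read sequentially: once their requests are interleaved at the shared drive, the combined stream is no longer sequential. The logical block addresses (LBAs) here are hypothetical:

```python
# Two VMs each issue perfectly sequential reads (hypothetical LBAs)...
vm_a = [1000, 1001, 1002, 1003]
vm_b = [9000, 9001, 9002, 9003]

# ...but interleaved at the shared drive, the merged stream jumps back and
# forth between distant LBAs, forcing a seek between most requests.
interleaved = [lba for pair in zip(vm_a, vm_b) for lba in pair]
print(interleaved)  # [1000, 9000, 1001, 9001, 1002, 9002, 1003, 9003]
```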
In a virtualized server, contention manifests primarily as a virtual machine (VM) waiting for CPU cycles and/or for I/O to memory or disk. Fortunately, such contention can be minimized by judiciously balancing the total workload among all virtual servers and by optimizing the allocation of each server’s physical resources. Taking these steps can enable a VM to perform as well as a dedicated server.
Unfortunately, server virtualization is normally accompanied by storage virtualization, which virtually assures an adverse impact on application performance. Compared to direct-attached storage (DAS), a storage area network (SAN) or network-attached storage (NAS) has higher I/O latency combined with lower bandwidth or throughput, which also decreases I/O operations per second (IOPS). Frequent congestion on the intervening Fibre Channel (FC), FC over Ethernet, iSCSI or Ethernet network further degrades performance.
The extent of the I/O bottleneck became apparent in a recent LSI survey of 412 European datacenter managers. The results revealed that while 93 percent acknowledge the critical importance of optimizing application performance, a full 75 percent do not feel they are achieving the desired results. Not surprisingly, 70 percent of the survey respondents cited storage I/O as the single biggest bottleneck in the datacenter today.
Cache in Flash
Caching data to memory in a server, or in a SAN controller or cache appliance, is a proven technique for reducing I/O latency and, thereby, improving application-level performance. But because the size of the cache that is economically feasible with random access memory (measured in gigabytes) is only a small fraction of the capacity of even a single disk drive (measured in terabytes), traditional RAM-based caching is increasingly unable to deliver the performance gains required in today’s virtualized datacenters.
Consider what happens in a typical virtualized server. Each VM is allocated some amount of RAM, and together these allocations usually exceed the total amount of physical memory available. This can result in the VMs competing for memory, and as they do, it is necessary for the hypervisor to swap pages out and in, to and from (very slow) disk storage, further exacerbating the I/O bottleneck.
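The overcommit arithmetic can be sketched as follows; the VM count and allocations are hypothetical, chosen only to illustrate how committed memory can exceed physical RAM:

```python
# Illustrative sketch (hypothetical numbers): VM RAM allocations vs. physical memory.
physical_ram_gb = 128
vm_allocations_gb = [32, 32, 32, 32, 16]  # five VMs; allocations chosen for illustration

committed = sum(vm_allocations_gb)
overcommit = committed - physical_ram_gb
print(f"Committed: {committed} GB, physical: {physical_ram_gb} GB")
if overcommit > 0:
    # Under memory pressure, the hypervisor must swap this excess to (slow) disk.
    print(f"Up to {overcommit} GB may be swapped to disk")
```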
Flash memory technology helps break through the cache size limitation imposed by RAM to again make caching an effective and cost-effective means for accelerating application performance. As shown in the diagram, flash memory, with an I/O latency of less than 50 microseconds, fills the significant performance gap between main memory and Tier 1 storage.
The closer the data is to the processor, the better the performance. This is why applications requiring high performance normally use DAS, and it is also why flash cache provides its biggest benefit when placed directly in the virtualized server on the PCI Express (PCIe) bus. Intelligent caching software is then used to automatically and transparently place “hot data” (the most frequently accessed data) in the low-latency flash memory, where it is accessed up to 200 times faster than when on a Tier 1 HDD. The flash cache can also be configured to become the “swap cache” for main memory, thus helping to mitigate performance problems being caused by memory contention.
The intelligent caching software detects hot data by constantly monitoring the physical server’s I/O activity to find the specific ranges of logical block addresses that are experiencing the most reads and/or writes, and continuously moving these into the cache. With this approach, the flash cache is able to support all of the VMs running in any server.
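One way such monitoring might work is sketched below: count I/Os per fixed-size range ("bin") of logical block addresses and promote the busiest bins to the cache. The bin size, cache size and trace are illustrative assumptions, not LSI's actual algorithm:

```python
from collections import Counter

# Minimal hot-data detection sketch, assuming fixed-size LBA ranges ("bins")
# and a flash cache that holds the N most frequently accessed bins.
BIN_SIZE = 2048          # logical blocks per bin (hypothetical)
CACHE_BINS = 2           # how many bins fit in the cache (hypothetical)

def hot_bins(lba_trace, bin_size=BIN_SIZE, cache_bins=CACHE_BINS):
    """Return the LBA bins seeing the most I/O in the observed trace."""
    counts = Counter(lba // bin_size for lba in lba_trace)
    return [b for b, _ in counts.most_common(cache_bins)]

# A synthetic trace: most I/O lands in bins 0 and 5, with scattered accesses elsewhere.
trace = [10, 30, 11000, 50, 12, 10500, 70, 9, 10250, 99999]
print(hot_bins(trace))  # [0, 5]
```

In a real product this tally would run continuously, so bins are promoted and evicted as the workload's hot spots shift.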
The intelligent caching algorithms normally give the highest priority to highly random, small block-oriented applications, such as those for databases and on-line transaction processing, because these stand to benefit the most. By contrast, applications with sequential read and/or write operations benefit very little from caching (except when multiple such applications are configured to run on the same server!), so these are given the lowest priority.
How can flash memory, with latency up to 100 times higher than RAM’s, outperform traditional caching systems? The answer is the sheer capacity possible with flash memory, which dramatically increases the “hit rate” of the cache. Indeed, with some flash cache cards now supporting multiple terabytes of high-performance solid state storage, there is often sufficient capacity to store rather large datasets for all of a server’s VMs as hot data.
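The effect of hit rate on average service time can be sketched with the article's rough latencies (the hit rates themselves are hypothetical):

```python
# Back-of-envelope expected I/O service time under a flash cache, using the
# article's rough latencies; the hit rates are hypothetical.
FLASH_S = 50e-6   # ~50 us for a flash cache hit
HDD_S   = 10e-3   # ~10 ms for a miss served from HDD

def avg_latency(hit_rate):
    return hit_rate * FLASH_S + (1 - hit_rate) * HDD_S

for hr in (0.50, 0.90, 0.99):
    print(f"hit rate {hr:.0%}: avg latency {avg_latency(hr) * 1e3:.3f} ms")
```

Even at a 90 percent hit rate, average latency falls from 10 ms to about 1 ms, and at 99 percent to roughly 0.15 ms, which is why the large capacity of flash, and the high hit rates it enables, matters more than its raw latency disadvantage versus RAM.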
Exhaustive internal LSI testing has shown that the application-level performance gains afforded by flash cache acceleration in both dedicated and virtualized servers are considerable. For servers with DAS, which already enjoy the “proximity performance advantage” over SAN/NAS environments, typical improvements can be in the range of 5 to 10 times. In environments with a SAN or NAS, which experience additional latency caused by the network, server-side flash caching can improve performance even more — by up to 30 times in some cases.
Flash Forward to the Future
Flash memory has a very promising future. Flash is already the preferred storage medium in tablets and ultrabooks, and increasingly in laptop computers. Solid state drives (SSDs) are replacing or supplementing HDDs in desktop PCs and servers with DAS, while the fastest SSD storage tiers are growing larger in SAN and NAS configurations.
Solid state storage is also non-volatile, so unlike caching with RAM, which is read-only and subject to data loss during a power failure, a flash cache can support both reads and writes, and some solutions now offer RAID-like data protection, making the cache the equivalent of a fast storage tier. In internal LSI testing, adding acceleration for writes to the flash cache (with the writes then persisted to primary storage) improved application-level performance in write-intensive applications even beyond the 10- and 30-times gains noted above.
The key to continued improvements in flash price/performance, much as Moore’s Law has driven CPU improvements, is the flash storage processors (FSPs) that facilitate shrinking flash memory geometries and/or higher cell densities. To accommodate these advances, future generations of FSPs will need to offer ever more sophisticated error correction (to improve reliability) and wear-leveling (to improve endurance).
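As a loose illustration of the wear-leveling idea (a toy policy, not an actual FSP design), each new write can simply be directed to the least-erased block so erase counts stay balanced across the device:

```python
# Toy wear-leveling sketch: always write to the least-erased block so erase
# counts stay balanced and no block wears out prematurely (illustrative only).
erase_counts = [5, 2, 7, 2, 4]  # hypothetical per-block erase counts

def pick_block(counts):
    """Return the index of the block with the fewest erases."""
    return min(range(len(counts)), key=counts.__getitem__)

blk = pick_block(erase_counts)
erase_counts[blk] += 1
print(blk, erase_counts)  # wrote to block 1; counts now [5, 3, 7, 2, 4]
```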
Flash memory offers other advantages that are beneficial in virtualized datacenters, including a combination of higher density and lower power consumption than HDDs, which enables more storage in a smaller space requiring less cooling. SSDs are also typically far more reliable than HDDs, and should one ever fail, RAID data protection is restored much faster.
As the price/performance of flash memory continues to improve, the rapid adoption of solid state storage will likely continue in the datacenter. But don’t expect SSDs to completely replace HDDs any time soon. HDDs have tremendous advantages in storage capacity and the per-gigabyte cost of that capacity. And because the vast majority of data in most organizations is accessed only occasionally, the higher I/O latency of HDDs is normally of little consequence—particularly because this “dusty data” can quickly become hot data in a flash (pun intended!) on those infrequent occasions when needed.
Flash cache has become part of the virtualization paradigm because it maximizes virtualization’s benefits. Servers are virtualized to get more work from each one, resulting in considerable savings in capital and operational expenditures, as well as in precious space and power. Storage is virtualized to achieve similar savings through greater efficiencies and economies of scale. Flash cache provides a more cost-effective way to get even more work from virtualized servers and faster work from virtualized storage.
ABOUT Tony Afshary
Tony Afshary is the business line director for Nytro Solutions Products within the Accelerated Solutions Division of LSI Corporation. In this role, he is responsible for product management and product marketing of the LSI Nytro product family of enterprise flash-based storage, including PCIe-based flash, which uses seamless and intelligent placement of data to accelerate datacenter applications.
Previously, Afshary was responsible for marketing and planning of LSI’s data protection and management and storage virtualization products. Prior to that, he was the director of Customer/Application Engineering for LSI’s server/storage products. He has been in the storage industry for over 13 years. Before joining LSI, Afshary worked at Intel for 11 years, managing marketing and development activities for storage and communication processors. Afshary received a bachelor’s degree in Electrical and Computer Engineering and an MBA from Arizona State University.