
Customers often want to know why their measured performance doesn't match the speed advertised by the platform vendor, software vendor, network vendor, and so on. Assuming the advertised speeds are (a) physically possible and (b) genuinely achievable rather than "click-bait," there are at least ten reasons why a system may fall short of them. When customer expectations and measured performance don't align, use the following checklist to help determine why.
1. Processing power of your computer
No matter the task, or number of tasks, CPU power, CPU cache, and threading capability are essential factors in achieving performance benchmark results. A less powerful CPU, or one with a lower clock speed, limits how quickly the system can complete its work, including launching the test harness, writing to the network, writing to disk, and a host of other tasks.
2. Latency
Latency is defined as "the delay before a transfer of data begins following an instruction for its transfer." Multiple forms of latency can affect performance results. Network latency, the time it takes for data to move from one place to another, can degrade performance in a replication configuration. Systems can also experience data transfer latency between attached disks, storage devices, and platforms, as well as within the software solution itself, and this latency can impede performance too.
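As a rough illustration of why network latency matters in a replication configuration, consider a sender that can have at most one window of unacknowledged data in flight per round trip. Its throughput is then capped at the window size divided by the round-trip time, no matter how fast the link is. The sketch below works through that arithmetic. The function name, the 64 KiB window, and the round-trip times are illustrative assumptions, not measurements from any particular product.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput in megabits per second when at most
    one window of data can be in flight per round trip."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    bits_per_round_trip = window_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1_000_000


if __name__ == "__main__":
    window = 64 * 1024  # assumed window size: 64 KiB
    for rtt in (0.5, 5.0, 50.0):  # assumed round-trip times in milliseconds
        ceiling = max_throughput_mbps(window, rtt)
        print(f"RTT {rtt:5.1f} ms -> at most {ceiling:8.1f} Mbit/s")
```

Under these assumptions, a tenfold increase in round-trip time cuts the ceiling by a factor of ten, which is one reason a result measured across sites can fall well below a number advertised for a local network.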
3. Limitations of your network
While latency is one of the most common issues with networks, other aspects of the network can also cause differences between measured and advertised performance. These include the topology and the deployed switches, routers, firewalls, and other devices within the architecture. For example, a firewall that inspects packets and traffic adds delay to every transfer that passes through it.
4. Additional devices on the network
If you are not using a fully isolated environment, that is, one where servers, switches, storage cabinets, etc. are unaffected by network traffic from other devices, then those other devices consume part of the available bandwidth and degrade performance.
5. Additional devices on the hypervisor
Similar to point four above, additional virtual machines on a hypervisor host (VMware, Hyper-V, KVM, etc.) can affect the measured performance of the other VMs running on it. Hosting multiple virtual machines on a single hypervisor host, while practical for many applications and situations, may introduce a phenomenon known as "noisy neighbor," in which one VM's activity reduces the resources available to the others. The resulting performance loss typically shows up in tests specifically aimed at proving performance numbers.
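One way to look for this kind of interference is to repeat the same measurement several times and compare the runs. The sketch below times a workload repeatedly and reports the median along with the spread between the fastest and slowest run. The helper names `time_runs` and `summarize`, and the sample workload, are hypothetical and chosen only for illustration. A wide spread is a hint that something else is competing for resources, not proof of a noisy neighbor.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(workload: Callable[[], object], runs: int = 10) -> List[float]:
    """Run the workload `runs` times and return each duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Return the median, fastest, and slowest run, plus the spread
    between fastest and slowest as a percentage of the median."""
    median = statistics.median(samples)
    fastest, slowest = min(samples), max(samples)
    return {
        "median": median,
        "min": fastest,
        "max": slowest,
        "spread_pct": (slowest - fastest) / median * 100.0,
    }


if __name__ == "__main__":
    # Stand-in CPU-bound workload; replace with the operation under test.
    stats = summarize(time_runs(lambda: sum(i * i for i in range(200_000))))
    print(f"median {stats['median'] * 1000:.2f} ms, spread {stats['spread_pct']:.1f}%")
```

Running the same script at different times of day, or on a host with fewer guests, gives a basis for comparison.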
6. Outdated drivers
Several types of drivers, especially network and storage drivers, can cause performance loss or other issues when they are outdated. Outdated drivers may contain bugs that have been fixed in later versions, or may lack optimizations and enhancements that improve performance. In addition, an outdated driver may not operate correctly with other parts of the stack. It is best to run the latest driver version supported for your configuration, architecture, and test case.
7. Memory speed and capacity
Memory speed and capacity affect the ability of the computer to perform at scale. Lower total memory capacity and lower memory speeds can cause sluggish performance, especially if the test harness for measuring performance requires multi-threading. Low system memory can also result in excessive page swapping and disk thrashing. Faster memory, in turn, enhances the computer's ability to transfer large amounts of data between the parts of the system, including disks, networks, and other applications.
8. Outdated OS and/or application software
Similar to outdated drivers, attempting to measure performance of an application, architecture, or HA solution while running outdated software can drastically impact your measurements. Outdated software can contain bugs that impact performance and have been fixed or remediated in newer versions. In addition, newer versions of software most likely contain enhancements that take advantage of modern infrastructure, faster CPUs, and more memory. If you aren't getting close to advertised speeds, be sure to update the software involved in the testing.
9. Infrastructure health
The health of the infrastructure is another important factor in achieving published performance numbers. Regardless of whether the systems are hosted on-prem or in the cloud, if the components within the infrastructure are unhealthy, the published numbers will be harder to achieve. For example, any component within the network, compute, or storage layer of the infrastructure that is performing sub-optimally will jeopardize the performance.
10. Test harness
Do not forget that the test harness, the tools used to measure the expected performance, can also play a role in reaching or not reaching the expected results. As a simple example, using different versions of a test tool, or different parameters and options can lead to different results. In a more complicated scenario, using a database benchmark tool to measure replication and HA performance will have a different outcome than using a tool that focuses on measuring the speed independent of the applications involved. In other words, measuring speed with or without other layers of processing between the tool and the underlying system components (software, hardware, etc.) can change the performance numbers.
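To make the point about parameters concrete, here is a minimal sketch of a write test in which two options, the block size and whether each write is forced to disk, can change the reported number on the same hardware. The function name `write_throughput_mb_s` and the sizes used are hypothetical choices for illustration. This is not how any particular benchmark tool works, and the figures it prints will vary from system to system.

```python
import os
import tempfile
import time


def write_throughput_mb_s(total_bytes: int, block_size: int, sync_each_write: bool) -> float:
    """Write about total_bytes to a temporary file in block_size chunks
    and return the observed throughput in MB/s (1 MB = 1,000,000 bytes)."""
    block = b"\0" * block_size
    writes = total_bytes // block_size
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(writes):
            f.write(block)
            if sync_each_write:
                # Force this write to stable storage before continuing.
                f.flush()
                os.fsync(f.fileno())
        # Always sync once at the end so buffered data is included in the timing.
        f.flush()
        os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return (writes * block_size) / elapsed / 1_000_000


if __name__ == "__main__":
    total = 4 * 1024 * 1024  # assumed test size: 4 MiB
    for block_size in (4096, 1024 * 1024):
        for sync_each_write in (False, True):
            rate = write_throughput_mb_s(total, block_size, sync_each_write)
            print(f"block={block_size:>8} sync_each_write={sync_each_write!s:<5} {rate:10.1f} MB/s")
```

When comparing a measured result against an advertised one, it helps to record the tool version and every option used, so that both numbers describe the same test.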
I could add a number of other items to the list, including system usage, environmental factors, disk IOPS, and type of operations (sync or async replication, for example). While this list is not exhaustive, it does give customers a small window of insight into what may be causing the difference between measured and advertised performance. Use this list, along with your own additions, to identify the bottlenecks and critical issues. Then focus on eliminating the inefficiencies you find in any of these items to increase the overall performance of the system.