
This is Part 2 of a blog series on how to find the root cause of the most common application experience issues.
Start with: Assuring Exceptional Experiences with Applications Requires Assuring Network Performance - Part 1
Responsiveness Issues
This type of issue is often reported as "the application is too slow." When the network is responsible for unacceptable responsiveness, a likely root cause is an overloaded network (i.e., the capacity of the network is insufficient to handle the current traffic). If a network is overloaded, its DNS server may also be overloaded and respond very slowly or not at all. Traffic bursts, especially microbursts, observed with detailed metrics are another indicator of an overloaded network and a cause of irregular latencies. If any of these is the root cause, then traffic must be shaped accordingly and/or capacity must be added.
When resolving these issues, IT teams analyze network, application, and protocol latency using observed metrics such as DNS and HTTP latency, one-way latency, round-trip time, and Zero-Window activity. Additional observed behaviors and metrics reveal which specific problem is the culprit. These metrics include throughput measured in gigabits per second (Gbps), the number of connections per second, and the number of concurrent connections. Network packet and flow data provide the insights and context to identify the root cause. Packet data captured with high fidelity using high-performance monitoring makes it possible to detect and characterize traffic bursts and the number of connections per second. Flow data reveals top talkers and the number of packets transmitted per second.
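To illustrate why fine-grained packet data matters for spotting microbursts, here is a minimal Python sketch. It assumes packets are available as (timestamp in nanoseconds, size in bytes) pairs, for example exported from a capture. The function name, the 1 ms window, and the 80% utilization threshold are illustrative assumptions, not values from any particular tool.

```python
from collections import defaultdict


def find_microbursts(packets, link_bps, window_ns=1_000_000, threshold=0.8):
    """Return (window_start_ns, utilization) for each window at or above threshold.

    packets:   iterable of (timestamp_ns, size_bytes) tuples (assumed input format)
    link_bps:  link capacity in bits per second
    window_ns: measurement window in nanoseconds (default 1 ms, an assumption)
    threshold: fraction of link capacity that counts as a burst (an assumption)
    """
    # Sum the bits seen in each fixed-size window. Integer nanosecond
    # timestamps avoid floating-point surprises when bucketing.
    bits_per_window = defaultdict(int)
    for timestamp_ns, size_bytes in packets:
        bits_per_window[timestamp_ns // window_ns] += size_bytes * 8

    # Bits the link can carry in one window.
    capacity_bits = link_bps * window_ns / 1e9

    bursts = []
    for index in sorted(bits_per_window):
        utilization = bits_per_window[index] / capacity_bits
        if utilization >= threshold:
            bursts.append((index * window_ns, utilization))
    return bursts
```

A link that averages only a few percent utilization over one second can still exceed the threshold within a single millisecond window, which is why a per-second throughput counter alone would miss this kind of burst.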
Streaming Issues
Communications and streaming applications that use Voice over IP (VoIP), videoconferencing, and other streaming services are increasingly in use for entertainment, education and collaboration, especially in the COVID-19 era. Experiences with these applications are directly impacted by network performance.
Choppy and freezing video, unsynchronized audio and video, audio dropout, and other noticeable types of distortion are the typical issues that result in unsatisfactory experiences. These issues are caused by streaming errors and packet loss. Users notice them readily, complain about them, and report them to IT and customer support help desks.
To diagnose the root causes and assure exceptional streaming experiences, IT needs to monitor and observe jitter, sequence errors, retransmissions, and Maximum Transmission Unit (MTU) fragmentation. Excessive jitter and sequence errors result from various streaming errors, while retransmissions and fragmentation point to packet loss as the culprit. It is necessary to dig further to determine whether these problems are caused by routing problems or MTU fragmentation. With a high MTU value, larger packets are transmitted; they take relatively longer to process and retransmit, and hence inhibit a smooth flow of digitized voice and video streams.
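As one concrete way to quantify the jitter and sequence errors mentioned above, the following Python sketch computes the smoothed interarrival jitter estimator defined for RTP in RFC 3550 and counts sequence discontinuities. The input formats and function names are assumptions made for illustration, and the discontinuity counter is a deliberately simple approximation rather than a full loss and reordering analysis.

```python
def interarrival_jitter(packets):
    """Smoothed interarrival jitter in the style of RFC 3550.

    packets: list of (send_time, receive_time) pairs in arrival order,
             both in the same unit (e.g., milliseconds). Assumed format.
    """
    jitter = 0.0
    previous_transit = None
    for send_time, receive_time in packets:
        transit = receive_time - send_time
        if previous_transit is not None:
            # D = difference in transit time between consecutive packets.
            difference = abs(transit - previous_transit)
            # J = J + (|D| - J) / 16, the RFC 3550 smoothing step.
            jitter += (difference - jitter) / 16.0
        previous_transit = transit
    return jitter


def count_sequence_discontinuities(sequence_numbers, modulus=65536):
    """Count arrivals whose sequence number is not the previous one plus 1.

    The default modulus matches 16-bit RTP sequence numbers, so a wrap
    from 65535 to 0 is not counted as a discontinuity.
    """
    discontinuities = 0
    expected = None
    for sequence_number in sequence_numbers:
        if expected is not None and sequence_number != expected:
            discontinuities += 1
        expected = (sequence_number + 1) % modulus
    return discontinuities
```

A steadily rising jitter value alongside a growing discontinuity count suggests looking at loss and reordering on the path; which thresholds are acceptable depends on the application and codec in use.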
Other Performance Issues
Applications such as high-frequency trading and high-performance computing that rely on streaming services increasingly depend on higher throughput, which is driving the use of 100 Gbps connectivity. Timing tolerances, latencies, and all other performance metrics become finer as data rates increase. This necessitates higher-fidelity monitoring to provide the visibility and observability needed to ensure the best possible SLEs and MTTR. As an example, detecting gaps in high-frequency trading streams requires observing microbursts and latencies with sub-millisecond resolution. It is therefore essential to have a clearly defined SLE, especially for high-performance applications and their underlying infrastructure, and then to match to it the metrics to observe and the tools and resolution needed to observe them.
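The gap-detection example can be sketched in a few lines of Python. This is a simplified illustration that assumes packet arrival timestamps are available as integer nanoseconds in arrival order; the function name and the 500-microsecond default tolerance are hypothetical and would in practice come from the SLE defined for the application.

```python
def find_stream_gaps(timestamps_ns, max_gap_ns=500_000):
    """Return (gap_start_ns, gap_length_ns) for each inter-arrival gap over the limit.

    timestamps_ns: packet arrival times in nanoseconds, in arrival order
    max_gap_ns:    largest acceptable gap (default 500 microseconds, an assumption)
    """
    gaps = []
    # Compare each arrival time with the one immediately before it.
    for previous, current in zip(timestamps_ns, timestamps_ns[1:]):
        gap = current - previous
        if gap > max_gap_ns:
            gaps.append((previous, gap))
    return gaps
```

With timestamps rounded to whole milliseconds, a gap of a few hundred microseconds would be invisible, which is the practical reason sub-millisecond capture resolution matters here.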
Experiences impact organizations in many ways, which is why delivering exceptional experiences is a critical success factor. Experiences with applications depend on network performance. As a result, effectively and efficiently assuring experiences requires visibility and observability into both network and application behaviors and metrics. Network Performance Management and Diagnostics driven by monitoring is therefore a necessary complement to Application Performance Management in all environments.