Solving 3 Painful Network Performance Issues with NPMD Solutions

Jay Botelho

Network performance issues come in all shapes and sizes, and can require vast amounts of time and resources to solve. As a matter of fact, in my last column, I explored recent survey data that shows 42 percent of IT teams feel they spend too much time troubleshooting the network. In addition, 38 percent feel they can't proactively identify performance issues and 35 percent have poor visibility across the entire network. Regardless of these challenges, network operations (NetOps) teams still need to push forward and do everything in their power to correct problems before they impact end user experiences and the proverbial bottom line.

Here are three examples of painful network performance issues you're likely to encounter this year, and how Network Performance Monitoring and Diagnostic (NPMD) solutions can help you overcome them:

1. Horrible VoIP / Unified Communications Interruptions

Picture this: A multinational pharmaceutical company with widely distributed development, operations, and manufacturing recently installed an extensive (and expensive) telepresence solution. It enables global collaboration and helps the company bring products to market more quickly by leveraging the most talented employees, regardless of their location. Unfortunately, the quality is poor, and team members are constantly saying, "Why is the meeting quality we just experienced so bad? Didn't we just spend millions on this system? It's so frustrating."

In most cases, poor performance can be traced to Quality of Service (QoS) misconfigurations. This becomes ever more likely in a highly distributed network where traffic flows through multiple network devices, all of which must be properly configured. With modern NPMD solutions, you can reduce configuration errors with easy-to-apply, rules-based QoS policies and templates. The ability to save, back up, and deploy automatically scheduled configuration changes helps keep policies consistent and accurate across the entire network. As policies are implemented, real-time performance reports can quickly identify errors for immediate remediation. Many traditional NPMD solutions lack the end-to-end visibility of next-gen platforms, which allow NetOps to resolve QoS issues impacting UC performance across complex networks … and eliminate employee complaints about QoS for good.
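To make the consistency point concrete, here is a minimal sketch of the kind of check a template-driven policy implies. It is not any vendor's API: the device names, the `device_policies` dictionary, and the `find_qos_mismatches` helper are all hypothetical, and the "expected" marking is simply the value most devices agree on. The DSCP values follow common practice (46 for voice, 34 for interactive video).

```python
from collections import Counter

# Hypothetical per-device QoS markings: traffic class -> DSCP value.
# Device names and values are illustrative, not taken from a real network.
device_policies = {
    "edge-nyc": {"voice": 46, "video": 34, "best_effort": 0},
    "core-lon": {"voice": 46, "video": 34, "best_effort": 0},
    "edge-sgp": {"voice": 0, "video": 34, "best_effort": 0},  # voice left unmarked
}


def find_qos_mismatches(policies):
    """Return (device, traffic_class, actual, expected) for every deviation.

    The expected value for a class is the most common value across devices,
    which stands in here for a centrally managed template (an assumption).
    """
    classes = sorted({cls for policy in policies.values() for cls in policy})
    mismatches = []
    for cls in classes:
        values = [policy.get(cls) for policy in policies.values()]
        expected = Counter(values).most_common(1)[0][0]
        for device, policy in sorted(policies.items()):
            actual = policy.get(cls)
            if actual != expected:
                mismatches.append((device, cls, actual, expected))
    return mismatches


for device, cls, actual, expected in find_qos_mismatches(device_policies):
    print(f"{device}: {cls} marked DSCP {actual}, expected {expected}")
```

In this made-up example, a single device that leaves voice traffic unmarked is enough to degrade every call that crosses it, which is why one missed hop matters so much in a distributed network.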

2. The Dreaded "Poor Performance" Report

Imagine that a handful of Tier 1 support engineers at a global network equipment manufacturer with distributed "follow-the-sun" technical support centers are reporting problems when using the online support software. The engineers experience the problem only occasionally: they often make it all the way through a call without trouble, but sometimes see long delays (10-20 seconds) per entry into the system. This is creating poor customer experiences and generating a needless increase in support escalations. The problem is not specific to a location, and a number of users have experienced the occasional slow-down.

Intermittent problems like this can be some of the most challenging and time-consuming for IT to track down. But not if your teams are using NPMD solutions. With enterprise-wide topology maps and the ability to set alerts for (in this case) application and network latency on the application in question, network engineers can quickly see who is experiencing the problem, when they are experiencing it, and the general conditions under which the problem arises. By comparing application and network latency measurements, NetOps can see that the network is responding quickly but that, at times, the application is not.
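The comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Sample` record, the `classify` helper, the user names, and both thresholds are hypothetical, and a real NPMD product would derive these measurements from its own flow or packet data.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    user: str
    network_ms: float      # e.g. round-trip time measured at the TCP handshake
    application_ms: float  # e.g. time from request to first response byte


def classify(sample, network_limit_ms=150.0, application_limit_ms=2000.0):
    """Label a sample by which layer, if any, exceeded its (assumed) threshold."""
    slow_network = sample.network_ms > network_limit_ms
    slow_application = sample.application_ms > application_limit_ms
    if slow_network and slow_application:
        return "both"
    if slow_network:
        return "network"
    if slow_application:
        return "application"
    return "ok"


# Made-up measurements: the network is consistently fast, one entry stalls.
samples = [
    Sample("tier1-a", network_ms=38.0, application_ms=240.0),
    Sample("tier1-b", network_ms=41.0, application_ms=14500.0),
    Sample("tier1-c", network_ms=36.0, application_ms=310.0),
]

for s in samples:
    print(s.user, classify(s))
```

Keeping the two measurements side by side per transaction is the design choice that matters: a single blended "response time" number would show the stall but not which team should investigate it.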

Assuming the network has been configured with some sort of packet capture appliance in at least one location where problems are being experienced, the network engineer can then drill into the network packets themselves, all the way into the payload, to see which specific application calls are made when the delays happen, and any errors reported as a result. With this level of detailed information in hand, NetOps is armed with the evidence it needs to approach the application team and quickly address the problem.

3. Wait - What Just Happened? We Need Instant Replay!

In just about every case, you hear about a network issue after it's happened. That usually leaves you with two less-than-ideal choices. The first is to wait for it to happen again. Depending on the severity of the problem, that may not even be an option, and even if it is, it just about always comes with some level of business impact. The second is to actively reproduce the problem. This is often very time-consuming, and sometimes requires the time and cooperation of the person reporting the problem, hampering productivity for everyone involved.

With NPMD solutions, you can actively store raw flow data that allows you to go back in time to replay a flow and watch the transport service across the network for forensic analysis. There's no need to wait for the issue to happen again "in the wild," or to attempt to recreate it manually. You already have a recording of the flow or flows in question. (Tip: be sure to use solutions that don't average the data. With some solutions, if you don't catch a problem soon enough, the data gets rolled up into minute-level reports, which skews the data and often makes it unusable for forensic analysis.)
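A quick bit of arithmetic shows why the roll-up in that tip is a problem. The numbers below are invented for illustration: sixty one-second latency samples, one of which is a 15-second stall like the ones in the second example.

```python
from statistics import mean

# Sixty hypothetical one-second latency samples (ms) with a single 15-second stall.
per_second_ms = [40.0] * 60
per_second_ms[37] = 15000.0

# What a one-minute roll-up would keep versus what the raw data still shows.
minute_average = mean(per_second_ms)
worst_second = max(per_second_ms)

print(f"per-minute average: {minute_average:.0f} ms")
print(f"worst single sample: {worst_second:.0f} ms")
```

Once averaged, the minute reports roughly 289 ms, which looks like mild sluggishness rather than a user staring at a frozen screen for 15 seconds. The raw samples still pinpoint the exact second the stall occurred.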

These are just a few examples of the many types of network performance problems NetOps teams experience every day. As you can see, if you're equipped with the right network management tools and in-depth insights, these issues can be identified, analyzed and resolved much more quickly.
