5 Best Practices for Effective Network Monitoring
November 21, 2022

Jay Botelho
LiveAction


Network monitoring is becoming more complex as the shift to remote work continues and cloud migration becomes more commonplace. Today's networks extend from core to edge to cloud, making network visibility crucial to ensuring performance and resolving issues quickly. But according to new research from EMA, only 27% of enterprises consider their network operations teams successful, down from 49% in 2016. From staffing issues to ineffective cloud strategies, NetOps teams are looking at how to streamline processes, consolidate tools, and improve network monitoring.

What are some best practices that can help achieve this? Let's dive into five.

1. The Right Data, Data, Data …

To achieve complete network visibility, NetOps teams must collect the right networking data – and the more, the merrier. No single data source can provide complete visibility; each data type brings something unique to the table. Consequently, many organizations adopt a collection of specialized networking tools to access them all. Not only does this create productivity challenges from a workflow standpoint (resulting in further network blind spots), but it is also costly in terms of licensing, support, specialized training, and more. Luckily, some advanced network monitoring solutions offer consolidated functionality, enabling NetOps teams to see into the dark corners of each domain from the same dashboard, and to better manage, optimize, and troubleshoot their hybrid networks.

What data types should you monitor? Here's the hit list (with a short polling sketch after it):

■ SNMP allows you to identify and monitor the status of devices and network interfaces, including CPU utilization, memory usage, thermal conditions, bandwidth, and many other performance metrics.

■ Flow Data collects and summarizes IP traffic to reveal trends in network health over time and point to where events or network saturation occurs. Flow Data comes in many forms, from basic information extracted from the packet header to detailed application information, like that included in NBAR2. Just keep in mind that not all Flow Data is created equal.

■ Packet Data lets you see the details behind the flow data and pinpoint the root cause of an issue.

■ API Data monitors transactions during API calls to detect application latency, slow response times, or availability issues when accessing an application.
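
As an illustration of the first item on that list, here is a minimal SNMP polling sketch. It assumes the pysnmp library is installed; the device address, community string, and interface index are placeholders, not a real configuration.

```python
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

# Poll sysUpTime and an interface byte counter; all target values are placeholders.
error_indication, error_status, _, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),          # SNMPv2c community string
        UdpTransportTarget(("192.0.2.1", 161)),      # hypothetical router address
        ContextData(),
        ObjectType(ObjectIdentity("1.3.6.1.2.1.1.3.0")),       # sysUpTime
        ObjectType(ObjectIdentity("1.3.6.1.2.1.2.2.1.10.1")),  # ifInOctets, ifIndex 1
    )
)

if error_indication or error_status:
    print("SNMP poll failed:", error_indication or error_status.prettyPrint())
else:
    for var_bind in var_binds:
        print(" = ".join(x.prettyPrint() for x in var_bind))
```

A monitoring platform runs thousands of polls like this on a schedule and stores the results; the point of the sketch is simply what a single SNMP data point looks like.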

2. Have a Data Retention Policy

Not all problems are immediately identified or reported, so successful network monitoring strategies include a plan for looking back: an audit trail for investigating issues after the fact. A data retention strategy usually addresses factors such as how long to retain different data types, the granularity of the data, and storage formats and locations.

For flow and SNMP data, the answers are similar. You want to retain data for as long as possible, and for flow and SNMP, retention times are typically measured in months, possibly even longer. The overall retention time is simply a matter of how much storage you are willing to commit; a reasonable commitment (tens of terabytes) can easily provide months of history, depending on the number of devices collecting data. One way to extend that window is to time-average the data: for example, data stored at one-minute granularity can be averaged to one-hour granularity, effectively turning 60 records into one. This behavior should be configurable, and the right setting depends on the type of long-term reporting you hope to accomplish.
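
As a quick illustration of that roll-up, here is a minimal sketch using pandas. The timestamps and utilization values are made up, and the one-hour bucket size is just the example from above.

```python
import pandas as pd

# Hypothetical one-minute interface-utilization samples (bits per second).
minute_samples = pd.DataFrame(
    {"bps": [52_000_000, 48_500_000, 61_200_000, 58_900_000]},
    index=pd.date_range("2022-11-01 09:00", periods=4, freq="1min"),
)

# Roll one-minute records up to one-hour averages: 60 records become one,
# trading granularity for a much longer retention window.
hourly = minute_samples.resample("1h").mean()
print(hourly)
```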

The data format will depend on the solution, but all solutions try to keep individual records as small as possible and use techniques like compression to improve efficiency. Long-term storage will always be on fixed media, either hard disk drives (HDDs) or solid-state drives (SSDs); SSDs are more expensive but provide better response times when running long-term reports. Short-term reporting may rely on data in memory (RAM) for performance, but eventually all data is moved to fixed media.

Packet storage is a different story. Even with hundreds of terabytes of storage on a high-speed network (20+ Gbps), you are likely to get days of packet storage at best. Since you never know which packets might be needed in an analysis, there is no way to sample the data or time-average it the way you can with flow records. Compression is the best that can be done, and even that helps only marginally because packet data is already dense.
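
To see why, a quick back-of-envelope calculation helps. The link rate, utilization, and storage figures below are assumptions, not measurements.

```python
# Back-of-envelope packet-retention estimate; every figure here is an assumption.
link_rate_gbps = 20      # sustained capture rate on the monitored link
storage_tb = 200         # size of the packet store
avg_utilization = 0.40   # assumed average link utilization

bytes_per_day = (link_rate_gbps * avg_utilization / 8) * 1e9 * 86_400
days_of_history = (storage_tb * 1e12) / bytes_per_day
print(f"~{days_of_history:.1f} days of packet history")   # roughly 2.3 days
```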

Two techniques that will help are filtering out packet data you are sure you'll never analyze, such as backup traffic, and storing packet payloads only when they are unencrypted. Most network traffic is encrypted nowadays, and if you do not hold the keys, storing encrypted payloads you can never read simply wastes space. Look for a solution that does this slicing automatically, based on protocol. Packet storage will be entirely on fixed media, and given the amount of storage typically required for any meaningful retention window, HDDs are still the only cost-effective option.
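
As a rough illustration of payload slicing, here is a sketch using the scapy library on an offline capture. The file names and the port-443 heuristic are assumptions; a production capture appliance would do this inline and per protocol, as described above.

```python
from scapy.all import rdpcap, wrpcap, IP, TCP

# Hypothetical offline slicing pass: keep full headers but drop TLS payloads
# (port 443) that could never be decrypted anyway. File names are placeholders.
packets = rdpcap("full_capture.pcap")
for pkt in packets:
    if pkt.haslayer(TCP) and 443 in (pkt[TCP].sport, pkt[TCP].dport):
        pkt[TCP].remove_payload()                 # discard the encrypted bytes
        del pkt[TCP].chksum                       # let scapy recompute on write
        if pkt.haslayer(IP):
            del pkt[IP].len, pkt[IP].chksum
wrpcap("sliced_capture.pcap", packets)
```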

3. Keep a Network Map with a Device Inventory

It's crucial to eliminate visibility gaps: every switch, router, port, and endpoint must be located and observed live for health and performance issues. While this sort of network inventory mapping can be an arduous manual task, device auto-discovery tools in many network monitoring platforms build these lists for you. Without such an inventory, there is no way to map what the network looks like, nor to visualize its utilization in a way that is intuitive to a network engineer. The inventory map provides the basis upon which flow data is overlaid. Without it, you would be drawing a straight line between San Francisco and Boston and claiming, "that's the route I'm taking to drive across the country," with absolutely no detail in between.
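
To make the auto-discovery idea concrete, here is a minimal sketch that seeds an inventory by sweeping a hypothetical management subnet with ICMP pings. Real platforms lean on SNMP, LLDP/CDP, and ARP data instead; the subnet and ping flags below are assumptions (Linux-style ping).

```python
import ipaddress
import subprocess

# Hypothetical discovery sweep: record every host on a small management subnet
# that answers a single ping, as the seed for a device inventory.
inventory = []
for host in ipaddress.ip_network("10.0.10.0/28").hosts():
    reachable = subprocess.run(
        ["ping", "-c", "1", "-W", "1", str(host)],
        capture_output=True,
    ).returncode == 0
    if reachable:
        inventory.append({"ip": str(host), "status": "up"})

print(f"Discovered {len(inventory)} responsive devices")
```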

Pro tip: when considering network monitoring tools, ask whether they include a device management system (DMS) so you can easily configure, monitor, or reset devices remotely, making management more efficient and streamlined. Many standalone products on the market perform this function, but it is far more efficient when the capability is integrated into your overall network management solution.

4. Create a Detailed Escalation Plan

Escalation plans often involve alert prioritization or threat scoring, so alerts that cross different thresholds are routed to the right predetermined contacts, typically spread across network engineers, application engineers, and security team members. This ensures critical issues, like unexpected traffic surges or anomalous IoT behavior, get immediate attention, while more benign problems, like down-rev devices or slight increases in latency, filter into an investigation queue with a longer response time.
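
One way to picture such a plan is as a simple routing table from alert score to owner and response SLA. The tiers, team names, and SLAs below are illustrative, not a prescription.

```python
# Hypothetical escalation table: alert-score thresholds map to owners and SLAs.
ESCALATION_TIERS = [
    (90, "security-oncall",   "page immediately"),       # e.g. anomalous IoT behavior
    (70, "netops-oncall",     "respond within 1 hour"),  # e.g. unexpected traffic surge
    (40, "netops-queue",      "next business day"),      # e.g. down-rev device
    (0,  "investigation-log", "weekly review"),          # e.g. slight latency increase
]

def route_alert(score: int) -> tuple:
    """Return (owner, response SLA) for a 0-100 alert score; tiers are illustrative."""
    for threshold, owner, sla in ESCALATION_TIERS:
        if score >= threshold:
            return owner, sla
    return ESCALATION_TIERS[-1][1:]

print(route_alert(95))   # ('security-oncall', 'page immediately')
print(route_alert(45))   # ('netops-queue', 'next business day')
```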

A predetermined response plan keeps the organization from having one overwhelming pool of alerts to sift through, minimizes response delay, and creates accountability with the group or pod each alert is assigned to. Much like the data retention policy, these plans help map out processes and support change management, crisis prevention, and more.

5. Automate Wherever Possible

Successful network monitoring strategies focus on efficiency and fast reactions, automating where it makes sense. Automating routine tasks such as daily backups, applying security patches and software updates, restarting failed devices, or running weekly reports frees up engineering resources for optimizing network flow paths and planning for future initiatives. Automation not only saves resources but also gives your team more time for planning, strategy, and maturing processes as your company evolves.

And automation is not limited to a single system or solution. Some of the most critical automation happens between products. Examples include a network monitoring system that automatically creates tickets in the service management system, or a Security Information and Event Management (SIEM) platform that communicates directly with the network management solution to initiate packet recording in response to a high-priority security alert. Many products are capable of this level of automation, but you should ask and verify how much is truly automated and how much you must script yourself.
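
As a sketch of that kind of cross-product hook, here is a minimal example that opens a ticket over a REST API when an alert fires. The endpoint, token, and payload schema are placeholders; a real integration would use your service desk's actual API.

```python
import json
import urllib.request

# Hypothetical cross-product automation: when the monitoring system raises a
# critical alert, open a ticket in the service management system via REST.
# The URL, token, and payload fields below are placeholders, not a real API.
def open_ticket(alert: dict) -> None:
    request = urllib.request.Request(
        "https://servicedesk.example.com/api/v1/tickets",
        data=json.dumps({
            "summary": f"[NetOps] {alert['title']}",
            "severity": alert["severity"],
            "source": "network-monitoring",
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer <api-token>",
        },
        method="POST",
    )
    urllib.request.urlopen(request, timeout=10)

# Example trigger, e.g. called from the monitoring system's alert webhook:
# open_ticket({"title": "Traffic surge on core uplink", "severity": "critical"})
```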

These are just a few simple network monitoring best practices that should help streamline NetOps and ensure better visibility across the network.

Jay Botelho is Senior Director of Product Management at LiveAction
