5 Best Practices for Effective Network Monitoring
November 21, 2022

Jay Botelho
LiveAction

Network monitoring is becoming more complex as the shift to remote work continues and cloud migration becomes more commonplace. Today's networks extend from core to edge to cloud, making network visibility crucial to ensuring performance and resolving issues quickly. But according to new research from EMA, only 27% of enterprises believe their network operations teams are successful, a figure that has been declining since 2016, when it stood at 49%. From staffing issues to ineffective cloud strategies, NetOps teams are looking at how to streamline processes, consolidate tools, and improve network monitoring.

What are some best practices that can help achieve this? Let's dive into five.

1. The Right Data, Data, Data …

To achieve complete network visibility, NetOps teams must collect the correct networking data – and the more, the merrier. But no single data source can provide complete visibility. Each data type brings something unique to the table. Consequently, many organizations adopt various specialized networking tools to access them. Not only does this create productivity challenges from a workflow standpoint (resulting in further network blind spots), but it is also costly in terms of licensing, support, specialized training, etc. Luckily, some advanced network monitoring solutions offer consolidated functionality, enabling NetOps teams to see into the dark corners of each domain from the same dashboard, and better manage, optimize, and troubleshoot their hybrid networks.

What data types should you monitor? Here's the hit list:

■ SNMP allows you to identify and monitor the status of devices and network interfaces, including CPU utilization, memory usage, thermal conditions, bandwidth, and many other performance metrics.

■ Flow Data collects and summarizes IP traffic to reveal trends in network health over time and point to where events or network saturation occurs. Flow Data comes in many forms, from basic information extracted from the packet header to detailed application information, like that included in NBAR2. Just keep in mind that not all Flow Data is created equal.

■ Packet Data allows you to see the details behind the flow data and point to the root cause.

■ API Data monitors transactions during API calls to detect application latency, slow response times, or availability issues when accessing an application.
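A consolidated dashboard ultimately comes down to normalizing these different data types onto one timeline. As a minimal sketch (the record fields and function names here are illustrative, not from any particular product), two of the sources above might be interleaved like this:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical, minimal records for two of the data types above.
@dataclass
class SnmpSample:
    device: str
    timestamp: datetime
    cpu_pct: float
    mem_pct: float

@dataclass
class FlowRecord:
    src: str
    dst: str
    timestamp: datetime
    byte_count: int
    application: str  # e.g. as classified by NBAR2

def merge_timeline(snmp, flows):
    """Interleave both sources into one time-ordered event list,
    the way a consolidated dashboard might."""
    events = [(s.timestamp, "snmp", s) for s in snmp]
    events += [(f.timestamp, "flow", f) for f in flows]
    return sorted(events, key=lambda e: e[0])
```

Real solutions do far more (deduplication, enrichment, correlation), but the core idea is one time-ordered view across data types rather than one tool per type.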

2. Have a Data Retention Policy

Not all problems are immediately identified or reported, so successful network monitoring strategies include a plan that provides an audit trail for investigating issues after the fact. A data retention strategy usually addresses factors such as how long to retain different data types, the granularity of the data, and storage formats and locations.

For flow and SNMP data, the answers are similar. Of course, you want to retain data for as long as possible, and for flow and SNMP, the retention times are typically measured in months and possibly even longer. The overall retention time is simply a matter of how much storage you are willing to commit. Still, reasonable storage commitments (tens of terabytes) can easily provide months of retention, depending on the number of devices collecting data. One way to extend that time is to time-average the data: for example, averaging data currently at one-minute granularity up to one-hour granularity, effectively turning 60 records into one. Whether to do this should be configurable, and the right choice depends on the type of long-term reporting you hope to accomplish.
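The time-averaging step described above is straightforward to picture in code. This is a generic sketch of hourly roll-up, not any vendor's implementation:

```python
from collections import defaultdict
from datetime import datetime

def rollup_hourly(samples):
    """Average one-minute samples into one-hour records.

    samples: list of (datetime, value) pairs at minute granularity.
    Returns a dict mapping each hour's start time to the average value,
    so 60 minute-records collapse into a single hourly record.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}
```

The trade-off is exactly the one noted above: a 60x reduction in storage in exchange for losing the ability to see minute-level spikes in long-term reports.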

The data format will likely be dependent on the solution. Still, all solutions do their best to keep individual records as short as possible and use other techniques like compression to increase efficiency. Long-term storage will always be on fixed media, either hard disk drives (HDDs) or solid-state drives (SSDs). SSDs are more expensive but provide better response times when running long-term reports. Short-term reporting may rely on data in memory (RAM) for performance, but eventually, all data is moved to fixed media.

Packet storage is a different story. Even with hundreds of terabytes of storage on a high-speed network (20+ Gbps), you are likely to get days of packet storage at best. Since you never know which packets might be needed in analysis, there is no way to sample the data or do time-averaging like with flow data records. Compression is the best that can be done, but compression is only marginally helpful due to the built-in density of packet data.

Two techniques that will help are filtering out packet data you are sure you'll never analyze, such as backup traffic, and storing packet payloads only when they are unencrypted. Most network traffic is encrypted nowadays, and if you do not have the keys, storing encrypted payloads adds cost without analytical value. Look for a solution that performs this slicing automatically, based on protocol. Packet storage will be entirely on fixed media, and given the amount of storage typically required for any meaningful retention period, HDDs are still the only cost-effective option.
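Protocol-based slicing can be sketched very simply. This toy example assumes port-based classification and a fixed header length (real tools parse headers properly and handle IPv6, VLAN tags, TCP options, and port-independent protocol detection):

```python
# Hypothetical port-based payload slicing: keep full packets only for
# traffic we could actually decode; for encrypted protocols, keep just
# the headers, which is all that is useful without the keys.
HEADER_LEN = 54  # Ethernet (14) + IPv4 (20) + TCP (20), no options

ENCRYPTED_PORTS = {443, 22, 993}  # HTTPS, SSH, IMAPS

def slice_packet(raw: bytes, dst_port: int) -> bytes:
    """Return the bytes worth retaining for this packet."""
    if dst_port in ENCRYPTED_PORTS:
        return raw[:HEADER_LEN]  # drop the undecodable payload
    return raw                   # keep the full packet
```

Even this crude rule illustrates the payoff: for predominantly encrypted traffic, retention cost drops from full packet size to a few dozen bytes per packet.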

3. Keep a Network Map with a Device Inventory

It's crucial to eliminate visibility gaps, and every switch, router, port, and endpoint must be located on the map and observed live for health and performance issues. While this sort of network inventory mapping can be an arduous manual task, device auto-discovery tools in many network monitoring software platforms create these lists for you. Without such an inventory, there is no way to map what the network looks like, nor to visualize network utilization in a way that is intuitive to a network engineer. Network inventory mapping provides the basis upon which flow data is overlaid. Without such a map, it would be like drawing a straight line between San Francisco and Boston and claiming, "that's the route I'm taking to drive across the country," with absolutely no detail in between.
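At its core, the map auto-discovery produces is an inventory plus an adjacency list. A minimal sketch (device names and fields here are invented for illustration):

```python
# Minimal device inventory + adjacency map, the kind of structure an
# auto-discovery tool would populate automatically.
inventory = {
    "core-sw1":  {"type": "switch", "mgmt_ip": "10.0.0.2"},
    "edge-rtr1": {"type": "router", "mgmt_ip": "10.0.0.1"},
    "ap-floor2": {"type": "ap",     "mgmt_ip": "10.0.2.10"},
}

# Discovered links between devices (undirected).
links = [("edge-rtr1", "core-sw1"), ("core-sw1", "ap-floor2")]

def neighbors(device):
    """Devices directly connected to `device` on the map."""
    return sorted({b for a, b in links if a == device} |
                  {a for a, b in links if b == device})
```

Flow data overlaid on this structure is what turns the straight San Francisco-to-Boston line into a hop-by-hop route.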

Pro tip: when considering network monitoring tools, ask whether they include a device management system (DMS) so you can easily configure, monitor, or reset devices remotely. This will make management more efficient and streamlined. Many independent products on the market perform this function, but it is far more efficient when this capability is integrated into your overall network management solution.

4. Create a Detailed Escalation Plan

Escalation plans often involve alert prioritization or threat scoring, so alerts falling in the range of different thresholds go to the right predetermined contacts, typically shared between network engineers, application engineers, and security team members. This helps critical issues like unexpected traffic surges or anomalous IoT behavior get immediate attention. More benign problems, like down-rev devices or slight increases in latency can filter into an investigation queue with a longer response time.

A predetermined response plan keeps the organization from having one overwhelming pool of alerts to sift through, minimizes response delay, and creates accountability with the group or pod each alert is assigned to. Much like the data retention policy, these plans help map out processes and support change management, crisis prevention, and more.
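Threshold-based routing of the kind described above reduces to a small lookup. The scores and team names below are placeholders, not recommendations:

```python
# Illustrative severity-score routing table: each entry is the minimum
# score for that destination, checked highest-first.
ROUTES = [
    (90, "security-oncall"),  # e.g. anomalous IoT behavior: page now
    (60, "netops-queue"),     # e.g. unexpected traffic surge: same day
    (0,  "backlog"),          # e.g. down-rev device: weekly review
]

def route_alert(score: int) -> str:
    """Return the destination for an alert with the given threat score."""
    for threshold, destination in ROUTES:
        if score >= threshold:
            return destination
    return "backlog"  # unreachable with the table above, kept as a guard
```

The point is that the thresholds and destinations are agreed on in advance, so routing is mechanical rather than a judgment call made during an incident.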

5. Automate Wherever Possible

Successful network monitoring strategies focus on efficiency and fast reactions, automating where it makes sense. Automating critical tasks such as daily backups, applying security patches and software updates, restarting failed devices, or running weekly reports can free up engineering resources for optimizing network flow paths and planning for future initiatives. Automation not only assists in saving resources but also opens space for your team to put more time into planning, strategy, and leveling up your process as your company evolves.

And automation is not limited to a single system or solution. Some of the most critical automation happens between products. Examples include when the network monitoring system automatically creates tickets in the service management system, or the Security Information and Event Management (SIEM) is in direct communication with the network management solution to initiate packet recording in response to a high-priority security alert. Many products are capable of this level of automation, but you typically must ask and verify how much of it is truly automated and how much you must script yourself.
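The monitoring-to-ticketing handoff mentioned above usually amounts to translating an alert into whatever payload the service management system's API expects. A hedged sketch, with entirely illustrative field names (real integrations depend on the specific ticketing API):

```python
import json

def alert_to_ticket(alert: dict) -> dict:
    """Translate a monitoring alert into a ticket payload for a
    service-management API. All field names are illustrative."""
    return {
        "title": f"[{alert['severity'].upper()}] {alert['summary']}",
        "priority": {"critical": 1, "warning": 3}.get(alert["severity"], 4),
        "source": "network-monitoring",
        "details": json.dumps(alert),  # preserve the raw alert for audit
    }
```

The same pattern runs in the other direction for the SIEM case: a high-priority security alert becomes an API call back to the network management solution to start packet recording.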

These are just a few simple network monitoring best practices that should help streamline NetOps and ensure better visibility across the network.

Jay Botelho is Senior Director of Product Management at LiveAction