5 Best Practices for Effective Network Monitoring
November 21, 2022

Jay Botelho
LiveAction


Network monitoring is becoming more complex as the shift to remote work continues and cloud migration becomes more commonplace. Today's networks extend from core to edge to cloud, making network visibility crucial to ensuring performance and resolving issues quickly. But according to new research from EMA, only 27% of enterprises believe their network operations teams are successful, a figure that has been declining since 2016, when it stood at 49%. From staffing issues to ineffective cloud strategies, NetOps teams are looking for ways to streamline processes, consolidate tools, and improve network monitoring.

What are some best practices that can help achieve this? Let's dive into five.

1. The Right Data, Data, Data …

To achieve complete network visibility, NetOps teams must collect the right networking data – and the more, the merrier. No single data source can provide complete visibility; each data type brings something unique to the table. Consequently, many organizations adopt a variety of specialized networking tools to access them. Not only does this create productivity challenges from a workflow standpoint (resulting in further network blind spots), but it is also costly in terms of licensing, support, and specialized training. Luckily, some advanced network monitoring solutions offer consolidated functionality, enabling NetOps teams to see into the dark corners of each domain from a single dashboard and better manage, optimize, and troubleshoot their hybrid networks.

What data types should you monitor? Here's the hit list:

■ SNMP allows you to identify and monitor the status of devices and network interfaces, including CPU utilization, memory usage, thermal conditions, bandwidth, and many other performance metrics (see the polling sketch after this list).

■ Flow Data summarizes IP traffic to reveal trends in network health over time and pinpoint where events or network saturation occur. Flow Data comes in many forms, from basic information extracted from the packet header to detailed application information, like that included in NBAR2. Just keep in mind that not all Flow Data is created equal.

■ Packet Data allows you to see the details behind the flow data and pinpoint the root cause.

■ API Data monitors transactions during API calls to detect application latency, slow response times, or availability issues when accessing an application.
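
To make the first item concrete, here is a minimal polling sketch using the classic synchronous hlapi from the open-source pysnmp library (newer pysnmp releases are async-first, so treat this as illustrative). The management IP, community string, and OID (ifInOctets for interface 1) are placeholders.

```python
from pysnmp.hlapi import (
    CommunityData, ContextData, ObjectIdentity, ObjectType,
    SnmpEngine, UdpTransportTarget, getCmd,
)

def poll_if_in_octets(host: str, community: str = "public") -> None:
    """Fetch ifInOctets for interface 1 (bytes received) over SNMPv2c."""
    error_indication, error_status, _, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(community, mpModel=1),  # mpModel=1 -> SNMPv2c
            UdpTransportTarget((host, 161)),
            ContextData(),
            ObjectType(ObjectIdentity("1.3.6.1.2.1.2.2.1.10.1")),  # ifInOctets.1
        )
    )
    if error_indication or error_status:
        raise RuntimeError(str(error_indication or error_status))
    for oid, value in var_binds:
        print(f"{host}: {oid} = {value}")

poll_if_in_octets("192.0.2.1")  # placeholder management IP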

2. Have a Data Retention Policy

Not all problems are immediately identified or reported, so successful network monitoring strategies include a plan for investigating issues after the fact, backed by an audit trail. A data retention strategy usually addresses factors such as how long to retain each data type, the granularity of the data, and the storage formats and locations.
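
One way to make those factors concrete is to capture them as an explicit, machine-readable policy. The sketch below is purely illustrative; every duration, granularity, and storage tier is an example value, not a recommendation from this article.

```python
# Illustrative retention policy covering the factors above. All values are
# examples; tune them to your storage budget and reporting needs.
RETENTION_POLICY = {
    "snmp":   {"keep": "12+ months", "granularity": "1 min, averaged to 1 hr after 30 days", "media": "SSD, then HDD"},
    "flow":   {"keep": "12+ months", "granularity": "1 min, averaged to 1 hr after 30 days", "media": "SSD, then HDD"},
    "packet": {"keep": "days",       "granularity": "full packets, payloads sliced",         "media": "HDD"},
    "api":    {"keep": "6 months",   "granularity": "per transaction",                       "media": "SSD, then HDD"},
}
```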

For flow and SNMP data, the answers are similar. You want to retain data for as long as possible, and for flow and SNMP the retention times are typically measured in months, possibly even longer. The overall retention period is simply a matter of how much storage you are willing to commit; a reasonable commitment (tens of terabytes) can easily provide months of data, depending on the number of devices reporting. One way to extend that window is to time-average the data: for example, taking data currently at one-minute granularity and averaging it to one-hour granularity, effectively turning 60 records into one. This behavior should be configurable, and the right setting depends on the type of long-term reporting you hope to accomplish.
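
As a concrete illustration of that roll-up, here is a minimal sketch using pandas; the column name and sample values are invented.

```python
import pandas as pd

# Time-averaging sketch: roll one-minute flow samples up to one-hour
# averages, turning 60 records into one. Real flow exports vary by vendor.
minute_data = pd.DataFrame(
    {"bits_per_sec": range(120)},  # two hours of one-minute samples
    index=pd.date_range("2022-11-21 00:00", periods=120, freq="1min"),
)

hourly = minute_data.resample("1h").mean()  # 120 rows -> 2 rows
print(hourly)
```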

The data format will likely depend on the solution, but all solutions do their best to keep individual records as short as possible and use techniques like compression to increase efficiency. Long-term storage will always be on fixed media, either hard disk drives (HDDs) or solid-state drives (SSDs); SSDs are more expensive but provide better response times when running long-term reports. Short-term reporting may rely on data in memory (RAM) for performance, but eventually all data is moved to fixed media.

Packet storage is a different story. Even with hundreds of terabytes of storage on a high-speed network (20+ Gbps), you are likely to get days of packet storage at best. Since you never know which packets an analysis might need, there is no way to sample the data or time-average it as you can with flow records. Compression is the best you can do, and it is only marginally helpful given the inherent density of packet data.

Two techniques that do help are filtering out packet data you are sure you will never analyze, such as backup traffic, and storing packet payloads only when they are unencrypted. Most network traffic is encrypted nowadays, and if you do not hold the keys, storing those payloads gains you nothing. Look for a solution that does this slicing automatically, based on protocol. Packet storage will be entirely on fixed media, and given the amount of storage typically required for any meaningful window, HDDs are still the only cost-effective option.
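
To illustrate payload slicing, here is a minimal sketch using the open-source scapy library. It treats anything on TCP/443 as encrypted and drops the payload while keeping the headers; keying off the port alone is a simplification, since a production tool would slice per protocol, as noted above.

```python
from scapy.all import IP, TCP, Raw, rdpcap, wrpcap

def slice_encrypted_payloads(in_pcap: str, out_pcap: str) -> None:
    """Strip payloads from (assumed-encrypted) TCP/443 traffic, keep headers."""
    sliced = []
    for pkt in rdpcap(in_pcap):
        if pkt.haslayer(TCP) and pkt.haslayer(Raw):
            tcp = pkt[TCP]
            if 443 in (tcp.sport, tcp.dport):
                tcp.remove_payload()   # keep L2-L4 headers, drop ciphertext
                if pkt.haslayer(IP):
                    del pkt[IP].len    # force length/checksum recompute on write
                    del pkt[IP].chksum
                del tcp.chksum
        sliced.append(pkt)
    wrpcap(out_pcap, sliced)

slice_encrypted_payloads("capture.pcap", "capture-sliced.pcap")
```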

3. Keep a Network Map with a Device Inventory

It's crucial to eliminate visibility gaps: every switch, router, port, and endpoint must be located on the map and observed live for health and performance issues. While this sort of network inventory mapping can be an arduous manual task, device auto-discovery tools in many network monitoring platforms build these lists for you. Without an inventory, there is no way to map what the network looks like, nor to visualize its utilization in a way that is intuitive to a network engineer. Network inventory mapping provides the basis upon which flow data is overlaid. Without such a map, it would be like drawing a straight line between San Francisco and Boston and claiming, "that's the route I'm taking to drive across the country," with no detail in between.
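
For illustration, a bare-bones auto-discovery pass can be as simple as a ping sweep. The sketch below assumes a Linux host and a placeholder /24 subnet; real discovery tools layer SNMP, ARP tables, and LLDP on top of this to classify each device.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SUBNET = "192.0.2"  # documentation range; substitute your own

def is_alive(ip: str) -> bool:
    # One ICMP echo with a one-second timeout ('-c'/'-W' are Linux ping flags).
    result = subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0

ips = [f"{SUBNET}.{host}" for host in range(1, 255)]
with ThreadPoolExecutor(max_workers=64) as pool:
    inventory = [ip for ip, alive in zip(ips, pool.map(is_alive, ips)) if alive]

print(f"{len(inventory)} devices discovered: {inventory}")
```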

Pro tip: when evaluating network monitoring tools, ask whether they include a device management system (DMS) so you can easily configure, monitor, or reset devices remotely. Many standalone products perform this function, but management is far more efficient and streamlined when the capability is integrated into your overall network management solution.

4. Create a Detailed Escalation Plan

Escalation plans often involve alert prioritization or threat scoring, so that alerts falling within different thresholds go to the right predetermined contacts, typically shared between network engineers, application engineers, and security team members. This ensures critical issues, like unexpected traffic surges or anomalous IoT behavior, get immediate attention, while more benign problems, like down-rev devices or slight increases in latency, filter into an investigation queue with a longer response time.

A predetermined response plan keeps the organization from having one overwhelming pool of alerts to fish through, minimizes response delay, and creates accountability with the group or pod each alert is assigned to. Much like the data retention policy, these plans help map out processes and support change management, crisis prevention, and more.
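
As a sketch of how such routing might look in practice, the snippet below maps alert scores to predetermined contacts; the scores, thresholds, and team names are illustrative, not from the article.

```python
# Threshold-based alert routing: each alert goes to the first contact whose
# threshold its score meets. All values below are example placeholders.
ESCALATION = [
    (90, "security-oncall"),  # e.g., anomalous IoT behavior
    (70, "netops-oncall"),    # e.g., unexpected traffic surge
    (40, "netops-queue"),     # e.g., creeping latency
    (0,  "backlog"),          # e.g., down-rev devices
]

def route(score: int) -> str:
    """Return the first contact whose threshold the alert score meets."""
    for threshold, contact in ESCALATION:
        if score >= threshold:
            return contact
    return "backlog"

assert route(95) == "security-oncall"  # immediate attention
assert route(15) == "backlog"          # longer response time
```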

5. Automate Wherever Possible

Successful network monitoring strategies focus on efficiency and fast reactions, automating where it makes sense. Automating routine tasks such as daily backups, applying security patches and software updates, restarting failed devices, or running weekly reports frees up engineering resources for optimizing network flow paths and planning future initiatives. Automation not only saves resources but also gives your team more time for planning, strategy, and leveling up your processes as your company evolves.

And automation is not limited to a single system or solution; some of the most valuable automation happens between products. Examples include the network monitoring system automatically creating tickets in the service management system, or the Security Information and Event Management (SIEM) system communicating directly with the network management solution to initiate packet recording in response to a high-priority security alert. Many products are capable of this level of automation, but you typically must ask, and verify, how much is truly automated and how much you must script yourself.
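
As a sketch of that first example, the snippet below posts a ticket into a service management system when a high-priority alert fires. The endpoint URL and payload schema are placeholders; real integrations (ServiceNow, Jira, and so on) each define their own APIs.

```python
import json
import urllib.request

TICKET_ENDPOINT = "https://servicedesk.example.com/api/tickets"  # placeholder

def open_ticket(alert: dict) -> None:
    """POST a hypothetical JSON ticket payload for a monitoring alert."""
    body = json.dumps({
        "summary": f"[NetMon] {alert['title']}",
        "priority": alert.get("priority", "P3"),
        "details": alert.get("details", ""),
    }).encode()
    req = urllib.request.Request(
        TICKET_ENDPOINT, data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("ticket created, status", resp.status)

open_ticket({"title": "Traffic surge on core uplink", "priority": "P1"})
```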

These are just a few simple network monitoring best practices that should help streamline NetOps and ensure better visibility across the network.

Jay Botelho is Senior Director of Product Management at LiveAction