The Importance of Network Observability for Tech Companies
November 15, 2022

Nadeem Zahid
cPacket Networks

Tech companies tend to be the earliest adopters of IT and digital transformation trends, for obvious reasons. These companies have already embraced a cloud-first mentality, and are well into migrating business-critical workloads to the cloud. However, that "tip of the spear" position in regard to cloud adoption puts these companies at considerable risk of losing visibility into application workloads, leaving them struggling to detect performance issues and potential threats.

The challenge is that cloud monitoring and visibility are hard, especially for public clouds, which tend to be a black box when it comes to observability. Striking the balance between enthusiastic cloud adoption and consistent, complete visibility is crucial for big tech to get right, for two reasons.

First, the heavy reliance on SaaS-based apps (both as a product offering and for internal usage) and cloud data means that IT teams must maintain network performance and rapidly troubleshoot in hybrid cloud environments. A few seconds (or even milliseconds) of performance latency can lead to frustrated employees and customers.

Second, tech companies are prime targets for attackers. The financial and reputational damage of a security breach, especially for high-value targets such as large fintech companies, can easily ruin a company's image and operation. Security teams need both a real-time, reliable feed of packet data for their NDR and firewall tools, and a store of packet data going back weeks for forensic investigations.

Building the visibility infrastructure to make these cloud networks observable is a complex technical challenge. But with careful planning and a few strategic decisions, it's possible to appropriately design, set up and manage visibility solutions for the cloud.

Observability Challenges for Security and NPM

One of the key mandates for IT teams is ensuring consistency, making network performance monitoring (NPM) a high priority. If there's a problem, IT needs the ability to quickly trace it to a specific application, then on to specific nodes or parts of the public/private cloud infrastructure to solve the problem.

If the cloud provider is at fault, then IT will need detailed packet data to prove an SLA is being violated. Without that data, the troubleshooting can quickly devolve into useless finger pointing. Yet waiting until a problem appears to turn on a cloud provider's built-in traffic mirroring, and only then investigating, will take weeks. To be useful, visibility must be in place before the issue arises.

Unfortunately, you can't just throw a switch to get access to packet data through traffic mirroring. In particular, managing the "fire hose" of cloud data in real-time for these mirroring scenarios is technically challenging.

Security is the other side of the observability coin and (at the risk of stretching the metaphor to breaking) it has two sides. The first is getting access to real-time packet data; this is similar to the performance monitoring challenge above, but with unique nuances. The second issue is the ability to save packets for forensic investigation.

For security purposes, real-time packet data feeds must go to security tools like NDR and firewalls. Not missing any of these packets is crucial; for cloud this makes an inline packet solution ideal. That said, security tools can often only ingest packets at 10G speeds, so faster connections will require a packet broker that can handle both 10G and 40/100G traffic. In terms of the packets themselves, traffic that traverses environments, either between an application and the open internet or between the data center and the cloud, is often of particular interest to security teams, as these crossing points are likely entry points for an intruder. Unfortunately, this traffic can be particularly difficult to monitor.
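The speed mismatch described above can be made concrete with a rough sketch. It assumes each tool ingests at most 10G and that the broker spreads traffic across tool ports by flow, so both directions of a conversation reach the same tool. The function names and the CRC-based hash are hypothetical choices for illustration, not a description of how any particular packet broker works.

```python
import math
import zlib

def tool_ports_needed(link_gbps, tool_gbps=10.0):
    """Minimum number of tool ports needed to absorb a fully loaded link.

    Assumption: every tool port ingests at most `tool_gbps` and traffic
    is spread evenly. Real traffic is rarely perfectly balanced.
    """
    return math.ceil(link_gbps / tool_gbps)

def pick_port(src_ip, src_port, dst_ip, dst_port, proto, n_ports):
    """Map a flow to one of `n_ports` outputs.

    The two endpoints are sorted first, so a request and its reply hash
    to the same port (a symmetric hash). This is an illustrative scheme.
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % n_ports

if __name__ == "__main__":
    ports = tool_ports_needed(100)  # a 100G link feeding 10G tools
    print("tool ports needed:", ports)
    print("flow goes to port:",
          pick_port("10.0.0.5", 44321, "203.0.113.9", 443, "tcp", ports))
```

Sizing for a fully loaded link is the conservative case; filtering or deduplicating traffic in the broker before it reaches the tools would lower the number of ports required.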

For forensic analysis, security team investigations will require packet data that covers days or weeks of traffic between critical nodes. This means observability plans need to cover not just packet access, but capture and storage as well.
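To see why capture and storage need planning, a back-of-the-envelope estimate helps. The sketch below assumes full packet capture at a sustained average rate with no compression, slicing or filtering; the function name and the example figures are illustrative and do not come from any specific deployment.

```python
def capture_storage_bytes(avg_gbps, retention_days):
    """Estimate the raw storage needed to retain full packet capture.

    Assumptions: `avg_gbps` is the sustained average captured rate in
    gigabits per second, and packets are stored uncompressed and whole.
    """
    bytes_per_second = avg_gbps * 1e9 / 8
    seconds_per_day = 24 * 60 * 60
    return bytes_per_second * seconds_per_day * retention_days

if __name__ == "__main__":
    # Hypothetical example: 1 Gbps average, retained for 14 days.
    total = capture_storage_bytes(1.0, 14)
    print(f"{total / 1e12:.1f} TB")
```

Under these assumptions, a link averaging 1 Gbps produces about 10.8 TB of packet data per day, so two weeks of retention needs roughly 151 TB before any reduction techniques are applied.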

When setting up the monitoring infrastructure, several factors must also be weighed. At a basic level, the brokers, taps, capture devices, etc. all take up valuable rack space; consolidation, density and adequate topology planning are all critical. If data that's being monitored is sensitive or subject to privacy regulations, access to the visibility system and data must be controlled. The monitoring itself also creates a technical load on the network that must be accounted for (you don't want the monitoring itself to be the cause of performance issues).

Bridging the Visibility Gap

The appropriate monitoring infrastructure should be built around a subnet composed of a load balancer, virtual packet broker and storage appliance, with equipment placed throughout the network at key points. One strategy to conserve space, save money and maximize resources is to use brokers as the "power strip" that distributes packets to firewalls and other security or NPM tools at the correct speeds. The subnet can further connect packet capture and storage to forensic tools for investigation, and feed NPM tools with real-time data to quickly triangulate network issues like latency, allowing IT teams to determine fault and, if necessary, negotiate with the cloud provider.

As mentioned, access to packet data in the public cloud is particularly difficult. The hyperscale providers all recognize the problems this lack of visibility causes, and each has taken a different path to solving it. AWS and GCP use similar mirroring approaches: VPC Traffic Mirroring on AWS and Packet Mirroring on GCP. In basic terms, this traffic/packet mirroring duplicates network traffic to and from the client's applications and forwards it to cloud-native performance and security monitoring tool sets for assessment, and to capture devices for later analysis. This eliminates the need to deploy ad-hoc forwarding agents or sensors in each VPC instance for every monitoring tool. The raw data itself is not ready for analysis, and requires a virtual or cloud packet broker to ensure the right data gets to the right monitoring or security tools. That said, combining these mirroring options with virtual packet brokers can ultimately reduce cost, as a single stream only has to be mirrored once for the broker (as opposed to once for each NPM or security tool).
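The cost argument comes down to simple counting, sketched below. The sketch assumes every tool needs traffic from every mirrored source and counts mirrored streams only; actual cloud pricing models vary and are not modeled here. The function name and example numbers are hypothetical.

```python
def mirrored_streams(sources, tools, use_broker):
    """Count how many mirrored streams the cloud has to produce.

    Without a broker, each source is mirrored once per tool.
    With a broker, each source is mirrored once and the broker fans
    the traffic out to the tools.
    """
    if use_broker:
        return sources
    return sources * tools

if __name__ == "__main__":
    # Hypothetical example: 20 mirrored sources feeding 4 tools.
    print("without broker:", mirrored_streams(20, 4, use_broker=False))
    print("with broker:   ", mirrored_streams(20, 4, use_broker=True))
```

In this example, the broker cuts the number of mirrored streams from 80 to 20, and the gap widens with every additional tool.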

Solving the visibility challenge with Azure is different, and requires using what's known as "inline mode" on certain virtual packet brokers. This allows the packet broker itself to monitor subnet ingress and egress traffic to capture, pre-process, and deliver packet data in real-time to security, performance management, analytics, capture and other solutions.

Developing this visibility topology is complex; many companies may not have the necessary in-house staff to handle it, and may need to work with service providers or vendors on the design and set-up. But whether handled in-house or outsourced, keep tool and infrastructure sprawl in mind: a mixture of virtual and physical devices can save rack space in data centers, and leveraging the cloud for a consolidated management view of all packet broker and capture solutions can save considerable time.

Tech companies often take the slings and arrows that come with early adoption. But paying careful attention to visibility and monitoring allows organizations to better weather these issues by staying on top of threats and ensuring the network is operating according to plan.

Nadeem Zahid is VP of Product Management & Marketing at cPacket Networks