The Importance of Network Observability for Tech Companies
November 15, 2022

Nadeem Zahid
cPacket Networks

Share this

Tech companies tend to be the earliest adopters of IT and digital transformation trends, for obvious reasons. These companies have already embraced a cloud-first mentality, and are well in to migrating business-critical workloads to the cloud. However, that "tip of the spear" position in regard to cloud adoption puts these companies at considerable risk of losing visibility into application workloads, leaving them to struggle to detect performance issues and potential threats.

The challenge is that cloud monitoring and visibility is hard, especially for public clouds, which tend to be a black box when it comes to observability. This balancing act between enthusiastic cloud adoption and consistent and complete visibility is crucial for big tech to get right, for two reasons.

First, the heavy reliance on SaaS-based apps (both as a product offering and for internal usage) and cloud data means that IT teams must maintain network performance and rapidly troubleshoot in hybrid cloud environments. A few seconds (or even milliseconds) of performance latency can lead to frustrated employees and customers.

Second, tech companies are prime targets for attackers. The financial and reputational damage of a security breach, especially for high-value targets such as large fintech companies, can easily ruin a company's image and operation. Security teams need both a real-time, reliable feed of packet data for their NDR and firewall tools, and a store of packet data going back weeks for forensic investigations.

Building the visibility infrastructure to make these cloud networks observable is a complex technical challenge. But with careful planning and a few strategic decisions, it's possible to appropriately design, set up and manage visibility solutions for the cloud.

Observability Challenges for Security and NPM

One of the key mandates for IT teams is ensuring consistency, making network performance monitoring (NPM) a high priority. If there's a problem, IT needs the ability to quickly trace it to a specific application, then onto specific nodes or parts of the public/private cloud infrastructure to solve the problem.

If the cloud provider is at fault, then IT will need detailed packet data to prove an SLA is being violated. Without that data, the troubleshooting can quickly devolve into useless finger pointing. (Yet turning on a cloud provider's built-in traffic mirroring and then investigating performance issues will take weeks.) To be useful, visibility must be in place before the issue arises.

Unfortunately, you can't just throw a switch to get access to packet data through traffic mirroring. In particular, managing the "fire hose" of cloud data in real-time for these mirroring scenarios is technically challenging.

Security is the other side of the observability coin and (at the risk of stretching the metaphor to breaking) it has two sides. The first is getting access to real-time packet data; this is similar to the performance monitoring challenge above, but with unique nuances. The second issue is the ability to save packets for forensic investigation.

For security purposes, real-time packet data feeds must go to security tools like NDR and firewalls. Not missing any of these packets is crucial; for cloud this makes an inline packet solution ideal. That said, security tools can often only ingest packets at 10G speeds, so faster connections will require a packet broker that can handle both 10G and 40/100G traffic. In terms of the packets themselves, traffic that is traversing environments, either between an application and the open internet or between the data center and the cloud, is often of particular interest to security teams as these can be likely entry points for an intruder. Unfortunately, this traffic can be particularly difficult to monitor.

For forensic analysis, security team investigations will require packet data that covers days or weeks of traffic between critical nodes. This means observability plans need to cover not just packet access, but capture and storage as well.

When setting up the monitoring infrastructure, several factors must also be weighed. At a basic level, the brokers, taps, capture devices, etc. all take up valuable rack space; consolidation, density and adequate topology planning are all critical. If data that's being monitored is sensitive or subject to privacy regulations, access to the visibility system and data must be controlled. The monitoring itself also creates a technical load on the network that must be accounted for (you don't want the monitoring itself to be the cause of performance issues).

Bridging the Visibility Gap

The appropriate monitoring infrastructure should be built around a subnet comprised of a load balancer, virtual packet broker and storage appliance, with equipment placed throughout the network at key points. One strategy to conserve space, save money and maximize resources is to use brokers as the "power strip" that distributes packets to firewalls and other security or NPM tools at the correct speeds. The subnet can further connect packet capture and storage to forensic tools for investigation, and feed NPM tools with real-time data to quickly triangulate network issues like latency, allowing IT teams to determine fault and, if necessary, negotiate with the cloud provider.

As mentioned, access to packet data in the public cloud is particularly difficult. The hyperscale providers all recognize the problems this lack of visibility causes, and each have taken different paths to solving it. AWS and GCP use similar mirroring approaches (VPC traffic (AWS) or packet (GCP) mirroring service). In basic terms, this traffic/packet mirroring duplicates network traffic to and from the client's applications and forwards it to cloud-native performance and security monitoring tool sets for assessment, and to capture devices for later analysis. This eliminates the need to deploy ad-hoc forwarding agents or sensors in each VPC instance for every monitoring tool. The raw data itself is not ready for analysis, and requires a virtual or cloud packet broker to ensure the right data gets to the right monitoring or security tools. That said, combining these mirroring options with virtual packet brokers can ultimately reduce cost, as a single stream only has to be mirrored once for the broker (as opposed to once per each NPM or security tool).

Solving the visibility challenge with Azure is different, and requires using what's known as "inline mode" on certain virtual packet brokers. This allows the packet broker itself to monitor subnet ingress and egress traffic to capture, pre-process, and deliver packet data in real-time to security, performance management, analytics, capture and other solutions.

Developing this visibility topology is complex; many companies may not have the necessary in-house staff to handle it, and may need to work with service providers or vendors on the design and set-up. But whether handled in-house or outsourced, keep tool and infrastructure sprawl in mind: a mixture of virtual and physical devices can save rack space in data centers, and leveraging the cloud for a consolidated management view of all packet broker and capture solutions can save considerable time.

Tech companies often take the slings and arrows that come with early adoption. But paying careful attention to visibility and monitoring allows organizations to better weather these issues by staying on-top of threats and ensuring the network is operating according to plan.

Nadeem Zahid is VP of Product Management & Marketing at cPacket Networks
Share this

The Latest

March 04, 2024

This year's Super Bowl drew in viewership of nearly 124 million viewers and made history as the most-watched live broadcast event since the 1969 moon landing. To support this spike in viewership, streaming companies like YouTube TV, Hulu and Paramount+ began preparing their IT infrastructure months in advance to ensure an exceptional viewer experience without outages or major interruptions. New Relic conducted a survey to understand the importance of a seamless viewing experience and the impact of outages during major streaming events such as the Super Bowl ...

March 01, 2024

As organizations continue to navigate the complexities of the digital era, which has been marked by exponential advancements in AI and technology, the strategic deployment of modern, practical applications has become indispensable for sustaining competitive advantage and realizing business goals. The Info-Tech Research Group report, Applications Priorities 2024, explores the following five initiatives for emerging and leading-edge technologies and practices that can enable IT and applications leaders to optimize their application portfolio and improve on capabilities needed to meet the ambitions of their organizations ...

February 29, 2024

Despite the growth in popularity of artificial intelligence (AI) and ML across a number of industries, there is still a huge amount of unrealized potential, with many businesses playing catch-up and still planning how ML solutions can best facilitate processes. Further progression could be limited without investment in specialized technical teams to drive development and integration ...

February 28, 2024

With over 200 streaming services to choose from, including multiple platforms featuring similar types of entertainment, users have little incentive to remain loyal to any given platform if it exhibits performance issues. Big names in streaming like Hulu, Amazon Prime and HBO Max invest thousands of hours into engineering observability and closed-loop monitoring to combat infrastructure and application issues, but smaller platforms struggle to remain competitive without access to the same resources ...

February 27, 2024

Generative AI has recently experienced unprecedented dramatic growth, making it one of the most exciting transformations the tech industry has seen in some time. However, this growth also poses a challenge for tech leaders who will be expected to deliver on the promise of new technology. In 2024, delivering tangible outcomes that meet the potential of AI, and setting up incubator projects for the future will be key tasks ...

February 26, 2024

SAP is a tool for automating business processes. Managing SAP solutions, especially with the shift to the cloud-based S/4HANA platform, can be intricate. To explore the concerns of SAP users during operational transformations and automation, a survey was conducted in mid-2023 by Digitate and Americas' SAP Users' Group ...

February 22, 2024

Some companies are just starting to dip their toes into developing AI capabilities, while (few) others can claim they have built a truly AI-first product. Regardless of where a company is on the AI journey, leaders must understand what it means to build every aspect of their product with AI in mind ...

February 21, 2024

Generative AI will usher in advantages within various industries. However, the technology is still nascent, and according to the recent Dynatrace survey there are many challenges and risks that organizations need to overcome to use this technology effectively ...

February 20, 2024

In today's digital era, monitoring and observability are indispensable in software and application development. Their efficacy lies in empowering developers to swiftly identify and address issues, enhance performance, and deliver flawless user experiences. Achieving these objectives requires meticulous planning, strategic implementation, and consistent ongoing maintenance. In this blog, we're sharing our five best practices to fortify your approach to application performance monitoring (APM) and observability ...

February 16, 2024

In MEAN TIME TO INSIGHT Episode 3, Shamus McGillicuddy, VP of Research, Network Infrastructure and Operations, at Enterprise Management Associates (EMA) discusses network security with Chris Steffen, VP of Research Covering Information Security, Risk, and Compliance Management at EMA ...