The Changing Face of Network Downtime

Vess Bakalov

Our connected world continues to transform into a mobile one. The network is our constant companion, granting 24/7 access to instant communication across an array of devices, unconstrained by physical barriers. As a result, IT infrastructure is more critical than ever to business operations. Companies and organizations are calling on a variety of technologies that are changing the face of today’s network — from mobile devices to cloud services to web-based applications.

And the strain on the network is not expected to decrease. In fact, Cisco reports that in two years the number of devices connected to IP networks will be nearly three times the global population. At the same time, network management and performance challenges are on the rise. The explosion of mobile, cloud and web-based apps makes it difficult to determine where today’s network begins and where it ends. As a result, service issues and outages are becoming more commonplace, driving losses in revenue, customer satisfaction and employee productivity. A recent survey from Avaya puts the cost of network downtime at anywhere from $140K to $540K per hour, with the wide variance driven by the characteristics of the business and its environment (e.g., vertical, risk tolerance).
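
To put those figures in perspective, even a small gap in availability compounds quickly over a year. The back-of-the-envelope sketch below is purely illustrative: the hourly costs are the endpoints of the Avaya range quoted above, and the availability levels are hypothetical examples.

```python
# Illustrative downtime-cost estimate. The hourly figures are the
# endpoints of the Avaya survey range; the availability levels are
# hypothetical examples, not benchmarks from the article.
HOURS_PER_YEAR = 24 * 365

def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of unavailability at a given availability level."""
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return downtime_hours * cost_per_hour

for availability in (0.999, 0.9999):    # "three nines" vs. "four nines"
    for cost in (140_000, 540_000):     # $/hour, low and high ends of the range
        print(f"{availability:.2%} available at ${cost:,}/hr -> "
              f"${annual_downtime_cost(availability, cost):,.0f}/yr")
```

At "three nines" that is roughly 8.8 hours of downtime a year, which works out to between about $1.2M and $4.7M at the quoted rates.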

Over the past couple of months, we’ve seen high-profile network outages capture headlines across the US. A large number of service providers were affected by the 512K Day issue, when the Internet routing table grew beyond what many legacy routers were designed to handle. Then, in August, more than 11 million Time Warner Cable (TWC) subscribers across 29 states lost service for about three hours, and just a week later Facebook suffered its fourth outage in five months. In two of those three cases the unavailability was blamed on configuration glitches and was, as a result, quickly resolved.
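
For context on the 512K issue: many routers of that generation shipped with a default allocation of 512,000 TCAM entries for IPv4 routes, so when the global table crossed that line, route installation began to fail. A minimal monitoring sketch of the lesson (watch table growth against the hardware limit rather than waiting for failure) is below; the route-count input is a placeholder for whatever collection method, such as SNMP or CLI scraping, a real tool would use.

```python
# Minimal sketch of a routing-table capacity check, in the spirit of the
# 512K Day problem. The 512,000 figure is the widely cited legacy IPv4
# TCAM default; route_count would come from SNMP or a CLI scrape in practice.
DEFAULT_TCAM_SLOTS = 512_000
WARN_AT = 0.90  # alert well before the hard limit, not at it

def check_table_capacity(route_count: int, slots: int = DEFAULT_TCAM_SLOTS) -> str:
    utilization = route_count / slots
    if route_count >= slots:
        return f"CRITICAL: {route_count:,} routes exceed {slots:,} TCAM slots"
    if utilization >= WARN_AT:
        return f"WARNING: table at {utilization:.1%} of capacity"
    return f"OK: table at {utilization:.1%} of capacity"

print(check_table_capacity(511_500))  # WARNING: table at 99.9% of capacity
```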

The Most Important Word for Every Network: Availability

But why do network outages seem to be popping up more frequently, affecting more people? It’s really a question of perception – more people are consuming more services and everyone expects to be connected around the clock, around the world, using any device.

In a blog post earlier this summer, Andrew Lerner, a Research Director for Gartner, zeroed in on the most important word associated with every network: availability. As he notes, “Performance, scalability, management, agility, etc. all require the network to actually be online.”

Lerner also notes that, unfortunately, availability is assumed to be table stakes at most companies. I am not sure I agree with him entirely. Availability is table stakes. However, modern infrastructure — especially at service providers — is massively redundant, so pure availability is rarely the problem. More often, service outages are due to poor capacity planning, spurious events or changes that bring unanticipated consequences (like Pakistan inadvertently re-routing all of YouTube’s traffic in 2008).
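
The arithmetic behind that redundancy makes the point concrete: a service built on independent redundant components is down only when all of them are down at once, so availability compounds as 1 − (1 − A)^n. The sketch below uses a hypothetical component availability; note that the independence assumption is exactly what a bad configuration push or a route leak violates.

```python
# Availability of n redundant components in parallel: the service fails
# only if every component fails simultaneously. Assumes independent
# failures, which correlated faults such as a bad configuration push
# (or a route leak) violate in practice.
def parallel_availability(component_availability: float, n: int) -> float:
    return 1.0 - (1.0 - component_availability) ** n

for n in (1, 2, 3):
    a = parallel_availability(0.99, n)  # hypothetical 99%-available component
    print(f"{n} component(s): {a:.6f}")
# 1 -> 0.990000, 2 -> 0.999900, 3 -> 0.999999
```

Two nines per component become six nines with triple redundancy, which is why outages in heavily redundant plants usually trace back to correlated causes like configuration errors rather than component failure.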

For smaller businesses in particular, unavailability of core services represents not only a loss of control and a loss of earnings, but also potential reputational damage. Without network performance management solutions, businesses are unnecessarily exposing themselves to risk. Technology should detect and even prevent outages automatically, without the need for manual intervention. Technical staff cannot be expected to continually gather and analyze the data that might indicate an impending outage, nor can they act quickly enough to stave off an incident. While the likes of TWC and Facebook can rapidly recover from disruptive infrastructure issues, smaller organizations can’t, and that is why they must take steps to protect themselves.
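
As a simple illustration of what "detecting automatically" can mean at its most basic, the sketch below flags a metric sample that deviates sharply from its recent history: the kind of continuous check no human team can perform across thousands of interfaces. The sample data and the three-sigma rule are illustrative choices, not any vendor's actual algorithm.

```python
# Minimal sketch of the kind of automated check a tool should run
# continuously so staff don't have to eyeball metrics. Flags samples
# more than `sigmas` standard deviations from the recent mean; the data
# and the 3-sigma default are illustrative, not a product's algorithm.
from statistics import mean, stdev

def is_anomalous(history: list[float], latest: float, sigmas: float = 3.0) -> bool:
    """Return True if `latest` deviates sharply from recent history."""
    if len(history) < 2:
        return False                     # not enough data to judge
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(latest - mu) > sigmas * sd

interface_errors = [2.0, 3.0, 1.0, 2.0, 4.0, 2.0, 3.0]  # errors/min, made up
print(is_anomalous(interface_errors, 3.0))   # False: within normal range
print(is_anomalous(interface_errors, 40.0))  # True: likely impending trouble
```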

Reacting to performance thresholds is not enough. To keep a company’s network available 24/7, it’s critical to predict problems before they become service-impacting. Deploying solutions that log data and provide real-time analytics on large volumes of unstructured data is crucial for every IT department. These solutions give IT organizations better insight into the behavior of users, customers, applications and networks, allowing businesses to spot issues before they happen – significantly reducing or, in some cases, eliminating downtime altogether.
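
To make the distinction between reacting and predicting concrete: rather than alerting when utilization crosses a threshold, fit a trend to recent samples and estimate when it will cross. The sketch below uses a plain least-squares line on invented data; production analytics would use baselines that account for seasonality and burstiness.

```python
# Minimal sketch of threshold *prediction* rather than threshold *reaction*:
# fit a linear trend to recent utilization samples and estimate when the
# trend line will cross capacity. Data is invented; a straight line is a
# deliberate oversimplification of what real analytics engines do.
import numpy as np

samples = np.array([61.0, 63.5, 64.2, 66.8, 68.1, 70.4])  # % link utilization, hourly
hours = np.arange(len(samples))
slope, intercept = np.polyfit(hours, samples, 1)           # least-squares fit

THRESHOLD = 85.0  # % utilization considered service-impacting
if slope > 0:
    hours_until_breach = (THRESHOLD - samples[-1]) / slope
    print(f"Rising ~{slope:.2f}%/hr; threshold breach in ~{hours_until_breach:.1f} hours")
else:
    print("No upward trend; nothing to predict")
```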

Vess Bakalov is SVP, CTO and Co-Founder of SevOne.
