The Changing Face of Network Downtime
October 02, 2014

Vess Bakalov
SevOne

Our connected world continues to transform into a mobile one. The network is a constant and fascinating companion that gives us 24/7 access: communication is instant and takes place across an array of devices, unconstrained by physical barriers. As a result, IT infrastructure is more critical than ever for business operations. Companies and organizations are calling upon a variety of technologies that are changing the face of today’s network, from mobile devices to cloud services to web-based applications.

And the strain on the network is not expected to decrease. In fact, Cisco reports that in two years, the number of devices connected to IP networks will be nearly three times the global population. At the same time, network management and performance challenges are on the rise. The explosion of mobile, cloud and web-based apps makes it difficult to determine where, in today’s evolving world, the network begins and where it ends. As a result, service issues and outages are becoming more commonplace, prompting losses in revenue, customer satisfaction and employee productivity. A recent survey from Avaya speaks to the cost of network downtime: the figure varies widely with the characteristics of a business and its environment (e.g., vertical, risk tolerance), and ranges from $140K to $540K per hour.
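To put that range in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that cost scales linearly with outage duration, which the survey figures do not claim; the function name and the three-hour example are hypothetical.

```python
from typing import Tuple

# Hourly cost range cited above, in USD.
LOW_COST_PER_HOUR = 140_000
HIGH_COST_PER_HOUR = 540_000


def downtime_cost_range(outage_hours: float) -> Tuple[float, float]:
    """Return a (low, high) cost estimate in USD for an outage.

    Assumption: cost grows linearly with duration. Real losses may
    grow faster (lost customers) or slower (deferred transactions).
    """
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    return (outage_hours * LOW_COST_PER_HOUR,
            outage_hours * HIGH_COST_PER_HOUR)


if __name__ == "__main__":
    low, high = downtime_cost_range(3)
    print(f"3-hour outage: ${low:,.0f} to ${high:,.0f}")
```

Under that linear assumption, a hypothetical three-hour outage would land somewhere between $420K and $1.62M.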

Over the past couple of months, we’ve seen high-profile network outages capturing headlines across the US. A large number of service providers were affected by the 512K Day issue – when the Internet routing table grew beyond what many legacy routers were designed to handle. Then, in August, more than 11 million Time Warner Cable (TWC) subscribers across 29 states lost service for about three hours, and just a week later, Facebook suffered its fourth outage in five months. In two of these three cases, the unavailability was blamed on configuration glitches and, as a result, was quickly resolved.

The Most Important Word for Every Network: Availability

But why do network outages seem to be popping up more frequently, affecting more people? It’s really a question of perception – more people are consuming more services and everyone expects to be connected around the clock, around the world, using any device.

In a blog post earlier this summer, Andrew Lerner, a Research Director for Gartner, zeroed in on the most important word associated with every network: availability. As he notes, “Performance, scalability, management, agility, etc. all require the network to actually be online.”

Unfortunately, most companies assume availability to be table stakes. I am not sure I agree with Lerner entirely. Availability is table stakes. However, modern infrastructure, especially in service providers, is massively redundant. Pure availability is rarely the problem. More often, service outages are due to poor capacity planning, spurious events or changes that bring unanticipated consequences (like Pakistan inadvertently re-routing all YouTube traffic).

For smaller businesses in particular, unavailability of core services not only represents a loss of control and a loss of earnings, but also potentially a lesson in reputational damage. Without network performance management solutions, businesses are unnecessarily exposing themselves to risk. Technology should be detecting and even preventing outages automatically, without the need for manual intervention. Technical staff cannot be expected to continually gather and analyze data that might indicate an impending outage, nor can they be expected to act quickly enough to stave off an incident. While the likes of TWC and Facebook can rapidly recover from disruptive infrastructure issues, smaller organizations can’t, and that is why they must take steps to protect themselves.

Reacting to performance thresholds is not enough. To ensure a company’s network is available 24/7, it’s critical to predict problems before they become service-impacting. Deploying solutions that log data and provide real-time analytics on large volumes of unstructured data is crucial to every IT department. These solutions give IT organizations better insight into the behavior of users, customers, applications and networks, allowing businesses to spot issues before they happen – significantly reducing or, in some cases, eliminating downtime altogether.
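To make the difference between reacting and predicting concrete, here is a minimal sketch of one simple approach: fit a straight line to recent utilization samples and estimate when that trend would cross an alert threshold. This is an illustration under stated assumptions, not a description of any particular product. The function name, the evenly spaced samples and the linear trend are all hypothetical simplifications; real traffic is seasonal and bursty, and production tools use far richer baselines.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    threshold: float,
    interval_hours: float = 1.0,
) -> Optional[float]:
    """Estimate hours until a linear trend through `samples` reaches `threshold`.

    `samples` are evenly spaced readings (e.g. link utilization in percent),
    oldest first, taken every `interval_hours`. Returns 0.0 if the fitted
    trend is already at or above the threshold, and None if the trend is
    flat or falling (no crossing predicted).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of y = intercept + slope * x, x = 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Fitted value at the most recent sample.
    latest_fit = intercept + slope * (n - 1)
    if latest_fit >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - latest_fit) / slope * interval_hours


if __name__ == "__main__":
    # Hypothetical hourly utilization readings, in percent.
    utilization = [50, 52, 54, 56, 58]
    print(hours_until_threshold(utilization, threshold=80))
```

A threshold alert on this link would stay silent until utilization actually reached 80 percent. The trend estimate raises the question hours earlier, while there is still time to add capacity or reroute traffic.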

Vess Bakalov is SVP, CTO and Co-Founder of SevOne.
