When it comes to viruses, it's typically those of the computer/digital variety that IT is concerned about. But with the ongoing pandemic, IT operations teams are on the hook to maintain business functions in the midst of rapid and massive change. One of the biggest challenges for businesses is the shift to remote work at scale. Ensuring that they can continue to provide products and services — and satisfy their customers — against this backdrop is challenging for many.
IT resources that once served business needs well are now faced with a whole different set of employee needs, while home internet and hardware are also being put to the test. There are already examples of infrastructure under strain. One leading cloud provider was hit by an outage for several hours in March, after a connectivity issue in one of its U.S. data centers. And a video calling service, which has seen a massive upsurge in usage, experienced a partial outage in April. Another example is the U.S. government loan plan intended to support small businesses, which was beset with technical issues, leaving many banks and would-be users unable to access the online portal.
It stands to reason that organizations are trying to make do during this unprecedented time, patching existing products and services with rapidly deployed bolt-ons to keep their business running while also improving communication and accessibility for people who work from home.
An outage is a very public kind of failure. But as embarrassing as it is for the companies that make the headlines, it's a risk all businesses face. There are multiple potential causes — network failures, software malfunctions, usage spikes, human error and configuration error among them.
Those headline-making outages give business leaders big headaches as they deal with the huge associated costs, which can run into the hundreds of millions, as well as the impact on the confidence of their customers. Fortune 1000 companies lose between $1.25 billion and $2.5 billion every year due to unplanned outages.
Counting the Cost
The length, cost and impact of an outage will vary, at least in part because multiple parts of a business are likely to be affected simultaneously. The size and scale of a company can also complicate the problem. Evolving technology and platforms across multiple locations can cause weak points that are not immediately obvious without oversight of the entire system. With tightening operations budgets, this can be a constant challenge.
A report by Ponemon put the average cost of downtime at nearly $9,000 a minute. Outage cost, of course, varies greatly depending on the size of the business affected and the sector it operates in. Banking, government, healthcare, manufacturing, media, retail, utilities and transportation are among those most at risk, and where outages are the most costly.
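To make the per-minute figure concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a flat rate of $9,000 a minute (a rounding of the Ponemon average above) and covers direct cost only; the function name and the sample durations are illustrative, and a real estimate would use an organization's own numbers.

```python
# Assumption: a flat per-minute rate, rounded from the "nearly $9,000" average.
COST_PER_MINUTE_USD = 9_000


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return a rough direct-cost estimate for an outage of the given length."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Illustrative outage lengths, not figures from any real incident.
    for label, minutes in [("15 minutes", 15), ("1 hour", 60), ("4 hours", 240)]:
        print(f"{label}: ${downtime_cost(minutes):,.0f}")
```

At that rate a one-hour outage comes to $540,000 before any of the indirect costs are counted.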
How much downtime costs an organization isn't just a matter of looking at lost revenue. Business disruption, reputational damage, customer churn and the effect on productivity levels also factor in. Further down the line, there may well be fallout from fines, litigation or settlements, third-party costs and equipment replacement.
Steps Toward Resilience
During downtime, what usually happens is a trial-and-error approach that relies on intrinsic knowledge and teams who are working in operational and technology silos. This is likely to prolong the amount of time businesses are offline.
A better solution is for organizations to determine what they can do in advance to avoid outages and implement a recovery plan to get them back up and running as soon as possible. This should include cooperation with third-party providers and technology partners. Agile businesses will be best placed to weather the current situation. The ability to adapt to demand quickly and fall back on a robust IT application system will help ensure that resilience.
To reduce the risk of downtime, take the obvious steps involved in eliminating single points of failure — balancing load between servers, following good back-up practices and building in technical fail-safes. What is becoming increasingly apparent is that sophisticated AI, predictive processes and automation are starting to play a critical role in prevention.
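As a rough illustration of balancing load between servers without leaving a single point of failure, the sketch below rotates requests across a pool and skips any server that fails a health check. The class name, server names and health-check callback are all hypothetical; production load balancers do far more, such as active probing, connection draining and weighting.

```python
from typing import Callable, Iterable


class RoundRobinBalancer:
    """Rotate across backends, skipping any that report unhealthy."""

    def __init__(self, backends: Iterable[str], is_healthy: Callable[[str], bool]):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._is_healthy = is_healthy  # hypothetical health-check callback
        self._next = 0  # index of the backend to try first on the next pick

    def pick(self) -> str:
        """Return the next healthy backend, or raise if none is available."""
        count = len(self._backends)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self._backends[index]
            if self._is_healthy(candidate):
                self._next = (index + 1) % count
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    down = {"app-2"}  # pretend this server has failed
    balancer = RoundRobinBalancer(
        ["app-1", "app-2", "app-3"],
        is_healthy=lambda name: name not in down,
    )
    print([balancer.pick() for _ in range(4)])
```

With one of three servers marked down, traffic keeps flowing to the remaining two, which is the point of removing the single point of failure.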
This cognitive technology essentially operates at three basic levels: the ability to perform tasks, perform activities and handle situations. The last level, intelligent incident or situation handling, prioritizes what needs to be acted on, identifies the root cause and prescribes an action. It further augments productivity by performing the action autonomously. Enabling these mission-critical applications to keep IT running is core to supporting current, essential services such as healthcare systems, utilities, telecom providers, and retail and distribution services.
Though the pandemic is straining IT teams, many of the challenges that companies are facing now have common ground with smaller-scale problems that crop up during "business as usual." It is also important to remember that the business landscape is going to look fundamentally different once the immediate crisis has passed. There will be increasing demand to run businesses effectively by working remotely, managing cash flows through smart supplier management, and shifting from a reactive to a proactive mode of IT operations by eliminating slow and error-prone manual processes.
COVID-19 has created a new "business as usual," and companies will benefit from the assist that intelligent systems management, leveraging AI and automation platforms, can give. Such tools help organizations to become more resilient and adaptable, creating a more reliable infrastructure that minimizes or even eliminates outages.