When it comes to viruses, it's typically those of the computer/digital variety that IT is concerned about. But with the ongoing pandemic, IT operations teams are on the hook to maintain business functions in the midst of rapid and massive change. One of the biggest challenges for businesses is the shift to remote work at scale. Ensuring that they can continue to provide products and services — and satisfy their customers — against this backdrop is challenging for many.
IT resources that once served business needs well are now faced with a whole different set of employee needs, while home internet and hardware are also being put to the test. There are already examples of infrastructure under strain. One leading cloud provider was hit by an outage for several hours in March, after a connectivity issue in one of its U.S. data centers. And a video calling service, which has seen a massive upsurge in usage, experienced a partial outage in April. Another example is the U.S. government loan plan intended to support small businesses, which was beset with technical issues, leaving many banks and would-be users unable to access the online portal.
It stands to reason that organizations are trying to make do during this unprecedented time, patching existing products and services with rapidly deployed bolt-ons to keep their business running while also improving communication and accessibility for people who work from home.
An outage is a very public kind of failure. But as embarrassing as it is for the companies that make the headlines, it's a risk all businesses face. There are multiple potential causes — network failures, software malfunctions, usage spikes, human error and configuration error among them.
Those headline-making outages give business leaders big headaches as they deal with the associated costs, which can run into the hundreds of millions, as well as the impact on the confidence of their customers. Fortune 1000 companies lose between $1.25 billion and $2.5 billion every year due to unplanned outages.
Counting the Cost
The length, cost and impact of an outage will vary, at least in part because multiple parts of a business are likely to be affected simultaneously. The size and scale of a company can also complicate the problem. Evolving technology and platforms across multiple locations can cause weak points that are not immediately obvious without oversight of the entire system. With tightening operations budgets, this can be a constant challenge.
A report by Ponemon put the average cost of downtime at nearly $9,000 a minute. Outage cost, of course, varies greatly depending on the size of the business affected and the sector it operates in. Banking, government, healthcare, manufacturing, media, retail, utilities and transportation are among those most at risk, and where outages are the most costly.
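To make the per-minute figure concrete, here is a rough back-of-the-envelope sketch. It assumes a flat rate of $9,000 a minute (rounded from the "nearly $9,000" average above) and a hypothetical function name; a real outage would cost more or less depending on company size and sector, and the figure covers none of the indirect costs.

```python
# Assumption: a flat average rate, rounded from "nearly $9,000 a minute".
COST_PER_MINUTE_USD = 9_000


def estimate_outage_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Return a rough direct cost in USD for an outage of the given length."""
    if minutes < 0:
        raise ValueError("outage duration cannot be negative")
    return minutes * cost_per_minute


# A 90-minute outage at the assumed average rate.
print(estimate_outage_cost(90))  # 810000
```

Passing a different `cost_per_minute` lets a team substitute its own estimate for the industry average.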
How much downtime costs an organization isn't just a matter of looking at lost revenue. Business disruption, reputational damage, customer churn and the effect on productivity levels also factor in. Further down the line, there may well be a fallout caused by fines, litigation or settlements, third-party costs and equipment replacement.
Steps Toward Resilience
During downtime, what usually happens is a trial-and-error approach that relies on intrinsic knowledge and teams who are working in operational and technology silos. This is likely to prolong the amount of time businesses are offline.
A better solution is for organizations to determine what they can do in advance to avoid outages and implement a recovery plan to get them back up and running as soon as possible. This should include cooperation with third-party providers and technology partners. Agile businesses will be best placed to weather the current situation. The ability to adapt to demand quickly and fall back on a robust IT application system will help ensure that resilience.
To reduce the risk of downtime, take the obvious steps involved in eliminating single points of failure — balancing load between servers, following good back-up practices and building in technical fail-safes. What is becoming increasingly apparent is that sophisticated AI, predictive processes and automation are starting to play a critical role in prevention.
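As one illustration of balancing load while avoiding a single point of failure, the sketch below rotates requests across servers and skips any that have been marked unhealthy. The class name, method names and server names are hypothetical, and health status is set by hand here; a production load balancer would run its own health checks.

```python
class RoundRobinBalancer:
    """Rotate across servers, skipping any currently marked as down."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._index = 0

    def mark_down(self, server):
        # In practice a health check would call this, not a person.
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        # Try each server at most once per call, starting from the cursor.
        for _ in range(len(self._servers)):
            candidate = self._servers[self._index]
            self._index = (self._index + 1) % len(self._servers)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical names
balancer.mark_down("app-2")
print([balancer.next_server() for _ in range(4)])
```

With `app-2` marked down, traffic alternates between the two remaining servers instead of failing.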
This cognitive technology essentially operates at three basic levels: the ability to perform tasks, perform activities and handle situations. The last of these, intelligent incident or situation handling, prioritizes what needs to be acted on, identifies the root cause and prescribes an action. It further augments productivity by performing the action autonomously. Enabling these mission-critical applications to keep IT running is core to supporting current, essential services such as healthcare systems, utilities, telecom providers, and retail and distribution services.
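A very simplified sketch of the prioritize-then-prescribe flow might look like the following. Everything in it is an assumption made for illustration: the `Incident` fields, the `RUNBOOK` mapping and the action names are hypothetical, and the root cause is taken as already identified rather than inferred, which is the hard part that AI-driven tools aim to automate.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical mapping from a known root cause to a prescribed action.
RUNBOOK = {
    "disk_full": "purge_old_logs",
    "service_down": "restart_service",
}


@dataclass(order=True)
class Incident:
    priority: int                      # lower number = more urgent
    name: str = field(compare=False)
    cause: str = field(compare=False)  # assumed to be already diagnosed


def triage(incidents):
    """Return (incident name, prescribed action) pairs, most urgent first."""
    heap = list(incidents)
    heapq.heapify(heap)
    plan = []
    while heap:
        incident = heapq.heappop(heap)
        # Unknown causes are handed to a person rather than guessed at.
        action = RUNBOOK.get(incident.cause, "escalate_to_human")
        plan.append((incident.name, action))
    return plan


plan = triage([
    Incident(2, "db-01", "disk_full"),
    Incident(1, "web-01", "service_down"),
    Incident(3, "cache-01", "unknown"),
])
print(plan)
```

Falling back to escalation for unrecognized causes keeps the automation conservative: it acts only where a response has been agreed in advance.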
Though the pandemic is straining IT teams, many of the challenges that companies are facing now have common ground with smaller-scale problems that crop up during "business as usual." It is also important to remember that the business landscape is going to look fundamentally different once the immediate crisis has passed. There will be increasing demand to run businesses effectively by working remotely, managing cash flows through smart supplier management, and shifting from a reactive to a proactive mode of IT operations by eliminating slow and error-prone manual processes.
COVID-19 has created a new "business as usual," and companies will benefit from the assist that intelligent systems management, leveraging AI and automation platforms, can give. Such tools help organizations to become more resilient and adaptable, creating a more reliable infrastructure that minimizes or even eliminates outages.