When it comes to viruses, it's typically those of the computer/digital variety that IT is concerned about. But with the ongoing pandemic, IT operations teams are on the hook to maintain business functions in the midst of rapid and massive change. One of the biggest challenges for businesses is the shift to remote work at scale. Ensuring that they can continue to provide products and services — and satisfy their customers — against this backdrop is challenging for many.
IT resources that once served business needs well now face a whole different set of employee needs, while home internet and hardware are also being put to the test. There are already examples of infrastructure under strain. One leading cloud provider was hit by an outage lasting several hours in March, after a connectivity issue in one of its U.S. data centers. A video calling service, which has seen a massive upsurge in usage, experienced a partial outage in April. Another example is the U.S. government loan plan intended to support small businesses, which was beset by technical issues that left many banks and would-be users unable to access the online portal.
It stands to reason that organizations are trying to make do during this unprecedented time, patching existing products and services with rapidly deployed bolt-ons to keep their business running while also improving communication and accessibility for people who work from home.
An outage is a very public kind of failure. But as embarrassing as it is for the companies that make the headlines, it's a risk all businesses face. There are multiple potential causes — network failures, software malfunctions, usage spikes, human error and configuration error among them.
Those headline-making outages give business leaders big headaches as they deal with the associated costs, which can run into the hundreds of millions of dollars, as well as the impact on customer confidence. Fortune 1000 companies lose between $1.25 billion and $2.5 billion every year due to unplanned outages.
Counting the Cost
The length, cost and impact of an outage will vary, at least in part because multiple parts of a business are likely to be affected simultaneously. The size and scale of a company can also complicate the problem. Evolving technology and platforms across multiple locations can cause weak points that are not immediately obvious without oversight of the entire system. With tightening operations budgets, this can be a constant challenge.
A report by Ponemon put the average cost of downtime at nearly $9,000 a minute. Outage cost, of course, varies greatly depending on the size of the business affected and the sector it operates in. Banking, government, healthcare, manufacturing, media, retail, utilities and transportation are among those most at risk, and the ones where outages are the most costly.
How much downtime costs an organization isn't just a matter of looking at lost revenue. Business disruption, reputational damage, customer churn and the effect on productivity levels also factor in. Further down the line, there may well be a fallout caused by fines, litigation or settlements, third-party costs and equipment replacement.
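As a rough illustration of how these figures add up, the sketch below multiplies the per-minute average cited above by an outage length and then adds follow-on costs. The per-minute rate is rounded from the Ponemon figure; the class name, the cost fields and the sample numbers are hypothetical, and softer effects such as reputational damage or churn are deliberately left out because they are hard to price.

```python
from dataclasses import dataclass

# Per-minute average cited in the article ("nearly $9,000"), rounded here
# for illustration. Real figures vary by company size and sector.
COST_PER_MINUTE_USD = 9_000


@dataclass
class OutageEstimate:
    """Back-of-the-envelope outage cost. All fields are illustrative inputs."""

    minutes_down: float
    fines_usd: float = 0.0        # fines, litigation or settlements
    third_party_usd: float = 0.0  # third-party costs
    equipment_usd: float = 0.0    # equipment replacement

    def direct_cost(self) -> float:
        # Cost attributed to the downtime itself.
        return self.minutes_down * COST_PER_MINUTE_USD

    def total_cost(self) -> float:
        # Direct cost plus the follow-on costs that surface later.
        return (
            self.direct_cost()
            + self.fines_usd
            + self.third_party_usd
            + self.equipment_usd
        )


# Hypothetical example: a 90-minute outage followed by a $50,000 fine.
estimate = OutageEstimate(minutes_down=90, fines_usd=50_000)
print(f"Direct: ${estimate.direct_cost():,.0f}")
print(f"Total:  ${estimate.total_cost():,.0f}")
```

Even this simplified model shows why minutes matter: at the cited average, every additional quarter of an hour offline adds roughly $135,000 before any follow-on costs are counted.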
Steps Toward Resilience
During downtime, teams usually fall back on trial and error, relying on knowledge held in people's heads and working in operational and technology silos. This is likely to prolong the time a business spends offline.
A better solution is for organizations to determine what they can do in advance to avoid outages and implement a recovery plan to get them back up and running as soon as possible. This should include cooperation with third-party providers and technology partners. Agile businesses will be best placed to weather the current situation. The ability to adapt to demand quickly and fall back on a robust IT application system will help ensure that resilience.
To reduce the risk of downtime, take the obvious steps involved in eliminating single points of failure — balancing load between servers, following good back-up practices and building in technical fail-safes. What is becoming increasingly apparent is that sophisticated AI, predictive processes and automation are starting to play a critical role in prevention.
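A minimal sketch of the first of those steps, balancing load between servers while routing around a failed one, might look like the following. The class name, server names and health-check callback are all hypothetical; a production setup would normally rely on a dedicated load balancer rather than application code like this.

```python
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Rotate through servers, skipping any that fail a health check."""

    def __init__(self, servers: Sequence[str], is_healthy: Callable[[str], bool]):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._is_healthy = is_healthy  # hypothetical health-check callback
        self._next = 0

    def next_server(self) -> str:
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self._servers)):
            candidate = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self._is_healthy(candidate):
                return candidate
        # Every server failed its check: surface the problem loudly.
        raise RuntimeError("no healthy servers available")


# Hypothetical example: "app-2" is down, so traffic alternates between the others.
down = {"app-2"}
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"], lambda s: s not in down)
picks = [balancer.next_server() for _ in range(4)]
print(picks)
```

With more than one server behind the balancer, losing a single machine reduces capacity instead of taking the service offline, which is the point of removing single points of failure.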
This cognitive technology essentially operates at three basic levels: performing tasks, performing activities and handling situations. The last level, intelligent incident or situation handling, prioritizes what needs to be acted on, identifies the root cause and prescribes an action. It can further boost productivity by performing that action autonomously. Enabling these mission-critical applications to keep IT running is core to supporting essential services such as healthcare systems, utilities, telecom providers, and retail and distribution services.
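The prioritize, diagnose and prescribe sequence can be pictured with a small rule-based sketch. It is not AI and does not represent any particular product: the alert fields, the severity scale, the dependency links and the runbook entries are all hypothetical, chosen only to show the three steps in order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Alert:
    service: str
    severity: int                   # hypothetical scale: 1 = low, 5 = critical
    upstream: Optional[str] = None  # service this one depends on, if any


# Hypothetical runbook mapping a root-cause service to a prescribed action.
RUNBOOK: Dict[str, str] = {
    "database": "fail over to replica",
    "network": "reroute traffic",
}


def prioritize(alerts: List[Alert]) -> List[Alert]:
    # Step 1: decide what needs attention first (highest severity first).
    return sorted(alerts, key=lambda a: a.severity, reverse=True)


def root_cause(alert: Alert, alerts: List[Alert]) -> str:
    # Step 2: follow dependencies upstream while the upstream is also alerting.
    by_service = {a.service: a for a in alerts}
    current = alert
    seen = {current.service}
    while current.upstream in by_service and current.upstream not in seen:
        current = by_service[current.upstream]
        seen.add(current.service)
    return current.service


def prescribe(service: str) -> str:
    # Step 3: look up an action, falling back to a human when none is known.
    return RUNBOOK.get(service, "escalate to on-call engineer")


alerts = [
    Alert("checkout", 5, upstream="database"),
    Alert("database", 4, upstream="network"),
    Alert("network", 3),
    Alert("reporting", 2),
]
top = prioritize(alerts)[0]
cause = root_cause(top, alerts)
action = prescribe(cause)
print(f"{top.service} -> root cause: {cause} -> action: {action}")
```

In this example the loudest alert comes from the checkout service, but the prescribed action targets the network, where the chain of failures starts. Performing that action automatically, rather than only recommending it, would be the autonomous step described above.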
Though the pandemic is straining IT teams, many of the challenges that companies are facing now have common ground with smaller-scale problems that crop up during "business as usual." It is also important to remember that the business landscape is going to look fundamentally different once the immediate crisis has passed. There will be increasing demand to run businesses effectively by working remotely, managing cash flows through smart supplier management, and shifting from a reactive to a proactive mode of IT operations by eliminating slow and error-prone manual processes.
COVID-19 has created a new "business as usual," and companies will benefit from the assist that intelligent systems management, leveraging AI and automation platforms, can give. Such tools help organizations to become more resilient and adaptable, creating a more reliable infrastructure that minimizes or even eliminates outages.