The global pandemic has radically changed the way enterprise IT services are produced, consumed, and managed. It has also exposed a glaring divide between the "haves and have-nots" among software development and operations teams.
Engineering teams that rode the CI/CD and DevOps waves are now seeing the full potential and purpose of those practices. Newly distributed operations teams, however, are struggling to cope with the sudden shift to WFH (work from home). As a VP of IT operations at a large enterprise told me, "We are in a survival mode with bare minimum tools to cope with. We are fighting a gun battle with swords." This is because IT operations teams, unlike software and engineering teams, were traditionally set up to work from centralized locations. Some organizations have overcome this by implementing AIOps (artificial intelligence for IT operations) solutions; others are using the brute-force method of hiring more IT operations analysts to keep their distributed NOCs (network operations centers) running.
IT Operations Teams Were Already Stressed
Even before the pandemic ushered in this "new normal" mode of operations, IT operations teams were under pressure to deliver more with less. According to a survey of 1,300 IT professionals conducted by BigPanda earlier this year:
■ Innovation and CI/CD culture have increased normal operational workloads by 50%. The majority of respondents (53%) expect their NOC/ITOps workloads to increase even more in the next two years.
■ ITOps and NOC teams contend with fast-moving IT stacks. Whether these technology changes are driven by faster development needs, hyperscale architecture, or technical debt, they almost always require additional training, deeper insight into the stacks, and additional qualified analysts.
■ About 47% of respondents see constant application and code changes, and 39% experience constant infrastructure changes. Most of them see multiple changes a day, sometimes even hourly.
To keep up, ITOps teams have requested more budget, more automation tools, and more qualified analysts. Surprisingly, however:
■ 56% of them expected their IT budgets to stay flat, and 21% expected their IT operations budgets to shrink.
Over the last few years, software design, development, and testing teams had already begun moving away from the traditional office model toward remote work. Although many corporations had recently decided to promote a face-to-face collaborative culture, these teams still had a mechanism to fall back on when the pandemic hit. Operations teams, by contrast, almost always worked from centralized network or security operations centers (NOC/SOC) and had no setup in place to work remotely if needed.
IT Operations Teams Were Not Set Up to Work Remotely
The coronavirus pandemic has created an even more stressful situation for the IT operations teams.
1. IT operations teams became highly distributed and lost almost all of their NOC privileges virtually overnight. These include, but are not limited to, wall-mounted displays showing system health, immediate access to experts in the same room for advice, and quick collaborative decision making to solve critical issues in real time.
2. DevOps teams are already set up to push agile releases remotely, and since moving to work from home, their release frequency has risen well above normal.
3. IT Ops teams may lose personnel and efficiency to illness, self-isolation, and layoffs, and they are not set up to replicate a centralized NOC in a remote, distributed working environment.
4. To support remote work, CIOs are forced to spend unbudgeted money on infrastructure services. According to Gartner, cloud-based telephony/messaging and conferencing will see strong spending growth, up 8.9% and 24.3%, respectively. Add increased spending on VPNs, virtual desktops, hardware upgrades, standing desks for employees, and additional security software for remote work, and CIOs have even less money left for other things, such as hiring additional IT operations analysts.
5. Workloads have become more distributed. DevOps teams are working crazy hours, in crazy locations, and they are making some crazy changes without keeping the operations teams in the loop. Enterprises are not yet ready to measure the increased workloads and the employee stress they cause, because they are still underwater coping with the shift to a distributed workforce.
With the new budget crunch caused by the economic downturn, many IT organizations that were already under heavy strain have slashed their IT operations staff considerably just to stay afloat. This puts even more stress on the remaining IT operations staff, who must do more with much less.
Prepare for the Future as This Too Shall Pass
Forward-looking enterprises are already considering moving from survival mode to thriving mode. They are setting up the necessary tools, visibility, compliance, and control for operations teams so that in the future, whether working remotely or in person, those teams can cope with disruptions and deliver in sync with development and engineering teams. With these in place, Ops teams can remotely monitor, diagnose, and maintain hyperscale hybrid cloud systems if needed. While this pandemic will eventually end, other situations will require IT operations teams to work remotely. By preparing for them now, enterprises can survive future disruptions and enable operations teams to work efficiently if the situation arises. As a bonus, supporting remote work lets enterprises hire more qualified IT analysts without being limited to specific locations.
The bottom line is that old-fashioned IT with old-fashioned thinking can lead to disaster. Reduced budgets, reduced resources, increased workloads, and added stress could lead to an unsustainable spiral. If CIOs can't support the business's digital dependency from anywhere, during the pandemic and beyond, the business will eventually fail.
Because of the shift to remote work, the number of daily incidents has gone up. In some verticals, such as online learning, entertainment services, and collaboration tools, incident levels have risen tenfold, and under these high volumes some online collaboration tools exposed security flaws. Between handling those incidents and keeping up with the development and DevOps teams pushing changes to fix them, IT operations analyst roles have become the most stressful jobs in IT.
Here are some of the things enterprises can do to mitigate the situation:
1. If at all possible, stop supporting non-critical business applications. This will free up a lot of support time.
2. Prioritize business-critical issues (such as scalability and security flaws) over non-critical issues and feature requests, which can wait.
3. Automate IT processes as much as possible so that IT teams can find and solve issues efficiently.
4. Keep development and IT Ops teams in sync. Unless Ops teams know which changes broke the system, they may look in the wrong places when solving issues.
5. Use ML, AI, and AIOps to reduce the noise (that is, multiple alerts and tickets for the same incident) so teams can avoid distractions, spot early warnings, and concentrate on real issues. A properly implemented AIOps solution can eliminate upward of 95% of alerts and keep teams from being overwhelmed by "alert fatigue."
6. Automate the routing of incidents to the right resource quickly rather than escalating through multiple levels of support.
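Items 5 and 6 can be illustrated with a minimal sketch of fingerprint-based alert deduplication followed by direct routing. It assumes each alert carries a service name, a check name, and a timestamp, and it treats alerts with the same (service, check) pair arriving within five minutes of the previous one as a single incident. The field names, the 300-second window, and the `ROUTES` team table are all hypothetical. Real AIOps products use much richer correlation (topology, ML clustering); this is simple deduplication, not any vendor's method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Alert:
    source: str       # monitoring tool that raised the alert (hypothetical field)
    service: str      # affected service
    check: str        # failing check, e.g. "latency" or "disk"
    timestamp: float  # seconds since epoch


@dataclass
class Incident:
    service: str
    check: str
    first_seen: float
    last_seen: float
    alerts: List[Alert] = field(default_factory=list)


# Hypothetical routing table: service -> owning on-call team.
ROUTES: Dict[str, str] = {"checkout": "payments-oncall", "search": "search-oncall"}
DEFAULT_TEAM = "noc-triage"


def correlate(alerts: List[Alert], window: float = 300.0) -> List[Incident]:
    """Collapse alerts sharing a (service, check) fingerprint into one incident
    while each new alert arrives within `window` seconds of the previous one."""
    open_incidents: Dict[Tuple[str, str], Incident] = {}
    incidents: List[Incident] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.service, alert.check)
        inc = open_incidents.get(key)
        if inc is not None and alert.timestamp - inc.last_seen <= window:
            # Same problem still firing: attach instead of opening a new ticket.
            inc.alerts.append(alert)
            inc.last_seen = alert.timestamp
        else:
            # New fingerprint, or the old incident went quiet: open a new one.
            inc = Incident(alert.service, alert.check,
                           alert.timestamp, alert.timestamp, [alert])
            open_incidents[key] = inc
            incidents.append(inc)
    return incidents


def route(incident: Incident, routes: Dict[str, str] = ROUTES,
          default: str = DEFAULT_TEAM) -> str:
    """Send the incident straight to the owning team rather than tiered escalation."""
    return routes.get(incident.service, default)


if __name__ == "__main__":
    sample = [
        Alert("monitor-a", "checkout", "latency", 0),
        Alert("monitor-b", "checkout", "latency", 60),
        Alert("monitor-a", "checkout", "latency", 120),
        Alert("monitor-a", "search", "disk", 130),
        Alert("monitor-a", "checkout", "latency", 1000),
    ]
    for inc in correlate(sample):
        print(inc.service, inc.check, len(inc.alerts), route(inc))
```

In this sample, five raw alerts collapse into three incidents: the three early checkout latency alerts become one, and the checkout alert at t=1000 opens a fresh incident because it arrives more than 300 seconds after the last one. Routing by service ownership sends each incident directly to a team, with unknown services falling back to a general triage queue.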