FAA Outage: System Downtime Puts an Entire Industry on Hold
January 17, 2023

Pete Goldin
APMdigest

"The US aviation sector was struggling to return to normal following a nationwide ground stop imposed by Federal Aviation Administration (FAA) early Wednesday over a computer issue that forced a 90-minute halt to all US departing flights," Reuters reported on January 11.


The breakdown showed how much American air travel depends on the computer system that generates alerts called NOTAMs (Notice to Air Missions), the Associated Press reported. The system broke down late Tuesday and was not fixed until midmorning Wednesday. The FAA took the rare step of preventing any planes from taking off for a time, and the cascading chaos led to more than 1,300 flight cancellations and 9,000 delays by early evening on the East Coast, according to flight-tracking website FlightAware.

The FAA said a corrupted file affected both the primary and backup systems.

The outage came only a couple of weeks after Southwest Airlines experienced a meltdown during the holidays, another technology problem that hit the aviation industry. National Public Radio reported: "By all accounts Southwest was using badly outdated computer systems to manage that complicated system."

But from the IT Ops perspective, the real takeaway from this news is not specifically about the FAA or the airline industry. Every organization faces the same concern every day: keeping systems updated and up and running. The alternative can be disastrous.

"Today's FAA outage underscores the great need for modernized infrastructure, especially within organizations that operate on antiquated systems," said Fred Koopmans, BigPanda CPO. "The impact to travelers is obvious in this case, but it's imperative to also consider the internal mechanics the FAA will now have to address to recover from this."

Koopmans continued, "The average cost of a significant IT outage, according to 2022 research, is $6,912/minute or $414,720/hour – that's a $7.4M price tag for the FAA based on reports that issues arose at 3pm ET on Tuesday. Many, if not most, companies in the US could not take a hit of this caliber and still maintain business as usual."
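The arithmetic behind those figures can be checked with a short sketch. It assumes a simple linear cost model (cost grows in proportion to downtime), which the quote implies but does not state, and the names `outage_cost`, `COST_PER_MINUTE_USD`, and `QUOTED_TOTAL_USD` are hypothetical labels chosen for illustration. Rather than guessing an exact end time for the outage, it works backward from the quoted total to the downtime that total implies.

```python
# Figures taken from the Koopmans quote (2022 research he cites).
COST_PER_MINUTE_USD = 6_912
QUOTED_TOTAL_USD = 7_400_000  # the "$7.4M" figure, already rounded in the quote


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Assumed linear model: total cost is downtime multiplied by cost per minute."""
    return minutes * cost_per_minute


# Per-hour cost: 60 minutes at the quoted per-minute rate.
cost_per_hour = outage_cost(60)

# Downtime implied by the quoted total, in hours.
implied_hours = QUOTED_TOTAL_USD / cost_per_hour

print(f"Cost per hour: ${cost_per_hour:,.0f}")
print(f"Implied outage length: {implied_hours:.1f} hours")
```

The per-hour value matches the $414,720 in the quote, and the implied downtime of a little under 18 hours is consistent with a start at 3pm ET on Tuesday and a fix by midmorning Wednesday.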

"The outdated SaaS systems that many airlines rely upon are difficult to operate and run using older coding languages that few people still know how to use efficiently," explained Peter Pezaris, SVP of Strategy & User Experience at New Relic. "This means that when issues occur, they can be difficult to locate and fix — especially in a timely manner. Beyond that, they are also susceptible to cascading events, when a system fails and goes on to cause a ripple effect. As companies scale and the average tech stack becomes more complex, the risk of outages only rises. Not only is the IT team trying to get the system back up and running, but they are also fielding what can be a massive influx of requests ranging from internal stakeholders up to the Board level or customer complaints."

"Minimizing the time to understand the issue is critical," Pezaris added. "What makes this difficult is that most companies have observability data scattered everywhere. Observability unifies an organization's data and can provide airlines with a 360-degree view of their entire IT stacks, allowing engineers to detect and resolve issues before they impact flights."

Recently published data from New Relic's 2022 Observability Forecast shows that 45% of respondents experience an outage with a high business impact once per week or more — and 29% of those outages take an hour or more to resolve.

Pete Goldin is Editor and Publisher of APMdigest