It's been all over the news for the last few months. After two fatal crashes, Boeing was forced to ground its 737 MAX. The doomed model is now undergoing extensive testing to get it back into service and production, and you can almost cut the anticipation with a knife. Wall Street, the airline industry, future passengers and the manufacturer itself all want the reassurance of seeing every Boeing plane safely back in the air.
In the interim, the manufacturer has taken a very serious hit. Its stock price plummeted, and consumer confidence in its safety hit an all-time low. It all boils down to a series of software problems, and it will take new and improved updates to get the planes back into the sky.
The airline and aerospace industry isn't the first, and won't be the last, to come face-to-face with software flaws. They're pervasive. The big question is: who's next? Automotive? Retail banking? Both are plausible. This is a line no one wants to be first in.
Why does it continue to happen? And more importantly, how can it be avoided?
Large organizations often tell stakeholders that even though all software goes through extensive testing, this type of thing “just happens.” The old saying “to err is human” is the scapegoat. But that is exactly the problem. While the human component of application development and testing won't go away, it can be eased and supplemented by far more efficient and automated methods to proactively determine software health and identify flaws.
Gaining insight into software health also reveals how secure applications are. A recent Software Intelligence Report from CAST found that 28% of businesses rely on “instinct” or their architects to assess potential IT risks. Being in the dark about software robustness, however, leaves organizations vulnerable. They need to use Software Intelligence to find the biggest threats and understand where the weaknesses are before it's too late.
Just like a doctor doesn't diagnose a broken arm without an x-ray, a business shouldn't rely on human assessments alone to diagnose software issues.
Routine Checks, Spot Fixes and Physicals
The good news is that, with a few tweaks, software health assessments can become much more effective and preventative. The key is to break software health checks into three categories: routine checks, spot fixes and physicals. With this strategy, weaknesses can be detected quickly, especially if the software is scanned on a regular basis, which helps identify and catch the biggest issues.
Routine checks should occur monthly. Their focus should be on removing more defects than were added, identifying the most common defects and asking, “Do we know how to avoid the obvious flaws?” Identifying bad practices teaches developers not just about weaknesses but about how to avoid them. In addition, change velocity should stay relatively constant, since releases with massive changes in functionality tend to cause concern. Defect density should also never creep upward.
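The monthly routine check lends itself to simple automation. The sketch below is a minimal illustration, assuming defect counts, open defects, code size and changed lines are already exported each month from an issue tracker and a code scanner. The `MonthlySnapshot` fields, the `routine_check` function and the 50% velocity tolerance are hypothetical choices, not a standard.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    """Hypothetical monthly metrics; real exports will have other fields."""
    defects_added: int    # new defects found this month
    defects_removed: int  # defects fixed this month
    open_defects: int     # defects still open at month end
    kloc: float           # code base size, thousands of lines
    changed_loc: int      # lines added/modified/deleted (change velocity)


def defect_density(s: MonthlySnapshot) -> float:
    """Open defects per thousand lines of code."""
    return s.open_defects / s.kloc


def routine_check(prev: MonthlySnapshot, curr: MonthlySnapshot,
                  velocity_tolerance: float = 0.5) -> list[str]:
    """Return warnings for this month (an empty list means no flags)."""
    warnings = []
    # The goal is to remove MORE defects than were added.
    if curr.defects_removed <= curr.defects_added:
        warnings.append("more defects added than removed")
    # Defect density should never creep upward.
    if defect_density(curr) > defect_density(prev):
        warnings.append("defect density increased")
    # Assumed rule: flag a month-over-month velocity swing above the tolerance.
    if prev.changed_loc > 0:
        swing = abs(curr.changed_loc - prev.changed_loc) / prev.changed_loc
        if swing > velocity_tolerance:
            warnings.append("change velocity shifted sharply")
    return warnings


april = MonthlySnapshot(defects_added=40, defects_removed=55,
                        open_defects=120, kloc=200.0, changed_loc=8000)
may = MonthlySnapshot(defects_added=60, defects_removed=45,
                      open_defects=135, kloc=202.0, changed_loc=20000)
print(routine_check(april, may))
```

Comparing only against the previous month keeps the sketch short; a trailing average over several months would be less sensitive to a single noisy release.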
Spot fixes happen frequently, and each one can tell you a lot about a specific problem. Trouble tickets from customers or users reveal specifics: did it crash, was it slow, did it lock up? Knowing a specific pain point and developing a plan to treat it creates real data that can improve metrics and identify issues such as performance problems tied to defects in a module or method, machine reboots caused by memory leaks, or security breaches. This data can also be combined with cost and hour data to better predict staffing and usage.
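Trouble-ticket data from spot fixes can be rolled up by symptom and by module to show where pain and effort concentrate. This is a hedged sketch: the `Ticket` record, its fields and the `summarize` helper are hypothetical, standing in for whatever a real ticketing system exports.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical trouble-ticket record; real ticketing exports will differ."""
    module: str         # module or method the problem was traced to
    symptom: str        # e.g. "crash", "slow", "lockup"
    hours_spent: float  # engineering hours spent resolving it


def summarize(tickets: list[Ticket]) -> tuple[Counter, dict[str, float]]:
    """Count tickets per symptom and total the resolution hours per module."""
    by_symptom = Counter(t.symptom for t in tickets)
    hours_by_module: defaultdict[str, float] = defaultdict(float)
    for t in tickets:
        hours_by_module[t.module] += t.hours_spent
    return by_symptom, dict(hours_by_module)


tickets = [
    Ticket("billing", "crash", 6.0),
    Ticket("billing", "slow", 3.5),
    Ticket("search", "lockup", 8.0),
    Ticket("billing", "crash", 4.5),
]
by_symptom, hours = summarize(tickets)
hotspot = max(hours, key=hours.get)  # module consuming the most effort
print(by_symptom.most_common(), hours, hotspot)
```

Multiplying the per-module hours by a loaded hourly rate, or averaging hours per ticket, is one simple way to turn this into the cost and staffing estimates mentioned above.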
Finally, there is the annual physical. Look for trends in key data taken at the same point each year. For example, has complexity increased? Is the application getting harder to maintain? Has the defect density increased or decreased? Are the lines of code or the number of transactions increasing? Such changes can signal less experienced coders and increase the risk of defects.
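The annual physical is essentially a year-over-year comparison of the same metrics. A minimal sketch follows, assuming each year's snapshot is a plain mapping of metric name to value. The `year_over_year` function, the metric names and all figures are made up for illustration.

```python
def year_over_year(last_year: dict[str, float],
                   this_year: dict[str, float]) -> dict[str, float]:
    """Percent change per metric, rounded to one decimal place.

    Metrics missing from either year, or zero last year, are skipped
    because a percent change is undefined for them.
    """
    return {
        name: round((this_year[name] - last_year[name]) / last_year[name] * 100, 1)
        for name in last_year
        if name in this_year and last_year[name] != 0
    }


# Hypothetical snapshots taken at the same point in two consecutive years.
last_year_snapshot = {"avg_complexity": 8.0, "defect_density": 0.50,
                      "kloc": 180.0, "transactions": 1_200_000}
this_year_snapshot = {"avg_complexity": 9.0, "defect_density": 0.60,
                      "kloc": 200.0, "transactions": 1_500_000}

changes = year_over_year(last_year_snapshot, this_year_snapshot)
# Every metric here is one where growth deserves a closer look.
rising = [name for name, pct in changes.items() if pct > 0]
print(changes)
print("rising:", rising)
```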
Application maintenance is the responsibility of every IT department, but understanding software health (whether it's secure, efficient and resilient) is the most vital part of ensuring that even a minor update doesn't ripple through the whole organization and generate unintended consequences, as happened at Boeing.
Better software intelligence processes for determining health can warn a business about risk in advance, and these three checkups should be part of maintaining every application over time. All of the data should also be captured in a software health dashboard that tracks progress and gives a quick view of health in terms of robustness, efficiency, security, changeability, transferability and quality. A dashboard not only gives fast facts about the evolution of the software, but also shows where you are at highest risk and provides trend analysis to benchmark against over time.
All developers should remember that it's impossible to retrofit stability and trust into an application. They have to be designed and engineered in; otherwise erosion sets in, and your business could jump the queue and become the next Boeing.
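A health dashboard can start small: the latest score for each of the six characteristics listed above, its direction since the previous snapshot, and the weakest one flagged as the highest risk. The sketch below assumes a hypothetical 0-100 scale where higher means healthier; the `dashboard` function and the sample scores are illustrative and do not reflect CAST's scoring model or any specific product's output.

```python
HEALTH_FACTORS = ("robustness", "efficiency", "security",
                  "changeability", "transferability", "quality")


def dashboard(history: list[dict[str, float]]) -> tuple[list[str], str]:
    """Build one text line per health factor and name the factor at highest risk.

    `history` holds snapshots oldest-first (at least two), each mapping a
    factor to a score on a hypothetical 0-100 scale (higher = healthier).
    """
    previous, latest = history[-2], history[-1]
    lines = []
    for factor in HEALTH_FACTORS:
        delta = latest[factor] - previous[factor]
        trend = "up" if delta > 0 else "down" if delta < 0 else "flat"
        lines.append(f"{factor:<16}{latest[factor]:>6.1f}  {trend}")
    # Crude rule: the lowest current score is treated as the highest risk.
    highest_risk = min(HEALTH_FACTORS, key=lambda f: latest[f])
    return lines, highest_risk


q1 = {"robustness": 72, "efficiency": 80, "security": 65,
      "changeability": 70, "transferability": 75, "quality": 74}
q2 = {"robustness": 74, "efficiency": 80, "security": 61,
      "changeability": 68, "transferability": 76, "quality": 73}

lines, risk = dashboard([q1, q2])
print("\n".join(lines))
print("highest risk:", risk)
```

Picking the lowest current score as the highest risk is deliberately simple; a real dashboard might weight factors by business criticality or look at the slope across many snapshots.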