Lusser's Law and Applicability
April 07, 2021

Terry Critchley
Author of "Making It in IT"

Availability Probabilities

Application availability depends on the availability of other elements in a system, for example, network, server, operating system and so on, which support the application. Concentrating solely on the availability of any one block will not produce optimum availability of the application for the end user.

In the following diagram, a "linear" or "non-redundant" configuration of elements supporting the user application is shown. In vendor engineering publications, these elements are referred to as "blocks," although other publications may refer to them as "components."

It is evident that if any block in the series configuration fails, the user loses the use of the application. The application is as available as the weakest link in the chain, or so it would appear.

The following figure is a schematic showing a linear chain of blocks, henceforth known as non-redundant blocks or blocks in series. It is easy to see that the failure of any block in the chain will cause a failure of the service to the end user.

There are, of course, other blocks in the chain, such as the operating system, middleware and so on (not shown here), but the principle is the same. A single component whose failure causes overall failure is called a single point of failure, or SPoF.

The equations above demonstrate that a series of blocks is weaker than its weakest component simply because of the multiplication of several factors, all of which are less than or equal to 1.

Lusser's Law

Lusser's Law is a prediction of reliability named after Robert Lusser, who worked on Wernher von Braun's US rocketry program after World War II. It states that the reliability of a series system (of our "blocks") is equal to the product of the reliabilities of its component subsystems, provided their failure modes are statistically independent. This is what we see in the above diagram. For n blocks in series with reliabilities R1, R2, ..., Rn, the law can be stated as follows:

R = R1 x R2 x ... x Rn

This lays to rest the theory that a chain is as strong as its weakest link, the thinking at the time. Lusser's Law deals with the reality of this situation.
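The product rule described above can be sketched in a few lines of Python. This is a minimal illustration that assumes independent failures; the function name `series_availability` and the three availability figures are hypothetical and do not come from the article.

```python
from math import prod


def series_availability(availabilities):
    """Availability of independent blocks in series (Lusser's Law)."""
    values = list(availabilities)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {p}")
    # The chain works only if every block works, so the probabilities multiply.
    return prod(values)


# Hypothetical availabilities for a network, a server and an operating system.
blocks = [0.999, 0.995, 0.998]
chain = series_availability(blocks)

print(f"weakest block: {min(blocks):.4f}")
print(f"whole chain:   {chain:.4f}")
```

With these assumed figures the chain comes out below its weakest block, which is the point of the law: every extra block in series can only lower the result.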

Next in this document we will discuss using components in parallel and how to make the assessment of availability more IT-specific and not just deal with anonymous blocks or components which might represent anything in a reliability context - valves, pipes etc.

Part of availability management is to examine the service failure points in the configuration between application and user, assess their impact on service availability and design around them. There are cost implications to over-engineering the design, especially in the redundant configurations described below.

Effect of Redundant Blocks on Availability

The discussion so far has dealt with linear chains of blocks (blocks in series, to use an electrical analogy), where the whole chain is weaker than its weakest link – Lusser's Law.
To carry this analogy further, it is possible to use blocks in parallel to increase the availability of the chain, provided that one block can take over from a failed block and that the blocks fail independently. The following diagram illustrates this for two blocks:

These blocks might be NICs, disks, servers or other parts of a working system. The general case for 'n' parallel blocks is shown below, together with the equations for availability and non-availability probabilities.

Parallel (Redundant) Components

The next figure illustrates components configured in parallel as opposed to in series as we have seen already. The mathematics of these configurations is similar to Lusser's mathematics except we deal with 'unreliability' instead of 'reliability' entities in the math.

The basic premise in these calculations is that if the probability of being available is P, then the probability of not being available is N, where:

N = (1 - P) and P = (1 - N)

since the total probability of being either available or not available is 1. In reality, a system will consist of several sets of redundant components, for example disks, servers, network cards, lines and so on. These will feed into each other, possibly mixed with single components.

The figure below shows the general case of "n" blocks in a parallel configuration. This might represent one set of components for a subsystem such as a RAID configuration or set of network interface cards (NICs). Such a configuration can be difficult to handle mathematically so a 'reduction' technique is usually employed.

Two Parallel Blocks: Example

Picture two components in parallel, one with availability probability Pa and the other Pb. The probability of both blocks being unavailable, that is, the chain is broken, is:

N = (1 - Pa) x (1 - Pb)

This assumes the blocks have different availability characteristics, Pa and Pb. If they were the same, say Pa = Pb = P, then the probability that both are not available is given by the relationship:

N = (1 - P) x (1 - P) = (1 - P)^2

which is essentially a variation of Lusser's Law using the non-availability probabilities as multipliers instead of availability probabilities. The probability that all 'n' redundant blocks are unavailable is (1 - P)^n, and the probability that at least one of them is available, which is the availability of the redundant set, is 1 - (1 - P)^n.
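The parallel case can be sketched the same way, multiplying the non-availability probabilities instead. This is a rough illustration that assumes independent failures and instant takeover by a surviving block; the function name `parallel_availability` and the values of `pa` and `pb` are hypothetical.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of independent redundant blocks in parallel."""
    values = list(availabilities)
    for p in values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"availability must be between 0 and 1, got {p}")
    # The set is down only if every block is down at the same time.
    all_unavailable = prod(1.0 - p for p in values)
    return 1.0 - all_unavailable


# Hypothetical pair of blocks with different availabilities, Pa and Pb.
pa, pb = 0.99, 0.95
pair = parallel_availability([pa, pb])

print(f"best single block: {max(pa, pb):.4f}")
print(f"redundant pair:    {pair:.4f}")
```

Passing a list of identical values covers the Pa = Pb = P case, so one function handles both forms of the relationship.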

As an example, consider two parallel blocks, each with an availability of 99.5 %. The probability that both are unavailable is:

N = (1 - 0.995) x (1 - 0.995) = 0.000025

Hence its availability (compared with the availability of a single non-redundant case of 99.5%) is:

A(%) = (1 - N) x 100 = (1 - 0.000025) x 100 = 99.9975%

The knowledge of each value of "P" and some mathematical skills would be needed to solve the problem of service availability for a combination of serial and parallel service blocks, which is often the case in real life. The book High Availability IT Services covers this latter case.
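One common way to handle such a mixed configuration is to reduce each parallel group to a single equivalent block and then multiply the results in series. The sketch below assumes independent failures throughout; the layout (one network path, two servers, two disks) and every availability figure in it are hypothetical, chosen only to show the reduction.

```python
from math import prod


def parallel(availabilities):
    """Equivalent availability of a redundant group of independent blocks."""
    return 1.0 - prod(1.0 - p for p in availabilities)


def series(availabilities):
    """Availability of independent blocks in series (Lusser's Law)."""
    return prod(availabilities)


# Hypothetical configuration: a single network path feeding a pair of
# redundant servers, which in turn use a pair of redundant disks.
network = 0.999
servers = [0.995, 0.995]
disks = [0.99, 0.99]

# Step 1: reduce each parallel group to one equivalent block.
server_group = parallel(servers)
disk_group = parallel(disks)

# Step 2: treat the reduced blocks as a simple series chain.
service = series([network, server_group, disk_group])

print(f"server group: {server_group:.6f}")
print(f"disk group:   {disk_group:.6f}")
print(f"service:      {service:.6f}")
```

In this assumed layout the non-redundant network path caps the result: however much redundancy is added to the servers and disks, the service can never be more available than the single block left in series.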

Dr. Terry Critchley is an IT consultant and author who previously worked for IBM, Oracle and Sun Microsystems