It’s popular to use Crow-AMSAA for system reliability modelling. However, for the model to represent reality it must include only historically identical events.
If your Crow-AMSAA reliability growth model mixes nonidentical situations, then its forecasts have no meaning.
Below is an email discussion about when Crow-AMSAA modelling can provide believable forecasts.
Dear Mr. Sondalini,
I have read your recent article on the Crow-AMSAA model in the latest Maintenance & Asset Management magazine with great interest (see Reliability Growth Model of a Safety System). I have also previously come across some of your work, and I am grateful for your Excel document on how to develop one’s own Weibull analysis, which I found very helpful for understanding how to plot failures without using any sophisticated software.
I am interested in learning from failures and major disasters using root cause analysis techniques and have written a book on this subject, https://www.elsevier.com/books/learning-from-failures/labib/978-0-12-416727-8
I have a couple of questions about using reliability growth modelling and would welcome your kind advice please.
1. Is it safe to assume that reliability growth modelling in general, and the Crow-AMSAA model in particular, is an attempt to mathematically represent the P-F curve (which, incidentally, was also touched upon in the article by Dunn in the same issue of the magazine), except that the P-F curve is just upside down?
2. I have written a paper jointly with John Harris on analyzing the Fukushima nuclear power disaster (the paper is attached in case it is of interest). In that paper we suggested the following:
“…To test for a probability of a less than one in ten million chance per reactor-year of a nuclear plant failure would require building 1000 reactors and operating them for 10,000 years and anticipating a failure of no more than one during that period. Now let us compare these ambitious estimates with the current state. Across the world there are about 435 nuclear power reactors operating in 30 countries, with over 140 in Europe, and 54 in Japan [36,37], and around 100 in the USA. Fukushima is the third major nuclear accident (i.e. it was preceded by Three Mile Island and Chernobyl) and all three happened within less than half a century, which makes us question our models and original assumptions. So the current record suggests that the Mean Time between Failures (MTBF) for the three major accidents (in 1979, 1986, and 2011) currently stands at just 10 years, which is very far from the ambitious 1 in a 10 million per reactor-year chance. This view is supported by Smythe , who also suggests a catastrophic accident to be expected every 12–15 years.
Clearly, the three accidents each arose from very different circumstances, invalidating various modelling and risk assessment assumptions, and resisting assimilation into a single data set. It is difficult, with such a small sample size, to make generalizations about where current risk models fail, though we agree with the argument put forward by Pfotenhauer et al.  which suggests that the original ambitious annual failure risk estimates were serious underestimates.”
Can one apply the Crow-AMSAA model to the nuclear disasters, and what sort of insights would it lead to?
I look forward to hearing from you, and apologies for the long email.
Thank you for your email and its interesting contents.
Like you, I too am deeply interested in preventing failures. If you want to create world class reliability, then failure elimination and prevention is a fundamental requirement.
My interest in Crow-AMSAA models was piqued by Paul Barringer’s white paper, http://www.barringer1.com/nov02prb.htm, which I found on his website some years ago (I think that he has posted other articles on the topic as well). I then went on a search for more information about Crow-AMSAA to educate myself in it. My understanding of Crow-AMSAA is that of an interested learner.
A big presumption with Crow-AMSAA is that its mathematics can faithfully model reality. Both axes on Crow-AMSAA graphs are log-to-the-base-10 (log10) scale. Statisticians have long known that graphs of human-caused events plot as straight lines in log10 – log10 graphs. When people cause a specific event, Crow-AMSAA seemingly reflects the frequency of the outcomes.
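To make the log10 – log10 straight line concrete, here is a minimal Python sketch of a graphical Crow-AMSAA fit. It assumes the usual power-law form N(t) = lambda × t^beta, where N is the cumulative number of failures at cumulative time t, and fits a least-squares line through the points (log10 t, log10 N). The function name and the failure times are hypothetical and chosen only for illustration. A regression fit is used because it mirrors the plotted line; it is not the only way to estimate the parameters.

```python
import math

def fit_crow_amsaa(failure_times):
    """Fit N(t) = lam * t**beta by least squares on log10-log10 axes.

    failure_times: cumulative times of the 1st, 2nd, 3rd ... failure.
    Returns (lam, beta). Hypothetical helper, not a library function.
    """
    xs = [math.log10(t) for t in failure_times]
    ys = [math.log10(n) for n in range(1, len(failure_times) + 1)]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                         # slope of the straight line
    lam = 10 ** (mean_y - beta * mean_x)     # intercept, converted back from log10
    return lam, beta

# Made-up cumulative failure times (hours) for one system.
times = [100, 400, 900, 1600]
lam, beta = fit_crow_amsaa(times)
print(f"lambda = {lam:.4f}, beta = {beta:.4f}")
```

A slope (beta) below 1 means failures are arriving more slowly over time, a slope above 1 means they are arriving faster, and a slope near 1 means the failure rate is roughly constant.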
With regard to the first of your two questions: from my understanding, Crow-AMSAA would apply to the behaviour of the system of parts that we call ‘machines’. It would also apply to the behaviour of the system of machines and equipment that we call ‘operating plants’. I have never thought Crow-AMSAA would work for individual failure modes. That is where Weibull Analysis finds its use.
I think the answer to your question on whether a Crow-AMSAA reliability model will replicate equipment service behaviour is yes, provided the complete equipment is considered as one “system” under the control of people. You would never know which failure mode would cause the next failure, but you know from history that the equipment will fail.
A Crow-AMSAA reliability model needs identical situations to use in its development. You cannot put a centrifugal pump set into the same model as a reciprocating compressor. The two items of equipment are totally different in engineering, operation, and materials.
Crow-AMSAA plotting takes historic failures and projects forward on the presumption that future failures will follow the same pattern. This is another big danger with using Crow-AMSAA: the assumption that the future will be the same as the past. Once you fix failure modes in an equipment item you change the future. From that point onward the asset’s future failures are no longer predictable from historic data.
The P-F Curve is a simplistic model to explain how equipment degradation eventually becomes equipment breakdown. It is not a truthful model of operating equipment reality. There is a P-F curve for every failure mode of every component. In a simple machine or equipment with dozens of parts, like a centrifugal pump set, each part will have several failure modes and consequential P-F curves. A centrifugal pump set would have about 100 P-F curves happening simultaneously. In equipment with thousands of parts there can be a hundred thousand P-F curves in interplay.
With regard to the second question, on modelling the nuclear power industry: I also would not consider using Crow-AMSAA across multiple different organisations in different countries. To me they are not a continuous, holistic system. However, if the nuclear power industry around the world were totally regimented and standardised in the way it designs, constructs and operates each type of nuclear power station, then the entire global industry, and all the equipment, processes and people in it, could be considered as a life-cycle system. Being fundamentally identical situations, you could argue that Crow-AMSAA is a viable analysis for the same power station type.
As with different equipment, you would not have a gas fired power station in the same Crow-AMSAA model as for a nuclear power plant. Each power station generates with different engineering technology, materials, and operating methods.
I also struggle to see it as fair-and-reasonable to include Fukushima into a Crow-AMSAA model of the only three nuclear power station incidents so far, when its failure was the result of natural events (a tsunami), whereas the Three Mile Island and Chernobyl incidents were man-made and not associated with natural ‘Acts of God.’
If a Crow-AMSAA forecast predicted the dates of future nuclear disasters, though, unfortunately, not where they will happen, it may be useful ammunition to instigate proactive, preventive changes. For the sake of interest, it may be worth your time to plot the first two nuclear incidents to see what date the third one was projected to happen. If the third event, Fukushima, does fall on the Crow-AMSAA forecast, you may have a case to justify predicting the fourth, yet to happen, nuclear power station incident.
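As a rough sketch of that exercise, the Python below draws the Crow-AMSAA line through the first two events and reads off when the third would be expected. It assumes the power-law form N(t) = lambda × t^beta and measures time in calendar years from an assumed origin year. The origin, the function name and the use of calendar years rather than accumulated reactor-years are all assumptions made only for illustration. With just two points the line fits exactly, so the projection carries no measure of uncertainty and is very sensitive to the chosen origin.

```python
import math

def project_event_time(t1, t2, n):
    """Time of the n-th event on the power-law line through the first two.

    t1, t2: cumulative times of the 1st and 2nd events (same origin).
    Hypothetical helper: solves N(t) = lam * t**beta for N(t1)=1, N(t2)=2.
    """
    beta = math.log(2) / math.log(t2 / t1)   # slope on log-log axes
    lam = t1 ** (-beta)                      # from N(t1) = 1
    return (n / lam) ** (1 / beta)

ASSUMED_ORIGIN_YEAR = 1954   # hypothetical time origin, for illustration only
event_years = [1979, 1986]   # Three Mile Island and Chernobyl, from the quoted paper

t1 = event_years[0] - ASSUMED_ORIGIN_YEAR
t2 = event_years[1] - ASSUMED_ORIGIN_YEAR
projected_year = ASSUMED_ORIGIN_YEAR + project_event_time(t1, t2, 3)
print(f"Projected year of third event: {projected_year:.1f}")
```

Trying a few different origin years shows how much the projected date moves, which is a quick check on how much weight a two-point forecast can bear.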
I hope that the above is of some use to you.
All the very best to you,
LRS Consultants Global