This story of finding and correcting repetitive SR-71 Blackbird spy plane engine failures is a marvel in the use of Weibull analysis. But behind the joy of successful Root Cause Analysis is a story of horrendous business losses.
The story of discovering the causes of SR-71 Blackbird engine failures is a fascinating tale about the power of Weibull reliability analysis. You can read it in the articles headed ‘Reliability of a Turbofan Jet Engine’ by Larry Tyson from his time working for the US Navy Fleet at http://www.nxtbook.com/nxtbooks/reliabilityweb/uptime_20110607/index.php?startid=48 and http://www.nxtbook.com/nxtbooks/reliabilityweb/uptime_20121011/index.php?startid=40. Just copy and paste each web address individually into your Internet browser.
Briefly, the articles tell how, just after the turn of the 21st century, the SR-71 Blackbird spy plane had problems with jet engine flameout. So many flights were aborted because of flameouts that a team of reliability experts was formed to find and fix the problem. They used Weibull analysis techniques from reliability engineering to identify interdependence between failure modes. The failure modes were categorized, and the initial 168 were reduced to the seven to be investigated. Two of those were a form of seal leak that caused 47% of forced outages. The re-use of degraded old seals was identified as the root cause. The service procedure was changed from on-condition replacement of seals, based on the maintainer’s observation during depot-level maintenance, to mandated preventive renewal of the seals. For the negligible cost of several new $25 seals the failure trends reversed and flight times between repairs began increasing.
Few organisations could solve reliability problems with the analysis approach used to improve the reliability of the Blackbird turbojet engines (shown below). Weibull analysis was applied in an extraordinary way by world experts in its application. As successful as the project was, few others in the world are likely to use the technique, either because it costs too much, or because no expertise is available, or because they lack the extent and depth of failure data kept by the US Navy. But there is a simpler, risk-based concept that everyone can use to arrive at the best maintenance strategy for their operation.
A Weibull analysis does not cause reliability growth. The Weibull analysis did not make a Blackbird jet engine more reliable. Higher reliability resulted from fitting new seals instead of reinstalling old seals. The practice of replacing old parts with new is known as Preventive Maintenance. The Preventive Maintenance replacement of old seals for new during a regular maintenance service caused the Blackbird jet engine reliability improvement. It’s surprising how often Preventive Maintenance is the best business choice even if Predictive Maintenance using condition monitoring is recommended by the Reliability Centered Maintenance (RCM) analysis.
It’s a good bet that the original decision to replace seals ‘on-condition’ was the result of a Reliability Centered Maintenance analysis. RCM is a poor maintenance strategy selection methodology. It does not require you to identify the most economic choice for your company. All you have to do is answer its inherent questions and out pops an answer that satisfies the RCM methodology. It has no robustness. There is no certainty the answer is the best decision. It has no self-correction mechanisms. You can be totally wrong and believe it’s a great maintenance strategy. Your choices can even send an organization broke, but RCM does not care a damn if you send your company into bankruptcy! This was not so when RCM was proposed by Nowlan and Heap in their 1978 report titled Reliability-Centered Maintenance. They stipulated economic maintenance choices based on factual data. But RCM has been ‘blitzed’ so much to reduce the effort needed to do it properly that good business sense has been lost from the analysis.
Maintenance is an economic decision. Do the maintenance that makes the most money for your company, or at least loses the least money. Let the least business risk decide the maintenance to do. The right economic maintenance choice was to replace seals with new ones every time the service was performed. But that is not what was chosen. They chose on-condition monitoring, which depended on human beings choosing between fitting a new seal and reusing the old one. As soon as you allow humans to make decisions based on opinion you guarantee there will also be wrong decisions. If the risk from a $25 seal failure is an occasional loss of a spy plane worth hundreds of millions of dollars then the business risk is too great. But it takes a financial risk analysis to see that.
To minimise the chance of RCM losing you a lot of money you need to compare the business risk of your options. When picking a maintenance strategy using RCM you also have to see what business risks result from the choice you make. Had someone in the RCM team asked, “What is the consequential cost of a fuel control circuit seal failure?”, the answer would have been the possible loss of the entire plane, or the aborting of a mission, which may then lose the battle and the war. Choosing on-condition seal replacement, with its certainty of causing occasional fuel circuit failures, was guaranteed to cost millions of dollars, and maybe even billions. Had the RCM team modelled replacing several $25 seals with new ones every time the circuit was serviced, it would have been clear that the obvious RCM answer was the wrong business decision to take.
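The comparison described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the model used by the Navy team. The $25 seal price comes from the story; the number of seals, the failure probabilities and the cost of a failure are made-up placeholder values, as is the simplification that reusing seals costs nothing in parts. Substitute your own figures.

```python
def expected_cost_per_service(parts_cost, failure_probability, failure_consequence):
    """Expected cost of one service interval: money spent on parts plus the
    probability-weighted cost of a failure before the next service."""
    return parts_cost + failure_probability * failure_consequence


def break_even_probability_reduction(extra_parts_cost, failure_consequence):
    """Smallest drop in failure probability that pays for the extra parts."""
    return extra_parts_cost / failure_consequence


SEAL_PRICE = 25.0                 # from the story: about $25 per seal
SEALS_PER_SERVICE = 4             # ASSUMPTION: "several" seals taken as four
FAILURE_CONSEQUENCE = 250_000.0   # ASSUMPTION: cost of one aborted flight
P_FAIL_REUSED_SEALS = 0.05        # ASSUMPTION: chance of a leak with reused seals
P_FAIL_NEW_SEALS = 0.005          # ASSUMPTION: chance of a leak with new seals

# On-condition: old seals are reused, so parts cost is taken as zero (ASSUMPTION).
on_condition = expected_cost_per_service(
    0.0, P_FAIL_REUSED_SEALS, FAILURE_CONSEQUENCE)

# Preventive: every seal is renewed at every service.
preventive = expected_cost_per_service(
    SEAL_PRICE * SEALS_PER_SERVICE, P_FAIL_NEW_SEALS, FAILURE_CONSEQUENCE)

print(f"On-condition expected cost per service: ${on_condition:,.0f}")
print(f"Preventive expected cost per service:   ${preventive:,.0f}")
print("Break-even reduction in failure probability:",
      break_even_probability_reduction(SEAL_PRICE * SEALS_PER_SERVICE,
                                       FAILURE_CONSEQUENCE))
```

With a consequence this large compared with the price of the seals, new seals only have to lower the chance of failure by a tiny amount to pay for themselves. That is the point of putting money on both sides of the comparison.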
The fantastic benefit of Preventive Maintenance is its ability to greatly reduce business risk. Predictive Maintenance, by its nature, waits to detect evidence of failure, and so puts your business at greater risk of failure than Preventive Maintenance. You can see in the image below, which shows a rolling bearing degradation curve and the available maintenance strategy options, that with Preventive Maintenance you replace components while their chance of failing is still negligible. With Predictive Maintenance you wait until components have started to fail before you plan and schedule their replacement. Your only protection against a catastrophe when using a Predictive Maintenance strategy and condition monitoring techniques is how well you control the rate of degradation.
People who minimise maintenance costs instead of minimising total business costs have got enterprise asset management and maintenance management totally wrong. The most successful operations are those that spend maintenance cost and effort to reduce the number and size of their operating and business risks. Don’t be scared to increase your maintenance expenditure to prevent failures if the extra spend clearly reduces your business risk by substantially more than the extra maintenance costs.
Whenever I see businesses making maintenance strategy decisions without detailed before-and-after risk analysis using economic modelling it makes me want to hang my head and cry. How can anyone know what the best decision is for the business if they don’t also look at the total business-wide risks of that decision? The least maintenance cost choice may be obvious, but is it also the best business choice? Clearly on-condition replacement of SR-71 Blackbird fuel circuit seals was not the smart business choice. That choice by the US Navy RCM team cost the organisation millions of dollars. There aren’t that many operations in the world that can survive that sort of loss.
How about you? Are you making the best business choices when you pick your maintenance strategy? The only way you will ever truly know is if your business risk financial model proves it is the best choice. Without doing a financial model of the business risk from failure of your maintenance strategy options you’re just guessing and hoping for the right results. Good luck with that. Be sure you can afford to be wrong.
All the best to you,
Lifetime Reliability Solutions HQ
P.S. This situation is much the same as the fable of how the kingdom was lost because of a missing horse shoe nail. It goes like this.
The story is told that in days of old, before an important battle, a king sent his horse with a groomsman to the blacksmith for shoeing. But the blacksmith had used all the nails shoeing the horses of the king’s knights for combat and was one nail short. He warned the groomsman that the missing nail might allow the shoe to come off. The groomsman told the blacksmith to do as good a job as he could. The following morning the king rode into battle not knowing of the missing horseshoe nail. In the midst of the battle he charged toward the enemy. As he approached them the horseshoe came off the horse’s hoof, causing it to stumble, and the king fell to the ground. The enemy was quickly onto him and killed him. The king’s troops saw his death, gave up the fight and retreated. The enemy surged onto the city and captured the kingdom. The entire kingdom was lost because of a missing horseshoe nail.
The cost you can pay for a failure in a business is very often far above the cost of preventing the failure. Only a financial risk analysis will ever show you that.
P.P.S. If you want a super-quick way to do economic risk modelling simply use your company’s risk matrix. You may need to turn your 5×5 risk matrix into a 16×13 risk matrix, like the one below, so you can show small financial movements. Because the horizontal Consequence axis measures money you have got a business risk financial modelling tool. The money you save your company from using better maintenance strategy, ideas and solutions is the area of the triangle as shown on the risk matrix.
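Why a finer matrix shows small financial movements can be illustrated with a short Python sketch. It assumes a logarithmic money scale on the Consequence axis, starting at $1,000, and compares one column per decade with three columns per decade. Those scale choices, the function name and the dollar figures are all hypothetical and are not taken from the matrix pictured; use the scales on your own company's matrix.

```python
import math


def consequence_column(cost, columns_per_decade, lowest_cost=1_000.0):
    """Return the column index of a cost on a log-scaled Consequence axis.

    ASSUMPTION: column 0 starts at `lowest_cost` and each factor of ten in
    cost spans `columns_per_decade` columns.
    """
    if cost < lowest_cost:
        return 0
    return int(math.floor(math.log10(cost / lowest_cost) * columns_per_decade))


# Hypothetical consequence of a failure before and after an improvement.
cost_before = 40_000.0
cost_after = 20_000.0

for label, per_decade in (("coarse", 1), ("fine", 3)):
    before = consequence_column(cost_before, per_decade)
    after = consequence_column(cost_after, per_decade)
    print(f"{label} matrix: column {before} -> column {after}")
```

On the coarse scale both costs fall in the same column, so halving the consequence is invisible. On the finer scale the improvement moves one column to the left and can be seen and valued.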