Business Risks Also Come from Your Operating Assets, and Operating Plant and Equipment Risk You Eliminate with Reliability
Operating plant and equipment risk of failure is present all the time, but you can never predict with total certainty the day, hour and minute a risk will become your next equipment failure.
Equipment failures result when an equipment risk goes through to completion. How equipment risk arises and how reliability is used to prevent and control equipment risk are introduced and explained in this set of slides.
PEW/PWW EAM Course Day 1 – Foundations Session 3 – Risk | What is the Chance of Failure?
Roll 5 dice ten times and record how many times the side with one (1) is on top for each roll. Repeat ten times. Tally the total times ‘1’ was on top in 50 dice rolls.
The chance that ‘1’ appears in the roll of a single dice is 1 in 6, i.e. 1/6 = 0.167 probability. What is the chance for a result of ‘1’ if you did 50 rolls? In the slide above it was 8/50 = 0.16, i.e. 1 in 6.25. But for each individual roll the chance varied. From the first to the last roll the chance result was 0, 0.4, 0.167, 0.4, 0.167, 0.167, 0, 0.167, 0, 0 respectively. In a large number of events the average chance does not accurately represent the individual event chance.
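The activity can also be tried in software. The sketch below is a minimal simulation, assuming fair six-sided dice and Python's standard pseudo-random generator; the function name `roll_session` and the seed value are illustrative choices, not part of the course material. It rolls five dice ten times and prints the fraction of '1's in each roll next to the fraction across all 50 dice.

```python
import random

def roll_session(num_rolls=10, dice_per_roll=5, seed=None):
    """Return how many dice showed '1' in each roll of a session."""
    rng = random.Random(seed)  # a seed makes the session repeatable
    counts = []
    for _ in range(num_rolls):
        faces = [rng.randint(1, 6) for _ in range(dice_per_roll)]
        counts.append(faces.count(1))
    return counts

DICE_PER_ROLL = 5
counts = roll_session(dice_per_roll=DICE_PER_ROLL, seed=2024)
total_dice = DICE_PER_ROLL * len(counts)

# Per-roll fraction of ones versus the fraction over the whole session.
for roll_number, ones in enumerate(counts, start=1):
    print(f"roll {roll_number:2d}: {ones} ones, fraction {ones / DICE_PER_ROLL:.2f}")
print(f"all {total_dice} dice: {sum(counts)} ones, fraction {sum(counts) / total_dice:.3f}")
```

Changing the seed gives a different session. The per-roll fractions jump around from roll to roll, which is the behaviour the activity is meant to expose.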
A major purpose of this activity is to get attendees to explore randomness in some detail so they get to better understand its behaviour and its influence on failure events.
The first question to ask: ‘What is the probability a 1 will present in a throw compared with how many times ‘1’ resulted in the 50 dice rolls?’
The correct analysis is to use the Bernoulli trial formula. It indicates that with 50 dice rolls you could expect the number ‘1’ to show about 8 times (50 x 1/6 = 8.3). Number ‘1’ showed 8 times in the slide. In response to the above question, even though you expect ‘1’ to appear about eight times in 50 throws, you can never know which throws in the 50 would deliver a ‘1’. You could not say that ‘1’ would have appeared on a specific dice in throws 2, 3, 4, 5, 6 and 8. If ‘1’ represents an equipment failure event, you know that ‘1’s will occur because they are present on the dice, but at which time (i.e. roll) they appear is impossible to know with certainty. So long as you have plant and equipment risk of failure you are sure that failures will arise, but you don’t know when they will occur.
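As a check on the expected count, the sketch below applies the binomial distribution, which describes the number of successes in repeated Bernoulli trials. It assumes each dice is fair and independent of the others; the function name `binomial_pmf` is an illustrative choice. For n trials with chance p the mean count is n x p, which is about 8.3 for 50 dice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials of chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_dice = 50
p_one = 1 / 6

expected_ones = n_dice * p_one  # mean of the binomial distribution
pmf = [binomial_pmf(k, n_dice, p_one) for k in range(n_dice + 1)]
most_likely = max(range(n_dice + 1), key=lambda k: pmf[k])

print(f"expected number of ones: {expected_ones:.2f}")
print(f"most likely count: {most_likely}")
print(f"chance of exactly 8 ones: {pmf[8]:.3f}")
```

The distribution gives the chance of each total count, but it says nothing about which individual throws will show the ‘1’.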
The second question to ask: ‘What strategy must you adopt to maximize the numbers of ‘1’ that occur?’
The only way to maximise the desired outcome in a situation where random events occur is to intentionally influence the outcome. You need to ‘load the dice’ in favour of ‘1’ if you want ‘1’ to appear often, or place the dice with 1 upwards. To prevent randomness you must instill certainty.
The third question to ask: ‘What comments can you make on the insights from the activity?’
Comments people make cover issues such as the high uncertainty of prediction if using historic random failure events; the uncertainty of comparing reliability of identical equipment when used in different situations; and the danger of using averages to indicate point-in-time performance.
PEW SOLUTION: Reduce the Chance of Failure
The Risks You Live With and Those You Prevent Show Your Risk Boundary
Each operation can identify the risk boundaries for its production plant once it knows the Defect and Failure Total (DAFT) Costs of its equipment failures. The chance of an equipment failure is determined from the equipment history in the CMMS, or from industry expectations and experience.
In the slide we have set a DAFT Costs limit of $10,000 per time period (usually a year). That means we will not accept any failures that cause us to spend more than $10,000 a year on that piece of equipment. To prevent spending more than that much money we must introduce risk prevention strategies to limit our risk to $10,000 per period. This approach forces us to look seriously at what is causing the risk and to develop solutions to limit and control it.
The ‘bent’ line at the top of the ‘Accept’ area is there because we have limited risk to $10,000 for the whole time period, regardless of what causes the failure and how expensive it ends up becoming. Since ‘Risk = Chance x Consequence’, it means that for the Risk to stay at $10,000 as the Consequence rises we have to reduce the Chance of the failure event happening. For example, when the DAFT Cost is say $100,000 we must reduce the Chance of the event happening to 0.1, i.e. 10% of the chance we can accept for a $10,000 cost event. In that case ‘Risk = $100,000 x 0.1 = $10,000’ and we are still at our acceptance boundary.
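The boundary arithmetic can be sketched in a few lines. This is a minimal illustration, assuming the two-factor form ‘Risk = Chance x Consequence’ and the $10,000 limit used in the slide; the function name `max_acceptable_chance` is hypothetical.

```python
RISK_LIMIT = 10_000  # DAFT Cost limit per period, from the slide example

def max_acceptable_chance(consequence, risk_limit=RISK_LIMIT):
    """Largest chance of failure that keeps Chance x Consequence within the limit."""
    if consequence <= 0:
        raise ValueError("consequence must be a positive cost")
    # A chance cannot exceed 1, so cheap failures are capped at certainty.
    return min(1.0, risk_limit / consequence)

for cost in (5_000, 10_000, 100_000, 1_000_000):
    chance = max_acceptable_chance(cost)
    print(f"${cost:>9,} failure: chance must be no more than {chance:.2f}")
```

The cap at 1 corresponds to the flat part of the boundary: below the limit a failure can be certain to happen and still stay inside the ‘Accept’ area.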
You can also look at the risk boundary in another way. A more complete version of the risk equation is:
‘Risk = Consequence x Number of Events x Chance of Event’
With risk in this form you can see that to keep to $10,000 a year total, you cannot have a $100,000 failure more than once in every 10 years (Risk = $100,000 x 0.1 x 1 = $10,000).
Acceptable Equipment Failure Domain
The equipment failure domain is set by the cost of a failure event and the frequency you will accept it. If you set a $10,000 per year limit as your equipment risk boundary, then that value can be reached in many ways.
The full risk equation is: Risk $/yr = $10,000/yr = Consequence from Failure x Opportunities for Failure x Chance of Failure. You now have three variables in play with limitless combinations that satisfy the equation.
The shaded volume in the slide is when the consequence of equipment failure is set at $10,000 and the opportunities and chance vary. The red dotted line is if all three variables change. It tells us that we will accept a $1M event if it only has a 10 percent chance of happening once in ten years. That is still equivalent to $10,000 per year.
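The three-variable form can be checked numerically. The sketch below assumes the equation ‘Risk = Consequence x Opportunities x Chance’ quoted above; the function name `annual_risk` and the combinations listed are illustrative, with the last one being the $1M case described for the slide (a 10 percent chance, once in ten years).

```python
from math import isclose

def annual_risk(consequence, opportunities_per_year, chance):
    """Risk in $/yr = Consequence x Opportunities for Failure x Chance of Failure."""
    return consequence * opportunities_per_year * chance

# (consequence $, opportunities per year, chance per opportunity)
combinations = [
    (10_000, 1, 1.0),
    (10_000, 2, 0.5),
    (10_000, 10, 0.1),
    (100_000, 1, 0.1),
    (1_000_000, 0.1, 0.1),  # $1M event, once in ten years, 10% chance
]

for consequence, opportunities, chance in combinations:
    risk = annual_risk(consequence, opportunities, chance)
    on_boundary = isclose(risk, 10_000)  # tolerate floating-point rounding
    print(f"${consequence:>9,} x {opportunities:>4} x {chance:.1f} "
          f"= ${risk:,.0f}/yr, on boundary: {on_boundary}")
```

The arithmetic treats every combination as equal, which is why the question of what the business can afford to lose still has to be asked separately.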
If you do not want to have a $1,000,000 DAFT Cost failure what are you going to do to prevent it from happening?
The crazy thing would be to live with the equipment risk of a single $1M event if it will bankrupt the business. The mathematics says $10,000/yr is equal to 10% of $1M spent equally over ten years, but the fact is that though $10,000/yr is manageable to a business, a $1M equipment failure event would destroy it. In reality your tolerance for a $1M event is NEVER if that equipment risk event will ruin you. We cannot make our risk choices by mathematics alone; we must make them on what we can afford to lose!
Once it is decided how much money an organization is prepared to lose from failures of an item of plant, they can plot their failure domain and clearly see what their tolerance is for problems on that piece of equipment.
With the ‘failure volume’ decided, i.e. how much money you are willing to lose to failure, defects, errors, waste and loss, you have the basis on which to make economic decisions of how to control your risk. Maintenance is one of the risk management strategies you can choose for controlling operating and business risk.
For the shaded volume in the slide we can see that people have set a $10,000 DAFT Cost per period boundary (say a year). That means 1 repair a year due to a failure worth $10,000 (which makes $10,000 in DAFT Costs), or two opportunities a year for a $10,000 failure if there is a 50% chance of each one happening, or ten opportunities a year for a $10,000 failure at a 10% chance of any one happening. They all total to a risk of $10,000/year.
To stay within the acceptable risk boundary for the time period you must put into place the risk mitigation and prevention practices that will reduce the risk. At any time, and all the time during the period, you need to stay within the risk limit you are willing to carry on each piece of plant.
Risk can be Calculated and Plotted
Risk is a power law (that means its effects can vary to extremes unpredictably) and the same level of risk can be arrived at in an infinite number of ways. Risks that are of low consequence but happen often are just as costly as those that happen very occasionally but are expensive when they do. Neither situation is acceptable and both must be removed if you want to minimize disruptions to production.
Risk using Log10 Chance and Consequence
CORRECTION TO EQUATION: The slide has an error in the log calculation of risk. The correct equation is Log Risk = Log Consequence + Log Frequency.
This Figure shows a log-log graph of risk. When plotted on log-log axes risk forms straight lines on the plot. That a power law is a straight line on a log-log plot means that randomness exists in the behaviour of the influencing factors. A lot of human activities plot straight on log-log plots.
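The straight-line behaviour can be verified with a short calculation. The sketch below assumes risk is the product of consequence and frequency, holds risk constant at an illustrative $10,000 per year, and checks that the points fall on a line of slope -1 in log10 space.

```python
from math import log10

RISK = 10_000  # constant risk in $/yr (illustrative value)

consequences = [100, 1_000, 10_000, 100_000, 1_000_000]
frequencies = [RISK / c for c in consequences]  # events per year on the iso-risk line

points = [(log10(c), log10(f)) for c, f in zip(consequences, frequencies)]

for log_c, log_f in points:
    # Log Risk = Log Consequence + Log Frequency
    print(f"log C = {log_c:.1f}, log F = {log_f:+.1f}, log Risk = {log_c + log_f:.1f}")

# Slope between each pair of neighbouring points on the log-log plot.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]
print("slopes between neighbouring points:", [round(s, 6) for s in slopes])
```

Each line of equal risk is parallel to this one; a higher risk level only shifts the line up and to the right.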
Superimposed on the plot is a risk matrix that uses colour to indicate the severity of risk depending on the cost of the problem and the number of times it happens. This is how risk matrices are developed. Notice how the ‘red’ cell is at the top right of the matrix.
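One way to picture how such a matrix falls out of the log-log plot is to band the risk by its order of magnitude. The sketch below is only an illustration: the band limits, colour names, costs and rates are hypothetical and are not taken from the slide.

```python
from math import log10

# Hypothetical band limits on log10(risk in $/yr); not taken from the slide.
BANDS = [(3.0, "green"), (4.0, "yellow"), (5.0, "orange")]

def risk_band(consequence, frequency_per_year):
    """Return a colour band from Log Risk = Log Consequence + Log Frequency."""
    log_risk = log10(consequence) + log10(frequency_per_year)
    for upper_limit, colour in BANDS:
        if log_risk < upper_limit:
            return colour
    return "red"

costs = [500, 5_000, 50_000, 500_000]  # consequence per event, $
rates = [0.05, 0.5, 5]                 # events per year

for rate in reversed(rates):           # highest frequency on the top row
    row = [risk_band(cost, rate) for cost in costs]
    print(f"{rate:>5} /yr: " + "  ".join(f"{colour:<6}" for colour in row))
```

With these made-up limits the red band appears only where both cost and frequency are high, which matches the small high-risk corner of the log-log plot.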
What a Log-Log Risk Scale Means
The slide shows a typical risk matrix used in industry. Notice how the high risk portion, which was a small part in the log-log plot, has become a large part of the risk matrix. This is the effect of converting risk, which is a power law, back into a linear scale. We must be very careful when using the standard risk matrix that we do not make everything into a high risk just because it occupies a large part of the matrix. We must realise that it is unrealistic that all risky situations have a high risk. In reality high risk is the exception, rather than the rule.
– Each threat or escalation barrier can be represented as a piece of Swiss cheese (hence the name of the Swiss-cheese risk model)
– The holes represent weaknesses in the processes that form part of the barrier. The weakness can relate to the design of the process or its implementation.
– If the holes in the threat barriers line up this forms the chain of events that leads from a hazard to an event.
– If the holes in the escalation barriers line up this forms the chain of events that leads from an event into a consequence.
This explains why often bad things happen but they do not automatically end in catastrophe. It takes a number of things to go wrong at the same time (i.e. the holes in the Swiss cheese line up) before a disaster happens. But when it does, then the consequences can be life-ending.
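The effect of layered barriers can be put into rough numbers. The sketch below assumes each barrier fails independently of the others, which the text does not state and which real barriers may not satisfy; the function name `chance_holes_line_up` is hypothetical and the failure chances are made-up values for illustration.

```python
from math import prod

def chance_holes_line_up(barrier_failure_chances):
    """Chance that every barrier fails together, assuming independent barriers."""
    for chance in barrier_failure_chances:
        if not 0.0 <= chance <= 1.0:
            raise ValueError("each chance must be between 0 and 1")
    return prod(barrier_failure_chances)

# Hypothetical chance that each barrier fails when called upon.
threat_barriers = [0.1, 0.05, 0.2]

one_barrier = chance_holes_line_up(threat_barriers[:1])
all_barriers = chance_holes_line_up(threat_barriers)

print(f"one barrier only:  {one_barrier:.4f}")
print(f"three barriers:    {all_barriers:.4f}")
```

If the barriers share a common weakness the independence assumption breaks down, and the real chance of the holes lining up would be higher than this product suggests.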
The matrix also asks another question of us: Is it better to spend a lot of money to fix one large risk, or to spend the same money and fix many small risks? If many small risks can be removed, the result will be fewer annoying little problems to overload us, and take our attention away from controlling the large risks. With the small risks gone we can better manage the remaining large risks. In addition, with many small risks gone the probability (chance) of a small problem contributing to a larger problem also falls, and you have even fewer large problems.
Want ALARP – As Low As Reasonably Practicable
Risk can never be reduced to zero. But it can be reduced to the chance of occurrence that is acceptable for the situation.
PEW SOLUTION: Asset Engineering, Operations and Maintenance that Reduces Life Cycle Operating Risk
PEW SOLUTION: Use a Process to Create Reliability by Reducing the Chance of Machine Component Failure
Identifying Risks on a Standard Risk Matrix
When the risk management process is applied a risk rating scale is developed to assess the size of a risk. Such a scale can be used to measure the impact on a business of an equipment failure. The greater the impact from failure and downtime the more that must be done to prevent the equipment risk or reduce its consequence.
This is the approach used to identify equipment criticality. The criticality justifies adoption of suitable failure prevention practices and necessary maintenance strategies. It produces a priority scale to care for equipment, with equipment of the highest importance getting highest protection and response.
Applying an equipment criticality rating to plant and equipment provides guidance on the importance of installing protective measures and of making available strategies for emergency recovery from a failure.
PEW SOLUTION: Tracking Risk Matrix Used to Prove Asset Operating Risk Reduction
Use the Risk Matrix to ‘SEE’ the Equipment Risk and Understand Its Implications
Running Operating Equipment using Plant Wellness Way EAM Strategies #1
These last two slides are replies to questions asked about Session 3 content after Session 4 had started. The answers cover questions about using Plant Wellness Way Enterprise Asset Management philosophies to operate plant and equipment.
Running Operating Equipment using Plant Wellness Way EAM Strategies #2