Plant and Equipment Wellness EAM Training Course Videos Session 10: Risk Rating for Equipment Criticality

Equipment Risk Rating is also known as Equipment Criticality Analysis.

You will be able to calculate a financial value for risk to each piece of equipment in your organisation and identify if the criticality risk rating is Acceptable, Low, Medium, High, or Extreme.

 

IONICS PROCESS 2 is an equipment criticality analysis in which you do a business and operational risk rating of an asset. Risk = consequence x likelihood. The consequence is the total of all operational and business losses resulting from the failure event, plus the total of all costs to return the equipment to operation after failure. Likelihood is the chance that failure will occur. Each asset will have its own value of consequence, because the cost of repair depends on the asset’s design and location, which are specific to each equipment item. Each asset will also have its own value of likelihood, again because each item’s design, location, and means of operation are unique.
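As a minimal sketch of this arithmetic, assuming consequence is expressed in dollars and likelihood as a probability between 0 and 1 for the period considered, the Python below totals the consequence from its two parts and multiplies it by likelihood. The function names and the cost figures for the pump are hypothetical.

```python
def consequence(business_losses: dict[str, float],
                return_to_service_costs: dict[str, float]) -> float:
    """Consequence = all business/operational losses + all costs to return to operation."""
    return sum(business_losses.values()) + sum(return_to_service_costs.values())


def risk(consequence_cost: float, likelihood: float) -> float:
    """Risk = consequence x likelihood, with likelihood as a chance from 0 to 1."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be between 0 and 1")
    return consequence_cost * likelihood


# Hypothetical figures for one pump failure
c = consequence(
    {"lost_production": 80_000, "late_delivery_penalty": 20_000},
    {"parts": 15_000, "labour": 5_000},
)
print(c)              # 120000
print(risk(c, 0.25))  # 30000.0
```

Because every asset has its own consequence and likelihood, the same two functions are simply called with different inputs for each equipment item.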

 

PEW/PWW EAM Course Day 2 – Plant Wellness Way Processes Session 10 – Risk Rating
Duration 3:55

Uncertain Component Degradation Rates mean Uncertain Equipment Failure Dates and Costs | Uncertain Operating Life Remaining with Business-Wide Costs and Losses = RISK DECISIONS | Recognising the Extent of Your Risks | The Risk in Rescheduling Maintenance Work
Duration 8:13

The slide shows the concept of the P-F curve, or degradation curve, made popular by the late John Moubray in his RCM II books to explain predictive maintenance and when to do condition monitoring.

The P-F interval is the time between when a budding failure identifies itself to us (the Potential failure point ‘P’) and when we can no longer use the equipment because its performance degrades to an unacceptable level (the Functional failure point ‘F’). An example would be a pump designed for a specific duty in which the impeller wears until the pump cannot deliver the necessary minimum flow. When a lower flow is first noted it is the ‘P’ point and when the impeller cannot deliver adequate flow it is the ‘F’ point. The pump still operates and it has not broken down, but it is not meeting its minimum functional duty.
 
The concept of the P-F curve applies to every part in a machine. For a machine with ten working parts, every part has its own P-F curve. Condition monitoring is used to observe the ‘P’ point and identify an impending failure. Normally only the vital parts that lead to a breakdown are placed under observation. The ‘P-F’ interval is typically selected based on the worst-case failure suffered on-site with the equipment item, or by using the failure history from other comparable operations, or by making a reliability failure assessment of the item.

The location of the ‘P’ point on the curve is strictly a function of the condition monitoring (CM) used; we always show it before ‘F’ because that gives us time to react. But in practice, if we have no CM in place, we may not know we have a failure until the breakdown. The location of ‘F’ is strictly a function of your definition of loss of functional capacity, as noted in Figure 3.6 on page 51 of Moubray (RCM, second edition).

We can describe the different regions of the curve:
From new to P – we believe the part is performing well and to design requirements
From new to F – performance is actually better than minimum operating specification
From new to B/D – is the destined life (at some level of functioning capacity) if you do nothing
The other curve point is where you take action to get off the curve and actually do the repair or replacement.
 
By putting ‘F’ before B/D on the curve you imply that there is an opportunity to operate at below the defined functional capacity before the item is totally failed.
 
The P-F interval is probabilistic and varies depending on the level of stresses carried by components and the number of over-stress incidents suffered by a part. Where there are many causes of high stress there are many chances to fail. With a correctly set-up pump on water service that is operated properly, the stress effects causing wear on the impeller accumulate slowly and the P-F interval for the impeller will be decades, whereas for a pump impeller on slurry service, continually in contact with abrasive material, the P-F interval may be months. The water pump impeller might be condition monitored every five years, but the slurry pump impeller would be monitored every fortnight.
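The link between the P-F interval and monitoring frequency can be sketched in Python. It assumes the monitoring interval is set as a fixed fraction of the P-F interval (RCM practice often quotes half the P-F interval as a starting point, and a smaller fraction gives more warning time). The 20-year and 2-month P-F intervals and the fraction of 0.25 are hypothetical values, chosen only so the results match the five-year and fortnightly intervals in the text.

```python
def monitoring_interval(pf_interval_days: float, fraction: float) -> float:
    """Condition-monitoring interval as a fraction of the P-F interval.

    Assumption: checking at a fraction of the P-F interval means the 'P' point
    is seen with time left to act before 'F'; a smaller fraction gives more warning.
    """
    if pf_interval_days <= 0 or not 0 < fraction <= 1:
        raise ValueError("need a positive P-F interval and 0 < fraction <= 1")
    return pf_interval_days * fraction


# Hypothetical P-F intervals: water pump impeller ~20 years, slurry impeller ~2 months
water = monitoring_interval(20 * 365, 0.25)  # 1825.0 days, about five years
slurry = monitoring_interval(60, 0.25)       # 15.0 days, about a fortnight
print(water, slurry)
```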

If you define ‘F’ as the point where performance degrades below specification, then during the interval from ‘F’ to ‘B/D’ the item is still usable, but you need to take account of the fact that it is below spec. You may operate here depending on the economics of the situation. For example, you may be happy to limp along, operating the plant at below full capacity to complete a production run. Or you may sacrifice an item for the greater good, accepting damage to one item to prevent an unscheduled plant outage.
 
To make rational business decisions we need to understand the comparative costs. Without knowing those costs our decisions are not rational but guesswork, so it is vital that we accurately determine the costs of the options available to us.

Should a breakdown occur there will be resulting business-wide costs from the failure. If the equipment is a stand-alone duty item, the cost of failure will be the maintenance cost + production costs + production losses + all other business-wide losses, which may be vast. If there is a standby equipment item that comes into operation, then the cost of failure will be the maintenance cost + production costs + production losses, which substantially reduces the impact of the original breakdown.
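The two cases can be compared with a small Python function. This is a sketch that follows the cost terms in the paragraph above; the function name and the dollar figures are hypothetical, and it assumes the standby unit avoids only the "other business-wide losses" term.

```python
def failure_cost(maintenance: float, production_costs: float,
                 production_losses: float, other_business_losses: float,
                 has_standby: bool) -> float:
    """Business-wide cost of a breakdown for the two cases described in the text.

    Stand-alone duty item: maintenance + production costs + production losses
    + all other business-wide losses.
    With a standby item coming into operation, the other business-wide losses
    are assumed to be avoided.
    """
    total = maintenance + production_costs + production_losses
    if not has_standby:
        total += other_business_losses
    return total


# Hypothetical figures for the same failure with and without a standby unit
duty_only = failure_cost(20_000, 5_000, 30_000, 250_000, has_standby=False)    # 305000
with_standby = failure_cost(20_000, 5_000, 30_000, 250_000, has_standby=True)  # 55000
print(duty_only, with_standby)
```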

Once a failure event has started we need to decide when to rectify the problem(s) with the equipment for the least production disruption and the least maintenance costs. Because the rectification decision is a business risk decision we can map the situation onto a risk matrix.

In the scenario shown in the slide a work order was raised when the evidence of failure initiation was first detected. From the combined knowledge and experience available in the operation it was agreed that under the required operating regime the equipment would become unusable in about a month’s time. There was also recognition that in the worst case the equipment could become inoperable within a week.

Since there is uncertainty when the event will actually occur we generate an ‘envelope’ of the range of business risk that exists because of the situation. The location of the risk envelope on the matrix warns us of the seriousness of the situation.

Point ‘B’ is the intersection of the calculated business-wide cost of a breakdown should it happen and the worst scenario time before the equipment is unusable. Point ‘1’ is the intersection of the least rectification cost option and the expected time before the equipment becomes unusable (it is as good an outcome as we could expect in the situation).

Once the size of the risk envelope is identified, matching actions can be taken to address the situation. In this case it is clear that the work order is vital and must be done as soon as possible. It is already a bad situation for the operation to be in, because even the least-cost rectification is a high cost to the business. Waiting more than a week to do the work order will surely increase the chance of a very expensive breakdown.

When work orders are rescheduled it is important to appreciate that the risk of failure increases. In the scenario shown on the screen the work order first raised when the failure initiation was detected was not completed when planned and now needs to be rescheduled. While the rectification work remains undone the chance of failure keeps rising since the equipment is still in service receiving stresses on its failure prone part(s). The business risk continually moves towards the worst catastrophe. The longer the rectification is delayed the more certain it becomes that there will be the unwanted breakdown we are trying to prevent.

Points ‘B’ and ‘1’ are identified as noted in the previous slide. At Point ‘2’ two weeks have passed and two weeks remain to the expected failure. The work order is still a ‘planned’ job but the chance of failure has risen. Point ‘2’ is the intersection of the rising likelihood of failure and the cost of planned rectification. If we misjudge and the equipment breaks down the event goes from a $50K job to a $300K job; the scheduling decision cost the business $250K.
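The cost of the scheduling decision can be written as an expected value. The Python below is a sketch that assumes a single chance of breakdown before the rescheduled work is done. The $50K and $300K figures come from the slide; the weekly probabilities are hypothetical values chosen only to show how the expected cost climbs toward the breakdown cost the longer the job is deferred.

```python
PLANNED_COST = 50_000     # planned rectification, from the slide
BREAKDOWN_COST = 300_000  # business-wide breakdown cost, from the slide


def expected_cost(p_breakdown: float) -> float:
    """Expected job cost if the equipment breaks down (probability p) before the planned work is done."""
    return p_breakdown * BREAKDOWN_COST + (1 - p_breakdown) * PLANNED_COST


misjudgement_cost = BREAKDOWN_COST - PLANNED_COST  # 250000

# Hypothetical chances of breakdown after each week of delay
for week, p in enumerate([0.05, 0.15, 0.35, 0.70]):
    print(f"week {week}: expected cost ${expected_cost(p):,.0f}")
```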

Use a Risk Matrix to Show Impact of Choices
Duration 4:30

Putting your risk boundary onto a risk matrix turns a difficult concept like risk, which involves ever-changing chance and consequences, into a simple visual representation of the current risk situation from a failure scenario in a company.
 
In this slide the conveyor return roller failed long ago and now the conveyor belt running over it is wearing away the tube wall at the right hand side of the roller. Once that happens the edge of the hole that appears in the tube becomes a knife edge. The knife edge is always in contact with the moving belt. Once the knife edge appears it creates an opportunity for the belt to be ripped its full length. As the hole gets bigger in the tube it grows both circumferentially and toward the centre of the roller. The opportunity to catch the underside of the belt with the knife edge and rip it full length continually rises. A ripped belt would lose the company $200,000 DAFT Cost.
 
But much worse than a ripped belt is the possibility for the knife edge to become a peeler and scrape the rubber belt into a large volume of rubber shavings. The thin rubber shavings are carried by the moving belt to the conveyor drive, where they build up around the motor. As the motor gets hotter and hotter from lack of ventilation, the rubber shavings catch fire and the entire conveyor system and its drive are completely burnt. Replacing the damage from a conveyor system fire would cost $2,000,000 DAFT Cost.
 
The consequence and chance of each scenario is easily plotted on the risk matrix. From doing regular maintenance for $1,000 per year, to the $12,000 cost to replace a failed roller, to the $200,000 loss of a ripped belt and finally the $2,000,000 rebuild of a burnt system the risk situation is clear to see on the matrix. It is now up to Production and Maintenance to decide how to handle the risk.
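Placing each scenario in a consequence column of the matrix is a simple lookup. This Python sketch uses the four costs from the conveyor example. The band limits and band names are hypothetical and should be replaced with the consequence scale of the matrix actually in use; the likelihood axis would be handled the same way.

```python
import bisect

# Hypothetical consequence bands: upper limits in dollars (inclusive).
# Replace these with the bands from your own risk matrix.
BAND_LIMITS = [5_000, 50_000, 500_000, 5_000_000]
BAND_NAMES = ["Insignificant", "Minor", "Moderate", "Major", "Catastrophic"]


def consequence_band(cost: float) -> str:
    """Return the consequence column of the risk matrix that a DAFT Cost falls into."""
    return BAND_NAMES[bisect.bisect_left(BAND_LIMITS, cost)]


# Scenario costs from the conveyor return roller example
scenarios = {
    "regular maintenance": 1_000,
    "replace failed roller": 12_000,
    "ripped belt": 200_000,
    "conveyor system fire": 2_000_000,
}
for name, cost in scenarios.items():
    print(f"{name:>22}: ${cost:>9,} -> {consequence_band(cost)}")
```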

Nothing is Certain with Risk; It Changes Unless it is Controlled
Duration 5:47

Plot Current Operational Risk on the Matrix
Duration 2:55

Activity: Will these PM tasks prevent a failure?
Duration 1:39

Activity – How sure are you that a maintenance task is truly effective in preventing the equipment failure?
Duration 12:02

Question and Answer to “How sure are you that a maintenance task is truly effective in preventing the equipment failure?”
Duration 2:14

Risk Based Operating Strategy
Duration 7:17

Once a failure has started we must address it. Making the right choices to prevent an initiated failure event from becoming a breakdown is greatly helped by the use of a risk matrix.

But the better operating strategy is to use the methods and practices that remove the chance of parts failing and that prevent situations arising which jeopardise the operating life of the equipment.

Case Study – Use a Risk Cost Calculator to Understand Impacts of Risk Management Options
Duration 5:16

The slide shows a risk cost calculator used to highlight the consequences of delaying a maintenance intervention. It helps people to realise the business risk in a situation and to prioritise their risk management/mitigation activities. It can be developed to include whatever business costs one cares to separately identify when estimating the business-wide costs of a failure.
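A risk cost calculator of the kind described can be sketched as a small Python class. The class name, the cost items and the 50% chance of failure during the delay are hypothetical. The calculation assumes the extra expected cost of delay is the chance of a breakdown multiplied by the difference between the breakdown cost and the planned cost.

```python
from dataclasses import dataclass, field


@dataclass
class RiskCostCalculator:
    """Hypothetical risk cost calculator for deferring one maintenance intervention."""
    planned_cost: float
    breakdown_items: dict[str, float] = field(default_factory=dict)

    def add_cost(self, item: str, amount: float) -> None:
        """Add any business cost you wish to identify separately (repeated items accumulate)."""
        self.breakdown_items[item] = self.breakdown_items.get(item, 0.0) + amount

    @property
    def breakdown_cost(self) -> float:
        """Business-wide cost of a breakdown: the sum of all identified items."""
        return sum(self.breakdown_items.values())

    def cost_of_delay(self, p_failure_during_delay: float) -> float:
        """Extra expected cost of deferring the work instead of doing it as planned."""
        return p_failure_during_delay * (self.breakdown_cost - self.planned_cost)


calc = RiskCostCalculator(planned_cost=50_000)
calc.add_cost("repair parts and labour", 60_000)
calc.add_cost("lost production", 180_000)
calc.add_cost("overtime and call-outs", 10_000)
calc.add_cost("customer penalties", 50_000)
print(calc.breakdown_cost)      # 300000.0
print(calc.cost_of_delay(0.5))  # 125000.0
```

New cost lines can be added without changing the class, which matches the idea that the calculator can include whatever business costs one cares to identify.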

Classical Risk Analysis Method
Duration 0:56

To Gauge Risk We need to Measure and See It
Duration 0:42

Understanding the size of a risk is useful for deciding what to do about it. If the size of a risk is unacceptable it can be reduced by minimising the consequential cost, should it happen, or by reducing the chance of it happening so it is less likely to occur. Either approach will result in a lower risk.

Risk is the result of the few interactions, amongst the numerous interactions in a system, that do not go as intended. What causes these unintended interactions, and what then allows them to become disastrous, is why risk management has become an important tool in controlling defects and failures.

Realising that risk arises from the few exceptions allows us to build protection into our systems that identify when risk is high or extreme, and hence to take special care and precautions against the unwanted occurrence.

The risk formula is a power law. Power laws have particular properties. For example, they are ‘scale-free’. In the case of risk this means the risk equation applies to every size of risk in every situation. They are ‘typically a signature of some process governed by strong interaction between the “decision-making” agents in the system’. This implies that risk does not arise entirely randomly; rather, it is affected by the ‘decision-makers’ present in a system. Situations that follow power laws have a higher number of large events than those following a normal distribution. For risk this means that catastrophic events will occur more often than pure chance would suggest. In events that follow a power law, a few factors have huge impacts while the numerous rest have little effect. For risk this means there are a few key factors that influence the likelihood of catastrophe. Control these few factors and you increase the chance of success. They are known as the critical success factors. You identify them by asking, “What affects the ability to meet the objective?”

Identify What Risks You WILL NOT Carry
Duration 0:28

The Risk Matrix connects the risk levels a business WILL NOT accept with the action to be taken to reduce the risks TO ACCEPTABLE LEVELS. When calculating risk (Risk = DAFT Cost consequence x No. failures in the period x chance of failure) we use the DAFT Costs and the historical frequency of failure occurrence in the operation or, if that is not available, in that industry. The risk rating is read from the table where consequence meets frequency; risks rated Extreme or High will need to be reduced by using one or more Consequence Reduction or Chance Reduction strategies.
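The formula and the table lookup can be sketched together in Python. The five rating names come from this course; the layout of the 5x5 matrix, the level numbering and the example inputs are hypothetical and would be replaced by the organisation's own risk matrix.

```python
# Hypothetical 5x5 risk matrix.
# Rows: frequency level 0 (rare) to 4 (almost certain).
# Columns: consequence level 0 (insignificant) to 4 (catastrophic).
RATING_MATRIX = [
    ["Acceptable", "Acceptable", "Low",     "Medium",  "High"],
    ["Acceptable", "Low",        "Medium",  "High",    "High"],
    ["Low",        "Medium",     "High",    "High",    "Extreme"],
    ["Medium",     "High",       "High",    "Extreme", "Extreme"],
    ["High",       "High",       "Extreme", "Extreme", "Extreme"],
]


def financial_risk(daft_cost: float, failures_in_period: float,
                   chance_of_failure: float) -> float:
    """Risk = DAFT Cost consequence x No. failures in the period x chance of failure."""
    return daft_cost * failures_in_period * chance_of_failure


def risk_rating(frequency_level: int, consequence_level: int) -> str:
    """Read the rating where the frequency row meets the consequence column."""
    return RATING_MATRIX[frequency_level][consequence_level]


def needs_reduction(rating: str) -> bool:
    """Extreme and High risks need Consequence Reduction or Chance Reduction strategies."""
    return rating in {"Extreme", "High"}


rating = risk_rating(frequency_level=2, consequence_level=3)
print(financial_risk(200_000, 2, 0.5), rating, needs_reduction(rating))  # 200000.0 High True
```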

Need DAFT Costs to See Total Business Risk
Duration 0:48

This is an extract from Australian/New Zealand Standard AS/NZS 4360:2004, the standard on which the international risk management standard ISO 31000 was based. The diagram shows the logical process to follow in identifying, measuring and managing risk. The methodology is well founded and tested and, if applied, delivers control of risk in a situation.

The guide to the standard is very comprehensive in explaining the risk management process and has worked examples of how to apply the various steps.

The important point is that all situations contain risk, but no one knows which situation will go beyond normal levels of risk to become a major incident. This means that every situation must be treated as capable of progressing to disaster. The only protection is to implement a standard method of suitable risk control and ensure it is religiously followed. This includes conducting regular tests to confirm that the risk mitigation measures work and are being followed by all parties.

Equipment Criticality = Operational Risk Rating
Duration 0:36

The concept of Equipment Criticality is used to determine the importance of plant and equipment to the success of an operation. It provides a way to prioritise equipment so that efforts are directed towards the plant and equipment that deliver the most important outcomes for the business. Typically the Equipment Criticality is arrived at by Operations and Maintenance personnel sitting down, working through every item of equipment and applying the risk matrix to determine the risk to the enterprise should the equipment fail. The operational risk rating becomes the ‘Equipment Criticality’.

A more rigorous method, and one based on financial justification, is to use the ‘Optimised Operating Profit Method’. By applying DAFT Costs when calculating the risk from equipment failure to the enterprise, it permits each item of plant to be graded in order of true financial impact on the operation should it fail. The ‘Equipment Criticality’ then reflects the financial risk grading.
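Grading equipment by financial impact amounts to computing a DAFT Cost risk for every item and sorting. In this Python sketch the register entries, tags, costs and failure rates are hypothetical, and the yearly risk is simplified to DAFT Cost per failure multiplied by expected failures per year; it is not the full Optimised Operating Profit Method.

```python
# Hypothetical equipment register: (tag, DAFT Cost per failure in $, expected failures per year)
equipment = [
    ("Conveyor CV-01", 2_000_000, 0.05),
    ("Slurry pump P-12", 300_000, 0.5),
    ("Oil circulating pump P-30", 400_000, 0.2),
    ("Return roller R-105", 12_000, 1.0),
]


def annual_risk(item: tuple[str, float, float]) -> float:
    """Financial risk per year = DAFT Cost per failure x expected failures per year."""
    _, daft_cost, failures_per_year = item
    return daft_cost * failures_per_year


# Criticality order: highest financial risk first
for item in sorted(equipment, key=annual_risk, reverse=True):
    print(f"{item[0]:<26} ${annual_risk(item):>10,.0f} per year")
```

Note that the small oil circulating pump can rank close to the conveyor once its DAFT Cost is counted, which is why every item needs a rating.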

It is important that every item of plant and equipment be categorised, including every sub-system in each equipment assembly. We need to know how critical the smallest item is so we understand what is important to continued operation. There have been many situations where smaller items of equipment, such as an oil circulating pump or a process sensor, were not identified for criticality and were not maintained. Eventually they failed and the operation was brought down for days while parts were rushed in to do a repair. Be sure that you know how important every item of equipment is to your business.

What Risks Are Your Equipment Seeing?
Duration 4:36

The trap many operations fall into is to focus much condition monitoring effort on the critical plant and discount the importance of monitoring the remaining equipment. In reality the key equipment is naturally high in priority and people are well aware of the consequences of failure. This focus tends to help keep reliability and availability high by applying condition monitoring to detect impending failures. As a result it is possible that the rest of the plant will end up suffering more downtime from lack of attention.

It becomes necessary to find methods to also condition monitor all the ‘less important’ items of plant and equipment. One method is to use the human senses of operators and maintainers, supplemented with simple monitoring tools, to conduct regular inspections of the condition of all equipment.

Recognising the Size of Your Equipment Risk
Duration 0:40

Equipment Criticality indicates risk to the business. It highlights how bad a situation can become if it is allowed to occur. The true financial impact on a business of a bad risk is only fully appreciated when the Defect and Failure True Costs (DAFT Costs) are completely known.

Remember, if there is no failure there are no costs. Hence there is good justification to spend money on preventing failure because, if the failure is not stopped, it will almost certainly occur eventually, and then vast DAFT Costs will be incurred.
We need to know where to put our efforts for the greatest payback. The 80/20 rule applies to maintenance as well: which 20% of equipment maintenance gives 80% of the benefits? Once you have an order of priority, you know what to focus on.
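One way to find the few items that carry most of the risk is to sort by risk and accumulate until a chosen share of the total is reached. This is a minimal Python sketch; the function name, the 80% default and the risk figures are illustrative assumptions.

```python
def pareto_focus(risks: dict[str, float], share: float = 0.8) -> list[str]:
    """Return the highest-risk items that together make up at least `share` of total risk."""
    total = sum(risks.values())
    selected: list[str] = []
    running = 0.0
    for tag, value in sorted(risks.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        selected.append(tag)
        running += value
    return selected


# Hypothetical annual risk ($) for six items of plant
risks = {"A": 500, "B": 250, "C": 100, "D": 80, "E": 40, "F": 30}
print(pareto_focus(risks))  # ['A', 'B', 'C']
```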

Equipment Criticality Includes all Risks
Duration 0:36

This slide is an overview of the Equipment Criticality identification process used to identify equipment and business risk, from which the operating, maintenance and inspection activities to run and care for the equipment are derived. During the process of determining the criticality, all the issues that affect equipment operation and safety are considered when making the rating.

The Application of Risk Based Principles to Managing Maintenance
Duration 0:51

The risk management methodology is an ideal fit to the maintenance function. It requires maintenance to apply sound risk identification and risk control principles to plant and equipment. By following a standard procedure to determine the risk, like using the ‘Optimised Operating Profit Method’, the correct and appropriate strategies and practices can be identified and implemented.


Duration 1:35

The ‘Likelihood of Failure’ is determined from tables such as the Table shown, which were developed using risk analysis methodology from international risk management standards and industry guides, including:

Australian/New Zealand Risk Management Standard AS/NZS 4360:2004.
Robinson, Richard M., et al., ‘Risk and Reliability: An Introductory Text’, 7th Edition, R2A Pty Ltd.

Risk Identification and Removal Worksheets
Duration 1:53

The two tables, one for doing risk rating and one for selecting risk mitigation, are in the Workbook that accompanies these slides.

Match Equipment Maintenance and Operating Practices to Equipment Criticality
Duration 2:07

The end result of the equipment criticality process is a table showing the Criticality Rating and the impacts on the business of failure, the actions necessary to control the risks, and who is responsible for doing them.

The method makes it clear to management how the organisation suffers from failure and initiates the introduction of suitable practices to control the risk.

The criticality rating process is applied to plant and equipment in order to determine operating risk and address it with appropriate operating and maintenance strategies. It does not consider how the risks can be prevented in the first place, so that there is no risk left to control. Such an approach is proactive and I encourage organisations to adopt it. It is one of the most important steps on the journey to operating excellence.

The SABC criticality-rating chart was also used to determine the critical parts within the machine. The same decision logic was applied to the equipment’s components. From that review process the critical spares were determined and a decision made to either stock them or to monitor their condition and look for deterioration.

Parts that must never fail were changed out on a time-based cycle, parts that wore out unpredictably were monitored, and parts that did not matter if they failed were bought in when they broke.

Once the criticality ratings are determined for each machine, and its components, a spreadsheet is developed listing the applicable maintenance strategy and the maintenance tasks to be used on the equipment.

The complete maintenance philosophy, spare parts requirements, condition monitoring and preventative requirements, and the maintenance frequency for every item of plant are all there on one sheet for all to see.
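The one-sheet plan can be generated from a component register. This Python sketch maps the three part classes described earlier (must never fail, wears unpredictably, failure does not matter) to their strategies and writes the result as CSV text with the standard `csv` module. The component register, column names, criticality letters, tasks and frequencies are hypothetical examples, not the course's template.

```python
import csv
import io

# The three part classes described in the text, mapped to their maintenance strategy
STRATEGY_BY_PART_CLASS = {
    "must never fail": "Time-based change-out",
    "wears unpredictably": "Condition monitoring",
    "failure does not matter": "Replace on breakdown",
}

# Hypothetical component register:
# (equipment, component, criticality, part class, task, frequency, hold critical spare?)
components = [
    ("Slurry pump P-12", "Impeller", "A", "wears unpredictably", "Measure flow and vibration", "Fortnightly", "Yes"),
    ("Slurry pump P-12", "Mechanical seal", "A", "must never fail", "Change out seal", "6-monthly", "Yes"),
    ("Slurry pump P-12", "Coupling guard", "C", "failure does not matter", "None", "n/a", "No"),
]


def maintenance_sheet(rows) -> str:
    """Return the one-sheet maintenance plan as CSV text, ready for a spreadsheet or CMMS import."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Equipment", "Component", "Criticality", "Strategy", "Task", "Frequency", "Critical spare"])
    for equipment, component, criticality, part_class, task, frequency, spare in rows:
        writer.writerow([equipment, component, criticality,
                         STRATEGY_BY_PART_CLASS[part_class], task, frequency, spare])
    return out.getvalue()


print(maintenance_sheet(components))
```

Keeping the strategy as a lookup from part class means a change in the decision logic is made in one place and flows to every row of the sheet.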

With this spreadsheet done first, it is an easy matter to transfer all of the required inspections and checks into a CMMS and generate preventative and corrective maintenance work orders to care for the equipment.

 

The Industrial and Manufacturing Wellness Book explains IONICS Process 2: Business Risk Rating for Equipment Criticality

 

The new Industrial and Manufacturing Wellness book contains all the latest information, all the latest templates, and worked examples of how to design and build a Plant Wellness Way Enterprise Asset Management (PWWEAM) system-of-reliability. Get the book from its publisher, Industrial Press, and Amazon Books.

The PLANT WELLNESS WAY EAM TRAINING COURSE teaches you to use and master the Plant Wellness Way EAM methodology. Follow this link to read about Training for New Users in the Plant Wellness Way EAM Methodology for World Class Reliability.

You are welcome to go to the Plant Wellness Way Tutorials webpage and look at worked examples of Plant Wellness Way EAM techniques and read in-depth explanations of the latest version of many PWWEAM presentation slides.

Use the head office email address on the Contact Us page if you have questions about the videos.