Measuring Preventive Maintenance Effectiveness and Preventing Early Life Equipment Failures

Poor plant and equipment reliability after a maintenance intervention, with many failures happening soon after start-up, is a common problem. It is not a new concern: it was given a name in the 1940s, the Waddington Effect. These days we know what must be done to stop early-life failures.

 


 

Dear Mike,

Currently I am working on a project to develop a Preventive Maintenance Effectiveness Audit Process. My deliverables will be as follows:

 

1. audit checklist

2. procedure

3. KPIs

4. process chart

 

Why are we doing this? Because we noticed that we have too many failures after conducting PM. Also, too much PM work is not conducted as per the approved PM procedure.

Please advise whether you have done such work before for a similar process improvement project.

Thanks, Amos

 


 

Dear Amos,

What you identified in your operation is a common problem known as early life failure, also called infant mortality, where recently rebuilt equipment fails soon after it returns to service. It was first identified during World War II and named after its investigator: the Waddington Effect.

It was also identified by Nowlan and Heap in their 1978 Reliability-Centered Maintenance report on aircraft equipment failures as failure curve ‘F’, which fitted 68% of the aircraft equipment items studied. Contributing causes include human factors and weak work quality control, such as:

  • use of degraded parts and out-of-specification parts;
  • few or no work quality control standards within the PM;
  • no records kept of ‘as found’ and ‘as returned’ values and conditions;
  • insufficient training in critical trade skills to do the work correctly;
  • poor or inadequate physical access to plant and equipment;
  • inadequate or no job task planning to identify critical tasks and ensure their correct performance, and more.

An audit will not fix the problems. All an audit can do is identify how bad and extensive the problem is. An audit may also help to pinpoint the range of causes of the problem. There will be a number of causes that act together to lead to the current situation. It will not be just one cause and you will need to make changes in a range of business processes and practices.

The image below explains what happens when equipment undergoes Preventive Maintenance (PM). Every time you do a maintenance intrusion you increase the likelihood of equipment failure soon after start-up.

[Image: Risk from early life failure or infant mortality when doing preventive maintenance (PM) work]
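One common way to picture this risk is a Weibull failure model with a shape parameter below 1, where the failure rate is highest just after start-up and falls with operating age. The short sketch below is only an illustration under that assumption; the Weibull model, the function names and the parameter values are not from the image and are chosen purely for demonstration.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate at age t (t > 0, same time units as scale)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_failure_probability(t, shape, scale):
    """Probability that an item has failed by age t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


# Hypothetical parameters, for illustration only.
SHAPE = 0.5      # shape < 1 means the hazard falls with age (infant mortality)
SCALE = 1000.0   # characteristic life in operating hours

for hours in (10, 100, 1000):
    rate = weibull_hazard(hours, SHAPE, SCALE)
    failed = weibull_failure_probability(hours, SHAPE, SCALE)
    print(f"{hours:>5} h  hazard={rate:.5f} per h  P(failed)={failed:.3f}")
```

Under this assumed model, every intrusive PM that returns the item to "age zero" puts it back at the steep start of the curve, which is why more intrusions can mean more failures soon after start-up.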

Do equipment manufacturers have early life failure problems after building their new equipment? Does early life failure only happen after maintenance is done on equipment? If new equipment built by the original manufacturer has very few early life failures, yet the same item rebuilt in a maintenance overhaul suffers greatly from infant mortality, there must be a serious problem with the maintenance rebuild processes and practices used when maintaining the equipment.

We have known for a long time what must be done to make plant and equipment highly reliable. All the answers were available in 1985; nothing new has been discovered since. We have refined the answers so they work better, faster, and for less cost, but they are the same solutions as were known in 1985.

The image below lists what must be done in each phase of equipment life to maximise its chance of a long, failure-free service life. We call such equipment high reliability equipment.

[Image: Reliability is malleable by choice of policy and quality of practice]

Something is badly awry with your preventive maintenance management processes that needs to be identified and corrected. The poor PM work quality and regular early life failures are symptoms of poor and weak preventive maintenance processes and work quality controls.

In order to find and respond to the PM process failures and PM work non-compliance issues I would undertake an improvement project using the phases noted below.

  1. Draft process workflow diagrams of your existing PM processes showing the steps involved and who has responsibilities throughout the PM work processes.
  2. Review PM procedures for content to understand how PM work is intended to be done in the workplace.
  3. Review a sample of PM work orders and PM work order history to see how PM work is actually planned, organised and conducted.
  4. Develop an audit tool to identify what is causing the poor outcomes, i.e. early life failure, and non-compliance to procedures.
  5. Discuss with appropriate people involved in PM work what they know about the problems and seek their input to improvements.
  6. Use the audit tool to find further weaknesses in PM processes and procedures.
  7. From the interviews and audit results identify where the PM processes are failing to prevent early life failure and non-compliance to procedures.
  8. Redesign the PM processes and include changes to remove weaknesses and make the processes highly certain of being successful.
  9. Select useful KPIs to monitor performance of PM processes, compliance to procedures, and effectiveness of maintenance work practices.
  10. Include in all PM procedures the requirements and KPIs needed to prevent early life failure, and ensure full compliance to approved procedures.
  11. Develop appropriate training for staff, workers and contractors involved in PM to teach them how to use the redesigned processes and procedures.
  12. Train staff, workers and contractors to use the improved processes and procedures.
  13. Introduce improved PM processes and procedures into the workplace and start measuring compliance to PM procedures and reduction in early life failures.
  14. Make suitable adjustments to the new approach for PM work where KPI monitoring and measurements indicate it is necessary.

By using the above methodology we will learn why early life failure and non-compliance to PM procedures occur. With that knowledge we will make good and proper improvements in processes, procedures, and training to address the issues causing your problems.
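As a rough illustration of the KPIs mentioned in steps 9 and 13, the sketch below computes two candidate measures from work order and failure records: the share of PM jobs done to the approved procedure, and the share of PM jobs followed by a failure of the same asset within a set number of days. The record layout, the field names, the 30-day window and the sample data are all hypothetical; your maintenance system will hold this information in its own form, and you should choose a window that suits your equipment.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PMWorkOrder:
    asset_id: str
    completed: date
    followed_procedure: bool  # True if the approved PM procedure was followed


@dataclass
class Failure:
    asset_id: str
    occurred: date


def pm_compliance_rate(work_orders):
    """Fraction of PM work orders done as per the approved procedure."""
    if not work_orders:
        return 0.0
    return sum(wo.followed_procedure for wo in work_orders) / len(work_orders)


def early_life_failure_rate(work_orders, failures, window_days=30):
    """Fraction of PM work orders followed by a failure of the same asset
    within window_days of the PM completion date."""
    if not work_orders:
        return 0.0
    window = timedelta(days=window_days)
    hits = 0
    for wo in work_orders:
        if any(
            f.asset_id == wo.asset_id
            and wo.completed <= f.occurred <= wo.completed + window
            for f in failures
        ):
            hits += 1
    return hits / len(work_orders)


# Made-up sample records, for illustration only.
work_orders = [
    PMWorkOrder("P-101", date(2024, 1, 10), True),
    PMWorkOrder("P-102", date(2024, 1, 12), False),
    PMWorkOrder("P-103", date(2024, 2, 1), True),
    PMWorkOrder("P-101", date(2024, 4, 10), False),
]
failures = [
    Failure("P-102", date(2024, 1, 20)),
    Failure("P-101", date(2024, 4, 15)),
    Failure("P-103", date(2024, 6, 1)),
]

print("PM compliance rate:", pm_compliance_rate(work_orders))
print("Early life failure rate:", early_life_failure_rate(work_orders, failures))
```

Tracked month by month, the first figure should rise and the second should fall as the redesigned PM processes take hold.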

Another option is to apply what the second image above notes: “reliability is malleable by choice of policy”. Change your maintenance policy and you change your reliability.

An example of a change in maintenance policy is not to do maintenance overhauls of equipment but instead to replace the plant item with a new, equivalent item. New equipment has far fewer early life failures than rebuilt equipment, and you would gain that benefit for your operation when you install new-for-old equipment.

Whenever you can afford to do so, buy new and swap out the old and tired item; your operating equipment will be far more reliable.

All the best to you,

Mike Sondalini
Managing Director
Lifetime Reliability Solutions HQ

PS. If you require advice on industrial asset management, industrial equipment maintenance strategy, defect elimination and failure prevention or plant and equipment maintenance and reliability, please feel free to contact me by email at info@lifetime-reliability.com