Mean Time Between Failure (MTBF) And MTTR: A Complete Guide


MTBF is a measure of a machine’s reliability. MTTR reflects the logistics required to bring the asset back to running status, and it contributes to the asset’s criticality and to the consequences of an unplanned outage. At the very least, machine downtime can waste labor, consume expensive parts and supplies, and cost production. Simply put, downtime is expensive. Knowing your MTBF and MTTR helps you manage machine downtime and optimize your processes so that when downtime does occur, it is as short as possible. Let’s take a closer look at MTBF and MTTR.

Understanding MTBF and MTTR

Before we dive deeper into MTBF and MTTR, let’s look at one related metric: uptime.

What is Uptime?

Uptime is the number of hours machines are available for production during a period. Expressed as a percentage of the total period, it can be calculated as:

\% \ Uptime = \frac{Total \ time - Downtime}{Total \ time} \ \ast \ 100

It’s a measure of machine availability. An uptime of more than 99.99% represents an ideal production environment. 

Now let’s take a look at MTBF and MTTR.

Mean Time to Resolution (MTTR) 

MTTR (also commonly expanded as mean time to repair) is the average time required to resolve a breakdown and put a machine back in action.

\ MTTR = \frac{Total \ downtime}{Total \ number \ of \ breakdowns}

In practice, MTTR reflects how quickly a machine can be returned to service after a breakdown: the lower the MTTR, the faster your team recovers.

Mean Time Between Failures (MTBF) 

MTBF is the average operating time between two consecutive breakdowns.

\ MTBF = \frac{Total \ time - Total \ downtime}{Total \ number \ of \ breakdowns}

MTBF measures the reliability of your production units: it tells you how often your machines go down within a given period. The higher the MTBF, the longer machines run, on average, between failures.

Examples of MTBF and MTTR

A worked example makes MTBF and MTTR easier to picture.

Consider a plant log covering 30 days with the following details:

  • 10 incident outages
  • 20 hours of downtime
  • 200 minutes to acknowledge 

Now let’s calculate uptime, MTBF, and MTTR. 

\ Uptime = (24 \ hrs \ per \ day \ \ast \ 30 \ days) - 20 \ hrs \ of \ downtime = 720 \ hrs - 20 \ hrs = 700 \ hrs

\% \ Uptime = \frac{700}{720} \ \ast \ 100 = 97.22\%

Because this uptime falls well short of the 99.99% benchmark, it is worth looking at other reliability metrics such as MTBF and MTTR to understand why.

\ MTTR = \frac{20 \ hrs \ of \ downtime}{10 \ breakdowns} = 2 \ hrs \ per \ breakdown

Though this figure may not seem bad, an average can hide a few long outages, so analyzing the severity of each breakdown will give a clearer picture of the situation.

\ MTBF = \frac{720 \ hrs - 20 \ hrs}{10} = 70 \ hrs

That is one outage every three days on average, or more than two outages per week.
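As a cross-check, the three formulas and the 30-day calculation above can be sketched in Python. This is a minimal sketch, assuming all times are in hours and every logged outage counts as one breakdown; the function names are illustrative only.

```python
def uptime_percent(total_hours: float, downtime_hours: float) -> float:
    """Percentage of the period the machine was available for production."""
    return (total_hours - downtime_hours) / total_hours * 100


def mttr(downtime_hours: float, breakdowns: int) -> float:
    """Average downtime per breakdown."""
    if breakdowns == 0:
        raise ValueError("MTTR is undefined with zero breakdowns")
    return downtime_hours / breakdowns


def mtbf(total_hours: float, downtime_hours: float, breakdowns: int) -> float:
    """Average operating time between breakdowns."""
    if breakdowns == 0:
        raise ValueError("MTBF is undefined with zero breakdowns")
    return (total_hours - downtime_hours) / breakdowns


# The 30-day plant log from the example: 10 outages, 20 hours of downtime.
total = 24 * 30  # 720 hours in the period
downtime = 20
outages = 10

print(f"Uptime: {uptime_percent(total, downtime):.2f}%")   # Uptime: 97.22%
print(f"MTTR:   {mttr(downtime, outages):.1f} hrs")         # MTTR:   2.0 hrs
print(f"MTBF:   {mtbf(total, downtime, outages):.1f} hrs")  # MTBF:   70.0 hrs
```

Note that the formulas only need the totals, so the split of the 20 hours across the 10 outages does not change any of the three results.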

Potential Issues With MTBF 

There are some potential issues with MTBF to consider, which include:

Assuming a Constant Failure Rate 

MTBF depends on counting failures correctly, and a single average treats every failure as if it came from the same steady failure rate. However, some breakdowns cannot be foreseen and have little to do with the machine’s reliability, such as power outages during a storm or short circuits caused by flooding. Others are short stops (less than two minutes) caused by an operator’s mistake rather than by the machine itself. Unless you decide up front which of these events count as failures, they will skew the MTBF.
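One way to handle this is to fix, before calculating, which stops count as failures. The Python sketch below assumes a hypothetical log in which each stop has a duration in minutes and a cause label. It counts only machine-caused stops of two minutes or longer as failures, while still subtracting all downtime from operating time. The field names, cause labels, and two-minute cutoff are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Stop:
    """One logged stop. Field names are hypothetical, not a CMMS schema."""
    duration_min: float
    cause: str  # assumed labels: "machine", "operator", or "external"


def count_failures(stops, min_duration_min=2.0):
    """Count only machine-caused stops lasting at least min_duration_min."""
    return sum(
        1 for s in stops
        if s.cause == "machine" and s.duration_min >= min_duration_min
    )


def mtbf_hours(total_hours, stops, min_duration_min=2.0):
    """Subtract all downtime from the period, but divide only by qualifying failures."""
    downtime_hours = sum(s.duration_min for s in stops) / 60
    failures = count_failures(stops, min_duration_min)
    if failures == 0:
        raise ValueError("no qualifying failures in this period")
    return (total_hours - downtime_hours) / failures


# Hypothetical one-week (168-hour) log.
stops = [
    Stop(60, "machine"),
    Stop(120, "machine"),
    Stop(1.5, "operator"),   # short stop caused by operator error
    Stop(1.5, "machine"),    # micro-stop under two minutes
    Stop(57, "external"),    # power outage during a storm
]

naive = (168 - sum(s.duration_min for s in stops) / 60) / len(stops)
print(f"Every stop counted: {naive:.1f} hrs")                         # 32.8 hrs
print(f"Qualifying failures only: {mtbf_hours(168, stops):.1f} hrs")  # 82.0 hrs
```

Whether excluded events should also be removed from the downtime total is a policy choice; the sketch keeps them as downtime because the machine was not producing during those stops.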

Differing in Operating Time 

It is important to define uptime clearly before calculating MTBF. Some industries count warm-up and cool-down times as uptime. This can dramatically increase a machine’s MTBF compared with a definition that counts only the hours when production output is steady.
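To see how much the definition matters, here is a small Python sketch with invented numbers: the same 10 failures divided first by steady production hours only, and then by production hours plus warm-up and cool-down time.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """MTBF as operating hours per failure."""
    return operating_hours / failures


# Hypothetical figures for one period.
production_hours = 600.0      # hours at steady production output
warmup_cooldown_hours = 60.0  # warm-up and cool-down time in the same period
failures = 10

strict = mtbf(production_hours, failures)
inclusive = mtbf(production_hours + warmup_cooldown_hours, failures)

print(f"Production hours only: {strict:.0f} hrs")      # 60 hrs
print(f"With warm-up/cool-down: {inclusive:.0f} hrs")  # 66 hrs
```

With these figures the looser definition reports a 10% higher MTBF for exactly the same failures, so it helps to state which definition is used whenever MTBF is reported or compared.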

Single vs. Multiple Assets 

Consider the MTBF data of two identical systems over two consecutive months, as shown below.

|          | Month 1 MTBF | Month 2 MTBF |
|----------|--------------|--------------|
| System 1 | 25           | 46           |
| System 2 | 30           | 50           |

Based on the information given above, you may think that system 1 has lower reliability than system 2. 

Now let’s add some more data to the table as follows. 

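|          | Month 1 MTBF | Month 1 runtime / no. of breakdowns | Month 2 MTBF | Month 2 runtime / no. of breakdowns | Months 1 + 2 MTBF | Months 1 + 2 runtime / no. of breakdowns |
|----------|--------------|-------------------------------------|--------------|-------------------------------------|-------------------|------------------------------------------|
| System 1 | 25           | 150 / 6                             | 46           | 690 / 15                            | 40                | 840 / 21                                 |
| System 2 | 30           | 540 / 18                            | 50           | 300 / 6                             | 35                | 840 / 24                                 |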

Here, we have broken down the MTBF of each system for each month as the total uptime hours divided by the number of failures. In the last column, we have calculated the MTBF of each system over both months together: total uptime divided by total failures. Now System 2 turns out to be less reliable than System 1. When comparing MTBF across machines or periods, combine the underlying uptime and failure counts rather than comparing or averaging the MTBF values themselves.
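The effect can be reproduced in a few lines of Python using the runtime and breakdown figures from the table. The sketch compares the pooled MTBF (total runtime over total breakdowns, as in the last column) with a simple average of the two monthly MTBF values; the helper names are illustrative.

```python
# (runtime, breakdowns) per month, taken from the table above.
systems = {
    "System 1": [(150, 6), (690, 15)],
    "System 2": [(540, 18), (300, 6)],
}


def pooled_mtbf(months):
    """Total runtime divided by total breakdowns across all months."""
    total_runtime = sum(runtime for runtime, _ in months)
    total_breakdowns = sum(breakdowns for _, breakdowns in months)
    return total_runtime / total_breakdowns


def mean_of_monthly_mtbf(months):
    """Simple average of each month's MTBF (the misleading shortcut)."""
    return sum(runtime / breakdowns for runtime, breakdowns in months) / len(months)


for name, months in systems.items():
    print(f"{name}: pooled {pooled_mtbf(months):.1f}, "
          f"mean of monthly {mean_of_monthly_mtbf(months):.1f}")
# System 1: pooled 40.0, mean of monthly 35.5
# System 2: pooled 35.0, mean of monthly 40.0
```

Averaging the monthly values gives a month with 6 breakdowns the same weight as a month with 18, which is why the two approaches rank the systems in opposite order.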

How to Maximize MTBF With Industry 4.0 Solutions

Industry 4.0 technology can significantly increase MTBF in several ways, discussed below.

Autonomous Maintenance

Industry 4.0 technologies can help plants automate machine tracking and create maintenance schedules that maximize MTBF. Operators can carry out proactive maintenance tasks, checklists, and inspections on an ongoing basis. Tools such as connected work cells (CWC) can help operators run autonomous maintenance. In simple terms, a CWC is an IT-based service that digitally collects, analyzes, and manages production-level data to increase daily efficiency, reduce errors, and support the ongoing health of the production asset.

Preventive Maintenance (PM)

Creating PM and recurring work schedules has long been a staple of increasing industrial machines’ reliability and overall equipment effectiveness (OEE).

Using Industry 4.0 tools such as the Internet of Things (IoT), along with analytics techniques that are now readily available, industrial plants can see how often each failure mode occurs and what its root causes are, then use that insight to review and improve the efficacy of PM tasks and intervals. This not only keeps machines running effectively to maximize MTBF but also cuts PM costs and inefficient maintenance practices.
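A common first step in that kind of review is a Pareto ranking of failure modes. The Python sketch below assumes a hypothetical work-order log that records one failure-mode label per breakdown; the labels are invented for illustration. It ranks modes by frequency and shows each one’s cumulative share of all failures, which can point to where PM changes are likely to pay off most.

```python
from collections import Counter

# Hypothetical work-order log: one failure-mode label per breakdown.
failure_log = [
    "bearing wear", "belt slip", "bearing wear", "sensor fault",
    "bearing wear", "belt slip", "lubrication", "bearing wear",
]


def pareto(modes):
    """Rank failure modes by count, with each mode's cumulative share of all failures."""
    counts = Counter(modes)
    total = sum(counts.values())
    ranked, running = [], 0
    for mode, n in counts.most_common():
        running += n
        ranked.append((mode, n, running / total))
    return ranked


for mode, n, cumulative in pareto(failure_log):
    print(f"{mode:<14}{n:>3}   cumulative {cumulative:.0%}")
```

In this invented log, bearing wear alone accounts for half of all breakdowns, so it would be the first candidate for a revised PM task or interval.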

Asset Monitoring & Utilization 

To plan proactive maintenance to improve MTBF, it is imperative to gain real-time visibility into asset performance, status, and overall utilization. Many pre-built software solutions can help collect machine data from different existing industrial automation systems to create alerts for timely troubleshooting. 

Condition-Based Maintenance

Condition-based maintenance is about visualizing the condition of various machines and using those data streams to set intervals for either recurring PM tasks or just-in-time (JIT) proactive maintenance tasks. This is where the industrial internet of things (I-IoT) comes into the picture: it makes it easy to collect, contextualize, and analyze leading indicators of machine failure, such as temperature, vibration, and power draw.
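A minimal condition-based check might compare each sensor reading against a limit and raise a maintenance request when one is exceeded. The Python sketch below uses hypothetical asset names, field names, and limits; in practice, limits typically come from the equipment manufacturer and your own operating history, and an I-IoT platform would feed readings in continuously rather than from a fixed list.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sensor sample. Field names and units are hypothetical."""
    asset: str
    temperature_c: float
    vibration_mm_s: float


# Hypothetical alert limits for illustration only.
LIMITS = {"temperature_c": 80.0, "vibration_mm_s": 7.0}


def breached(reading):
    """Return the names of indicators whose value exceeds its limit."""
    return [name for name, limit in LIMITS.items() if getattr(reading, name) > limit]


def maintenance_requests(readings):
    """Yield one inspection request per reading that breaches any limit."""
    for r in readings:
        over = breached(r)
        if over:
            yield f"Inspect {r.asset}: {', '.join(over)} over limit"


readings = [
    Reading("Pump A", 65.0, 3.2),
    Reading("Pump B", 84.5, 7.8),
    Reading("Fan C", 70.0, 9.0),
]

for request in maintenance_requests(readings):
    print(request)
```

Here Pump A stays within both limits, Pump B exceeds both, and Fan C exceeds only the vibration limit, so two requests are raised.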

Predictive Maintenance

As the name suggests, predictive maintenance is about accurately predicting when maintenance is needed. It analyzes sensor data and a machine’s failure frequency to predict potential future failure events, then creates maintenance actions to preempt them. The model learns over time and becomes more accurate as it accumulates data. This self-learning capability is the key difference between predictive and condition-based maintenance.
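Real predictive maintenance relies on models trained on sensor and failure history. As a much simpler stand-in, the Python sketch below fits a straight-line (least-squares) trend to hypothetical vibration readings and extrapolates when that trend will reach an assumed alarm limit. It does not learn or update the way the models described above do; it only illustrates the idea of estimating time to failure from a data stream.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Needs at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_limit(hours, values, limit):
    """Estimate hours after the last reading until the fitted trend reaches limit.

    Returns None when the trend is flat or falling, since no crossing is predicted.
    """
    slope, intercept = fit_line(hours, values)
    if slope <= 0:
        return None
    crossing_hour = (limit - intercept) / slope
    return max(crossing_hour - hours[-1], 0.0)


# Hypothetical daily vibration readings (mm/s) and an assumed alarm limit.
hours = [0, 24, 48, 72, 96]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
remaining = hours_until_limit(hours, vibration, limit=7.0)
print(f"Schedule maintenance within about {remaining:.0f} hours")
```

With readings rising 0.5 mm/s per day from 2.0 mm/s, the trend reaches 7.0 mm/s at hour 240, about 144 hours after the last reading.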

Use of Industry 4.0 Tools 

Industry 4.0 tools such as the IoT, augmented reality (AR), artificial intelligence (AI), and machine learning can empower your frontline workers to reduce downtime. For example, operators can use hands-free digital eyewear to record maintenance procedures and publish them to a cloud-based content editing platform for easy viewing across devices. New operators can then follow these step-by-step immersive videos to perform maintenance on that machine, cutting down the time it takes to fix or maintain the asset.

Slash MTTR by 20% and Improve MTBF With LLumin’s CMMS+ Solution 

The first of its kind to use machine-level data, LLumin’s CMMS+ software is designed to reduce your plant’s MTTR and improve MTBF within months of going live. Among its many features, LLumin integrates with all machine data sources within your operation and monitors live conditions and parameters, automatically alerting maintenance teams if any of those vitals move in the wrong direction. It can also flag upcoming or overdue preventive maintenance jobs and create an individual role-based dashboard for each mechanic or technician to save preparation time. In the event of a critical error, LLumin notifies the responsible technicians and, at the same time, any other expertise needed, wherever they are. If the maintenance team does not address the issue, the software escalates it to designated individuals or groups via text message. It eliminates guesswork and helps teams make informed decisions without wasting productive hours. As a result, it improves your MTBF and cuts MTTR by nearly 20% within months of going live.

Getting Started With LLumin

LLumin develops innovative CMMS and Asset Management software to manage and track assets in industrial plants, facilities, municipalities, and universities. If you would like to know more about MTBF and MTTR and how LLumin can help, we encourage you to schedule a free demo or contact the experts at LLumin.