Chapter 1

The RUL Prediction Problem

Introduction to Predictive Maintenance

Learning Objectives

By the end of this section, you will:

  1. Formalize RUL prediction as a supervised regression problem with precise mathematical notation
  2. Understand input representation: multivariate time series from sensors and operational settings
  3. Master the piecewise linear degradation model and why it reflects engineering reality
  4. Identify the fundamental challenges that make RUL prediction difficult for machine learning
  5. Frame RUL as a multi-task problem with both regression and classification objectives
  6. Know the evaluation metrics: RMSE, MAE, and the asymmetric NASA scoring function
Why This Matters: Before building any machine learning model, we must precisely define what we are trying to predict, what data we have, and how we measure success. This section establishes the formal framework that all subsequent chapters build upon.

Formal Problem Formulation

We formulate Remaining Useful Life (RUL) prediction as a supervised regression problem with an auxiliary classification task. Given a multivariate time series of sensor measurements from operating equipment, the goal is to predict how many operational cycles remain before failure.

The Core Prediction Task

Let us define the problem mathematically. Consider a piece of equipment (e.g., a turbofan engine) that operates in discrete cycles. At each cycle t, we observe:

  • Sensor measurements: temperature, pressure, vibration, speed, etc.
  • Operational settings: altitude, throttle position, Mach number, etc.

Our task is to use the history of these observations to predict how many cycles remain until the equipment fails.

f: \mathbf{X}_{1:T} \rightarrow \hat{y}_{\text{RUL}}

Where:

  • f is the prediction function (neural network) we want to learn
  • \mathbf{X}_{1:T} is the sequence of observations from cycle 1 to the current cycle T
  • \hat{y}_{\text{RUL}} is the predicted remaining useful life (in cycles)

Input Representation

The input to our model is a multivariate time series—a sequence of feature vectors recorded at each operational cycle.

Mathematical Definition

\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_T] \in \mathbb{R}^{T \times D}

Where:

  • T is the sequence length (number of timesteps/cycles)
  • D is the feature dimension (number of sensors + operational settings)
  • \mathbf{x}_t \in \mathbb{R}^D is the feature vector at timestep t
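As a concrete illustration, a single input sequence is just a T × D array; the values below are random placeholders, not real sensor data:

```python
import numpy as np

# A single input sequence is a T x D array: T = 30 cycles,
# D = 17 features (3 operational settings + 14 sensors).
T, D = 30, 17
rng = np.random.default_rng(0)
X = rng.normal(size=(T, D))

x_t = X[-1]  # feature vector at the current timestep t
```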

Feature Composition

In the NASA C-MAPSS benchmark we use throughout this book, each feature vector contains:

| Category             | Count | Examples                                       |
|----------------------|-------|------------------------------------------------|
| Operational Settings | 3     | Altitude, Mach number, throttle resolver angle |
| Sensor Measurements  | 14    | Temperature, pressure, speed, vibration        |
| Total Features       | 17    | D = 17-dimensional feature vector              |

Feature Selection

The original C-MAPSS dataset contains 21 sensor measurements, but 7 are constant or near-constant and provide no information about degradation. Following established practice, we use 14 informative sensors plus 3 operational settings, yielding D = 17 features.
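One simple way to find such uninformative channels is a standard-deviation filter. This is a generic sketch with an illustrative helper name, not the exact selection procedure used for C-MAPSS:

```python
import numpy as np

def informative_columns(X, eps=1e-6):
    """Return indices of columns whose standard deviation exceeds eps.

    Constant or near-constant sensor channels carry no degradation
    signal and can be dropped before modeling.
    """
    return np.flatnonzero(X.std(axis=0) > eps)

# Toy example: column 0 is constant, column 1 varies.
X = np.column_stack([np.ones(5), np.arange(5.0)])
keep = informative_columns(X)  # -> array([1])
```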

Sliding Window Approach

Rather than processing entire engine trajectories (which vary in length), we use a sliding window to create fixed-length sequences:

\mathbf{X}_{\text{window}} = [\mathbf{x}_{t-W+1}, \mathbf{x}_{t-W+2}, \ldots, \mathbf{x}_t] \in \mathbb{R}^{W \times D}

Where:

  • W is the window size (we use W = 30 cycles)
  • t is the current timestep
  • The label for each window is the RUL at the final timestep t
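A minimal NumPy sketch of this windowing scheme (the function name and toy data are illustrative, not from the actual C-MAPSS loader):

```python
import numpy as np

def sliding_windows(trajectory, rul, W=30):
    """Split one trajectory into overlapping length-W windows.

    trajectory: (T, D) array of per-cycle feature vectors.
    rul:        (T,) array of RUL labels, one per cycle.
    Returns (num_windows, W, D) windows; each window's label is
    the RUL at its final timestep.
    """
    T = trajectory.shape[0]
    windows = np.stack([trajectory[t - W + 1:t + 1] for t in range(W - 1, T)])
    labels = rul[W - 1:]
    return windows, labels

# Toy trajectory: 100 cycles, 17 features, RUL counting down 99 -> 0.
traj = np.zeros((100, 17))
rul = np.arange(99, -1, -1, dtype=float)
Xw, y = sliding_windows(traj, rul, W=30)  # 100 - 30 + 1 = 71 windows
```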

Output Targets

Our model produces two outputs for each input sequence:

Primary Output: RUL Prediction

\hat{y}_{\text{RUL}} \in \mathbb{R}^+

A non-negative real number representing the predicted remaining cycles until failure. This is a regression target.

Auxiliary Output: Health State Classification

\hat{y}_{\text{health}} \in \{0, 1, 2\}

A discrete category representing the equipment's degradation stage. This is a classification target with three classes:

| Class | Health State      | RUL Range     | Interpretation                                 |
|-------|-------------------|---------------|------------------------------------------------|
| 0     | Normal            | RUL > 80      | Equipment operating normally, no action needed |
| 1     | Early Degradation | 30 < RUL ≤ 80 | Degradation detected, schedule maintenance     |
| 2     | Critical          | RUL ≤ 30      | Failure imminent, immediate action required    |
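These thresholds translate directly into a small labeling helper (illustrative naming):

```python
def health_state(rul):
    """Map an RUL value to the 3-class health state defined above."""
    if rul > 80:
        return 0  # Normal
    elif rul > 30:
        return 1  # Early degradation
    else:
        return 2  # Critical
```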

Why Two Outputs?

The auxiliary health classification task is not just for interpretability—it is central to our AMNL innovation. As we will show in Chapter 10, treating this auxiliary task as equally important as RUL prediction provides crucial regularization that enables state-of-the-art performance.

The Piecewise Linear Degradation Model

A critical preprocessing step is how we define the ground-truth RUL labels. Real equipment does not begin degrading from the very first cycle; there is typically a healthy period during which wear is negligible.

The Problem with Linear RUL

Naively, we might define RUL as a simple countdown:

\text{RUL}_{\text{naive}}(t) = T_{\text{failure}} - t

But this creates a problem: early in the equipment's life, sensor readings show no degradation signature. Asking a model to predict RUL=250 vs RUL=300 when both correspond to healthy equipment is impossible—there is no signal in the data to distinguish them.

Piecewise Linear Solution

The standard solution is to cap the RUL at a maximum value R_{\max}:

\text{RUL}(t) = \min(R_{\max}, T_{\text{failure}} - t)

Or equivalently:

\text{RUL}(t) = \begin{cases} R_{\max} & \text{if } T_{\text{failure}} - t > R_{\max} \\ T_{\text{failure}} - t & \text{otherwise} \end{cases}

In the NASA C-MAPSS benchmark, R_{\max} = 125 cycles is the standard choice.
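The capped labels can be generated in a few lines of NumPy (a sketch; exact indexing conventions may differ between implementations):

```python
import numpy as np

def piecewise_rul(T_failure, R_max=125):
    """Piecewise linear RUL labels for cycles t = 1 .. T_failure.

    RUL(t) = min(R_max, T_failure - t): flat at R_max during the
    healthy phase, then a linear countdown to 0 at failure.
    """
    t = np.arange(1, T_failure + 1)
    return np.minimum(R_max, T_failure - t)

rul = piecewise_rul(200)  # flat at 125 for 75 cycles, then 124, 123, ..., 0
```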

Physical Interpretation

The capping threshold R_{\max} = 125 corresponds roughly to the point where degradation becomes detectable in sensor readings. Before this point, the equipment is in its "infant mortality" or "useful life" phase, where failures are random rather than wear-related.

Why RUL Prediction is Hard

RUL prediction is not a simple regression problem. Several fundamental challenges make it difficult for machine learning:

1. Non-Stationarity

The statistical properties of sensor data change over time as equipment degrades. A model trained on healthy data may fail on degraded data, and vice versa.

P(\mathbf{x}_t \mid \text{RUL} = 100) \neq P(\mathbf{x}_t \mid \text{RUL} = 20)

2. Multi-Modal Degradation

Equipment can fail in different ways. A turbofan engine might experience:

  • High-pressure compressor (HPC) degradation
  • Fan degradation
  • Combustor issues
  • Turbine blade erosion

Each failure mode produces different sensor signatures. A model must learn to recognize all failure modes, not just one.

3. Operating Condition Variability

Sensor readings depend heavily on operating conditions, not just degradation state:

  • Temperature readings at sea level ≠ temperature readings at 35,000 ft
  • Vibration at full throttle ≠ vibration at idle
  • Pressure ratios depend on ambient conditions

The model must learn to disentangle condition effects from degradation effects—a challenging feature engineering problem that deep learning can potentially solve.

4. Label Noise

The ground-truth failure time T_{\text{failure}} is determined by a threshold crossing in simulation, or by physical inspection in real data. This introduces label uncertainty:

  • When exactly did the degradation start?
  • Is the failure point precisely defined?
  • Could the equipment have operated longer?

5. Imbalanced Data

Most of an engine's operational life is spent in the healthy phase. The critical RUL range (0-30 cycles) represents only a small fraction of training data:

| RUL Range                   | Approximate % of Data | Importance                     |
|-----------------------------|-----------------------|--------------------------------|
| RUL > 80 (Normal)           | ~60%                  | Low (easy to predict)          |
| 30 < RUL ≤ 80 (Degradation) | ~25%                  | Medium                         |
| RUL ≤ 30 (Critical)         | ~15%                  | High (crucial for maintenance) |

The Accuracy Paradox

A naive model that always predicts RUL = 125 would achieve reasonable RMSE on average, but would be completely useless for the critical predictions that matter most. This is why we need specialized loss functions that emphasize the critical phase.

Multi-Task Learning Formulation

To address these challenges, we formulate RUL prediction as a multi-task learning problem with two objectives:

Task 1: RUL Regression (Primary)

\mathcal{L}_{\text{RUL}} = \frac{1}{N} \sum_{i=1}^{N} w_i \cdot (\hat{y}_i - y_i)^2

Where w_i is a sample weight that emphasizes critical-phase predictions (more on this in Chapter 11).

Task 2: Health Classification (Auxiliary)

\mathcal{L}_{\text{health}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=0}^{2} y_{i,c} \log(\hat{p}_{i,c})

Standard cross-entropy loss for 3-class classification.

Combined AMNL Loss

Our key innovation is combining these with equal weights:

\mathcal{L}_{\text{AMNL}} = 0.5 \times \mathcal{L}_{\text{RUL}} + 0.5 \times \mathcal{L}_{\text{health}}
The Counterintuitive Discovery: Conventional wisdom says to weight the primary task (RUL) higher than the auxiliary task (health classification). But our experiments show that equal weighting provides superior regularization, especially for complex multi-condition scenarios.
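The combined loss can be sketched in plain NumPy. This is a minimal illustration of the equal-weighted combination, not the actual training implementation (which is framework-specific and covered in later chapters):

```python
import numpy as np

def amnl_loss(y_pred, y_true, p_pred, y_class, w=None):
    """Equal-weighted AMNL loss: 0.5 * (weighted MSE) + 0.5 * cross-entropy.

    y_pred, y_true: (N,) RUL predictions and targets.
    p_pred:         (N, 3) predicted class probabilities (rows sum to 1).
    y_class:        (N,) integer health-state labels in {0, 1, 2}.
    w:              optional (N,) sample weights for the regression term.
    """
    if w is None:
        w = np.ones_like(y_true)
    l_rul = np.mean(w * (y_pred - y_true) ** 2)
    # Cross-entropy: negative log-probability of the true class.
    l_health = -np.mean(np.log(p_pred[np.arange(len(y_class)), y_class] + 1e-12))
    return 0.5 * l_rul + 0.5 * l_health
```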

Evaluation Metrics

We evaluate RUL prediction using several complementary metrics:

Root Mean Square Error (RMSE)

The primary metric for comparing methods:

\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2}

Lower is better. RMSE penalizes large errors more heavily than small errors due to the squaring operation.

Mean Absolute Error (MAE)

A more robust metric less sensitive to outliers:

\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |\hat{y}_i - y_i|

NASA Asymmetric Scoring Function

The NASA score reflects the real-world cost asymmetry: predicting failure too late (overestimating RUL) is more dangerous than predicting too early (underestimating RUL).

S = \frac{1}{N} \sum_{i=1}^{N} s_i, \quad \text{where } s_i = \begin{cases} e^{-d_i/13} - 1 & \text{if } d_i < 0 \text{ (early)} \\ e^{d_i/10} - 1 & \text{if } d_i \geq 0 \text{ (late)} \end{cases}

Where d_i = \hat{y}_i - y_i is the prediction error.
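The scoring function transcribes directly into code (the helper name is illustrative):

```python
import numpy as np

def nasa_score(y_pred, y_true):
    """Mean asymmetric NASA score (lower is better).

    Late predictions (d >= 0) are penalized more steeply than
    early ones, matching the /10 vs /13 exponents.
    """
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    s = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(s.mean())

# Being 10 cycles late costs e^{10/10} - 1 ~ 1.72,
# while 10 cycles early costs only e^{10/13} - 1 ~ 1.16.
```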

Coefficient of Determination (R²)

Measures how well predictions explain variance in true RUL:

R^2 = 1 - \frac{\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}

R^2 = 1.0 means perfect prediction; R^2 = 0 means no better than predicting the mean.
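The RMSE, MAE, and R² formulas are direct NumPy transcriptions:

```python
import numpy as np

def rmse(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mae(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    return float(np.mean(np.abs(y_pred - y_true)))

def r2(y_pred, y_true):
    y_pred, y_true = np.asarray(y_pred, float), np.asarray(y_true, float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```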


Summary

In this section, we have formally defined the RUL prediction problem:

  1. Input: Multivariate time series \mathbf{X} \in \mathbb{R}^{T \times D} with D = 17 features over T = 30 timesteps
  2. Primary output: Continuous RUL prediction \hat{y}_{\text{RUL}} \in \mathbb{R}^+
  3. Auxiliary output: Discrete health state \hat{y}_{\text{health}} \in \{0, 1, 2\}
  4. Degradation model: Piecewise linear with R_{\max} = 125
  5. Key challenges: Non-stationarity, multi-modal degradation, operating condition variability, label noise, data imbalance
  6. Evaluation: RMSE (primary), MAE, NASA Score (asymmetric), R²
Looking Ahead: In the next section, we will explore why deep learning is particularly well-suited for RUL prediction, and trace the evolution of neural network approaches for time series analysis.

With the problem formally defined, we are ready to understand the solution approach.