AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand RMSE as the primary evaluation metric for Remaining Useful Life (RUL) prediction
Interpret RMSE values in the context of predictive maintenance
Distinguish between RMSE variants (all-cycles vs. last-cycle)
Implement RMSE computation in Python
Understand why RMSE is preferred over other metrics

Why This Matters: RMSE is the standard benchmark metric for RUL prediction. Understanding its properties (sensitivity to outliers, interpretable units, and comparability across studies) is essential for evaluating a model meaningfully and for comparing it with published state-of-the-art results.

RMSE Definition

Root Mean Square Error (RMSE) measures the typical magnitude of prediction errors, expressed in the same units as the target variable (here, cycles).

Mathematical Formulation

For N predictions, RMSE is defined as:

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

Where:

$y_i$ is the true RUL value for sample i
$\hat{y}_i$ is the predicted RUL value for sample i
$N$ is the total number of predictions

Component Breakdown
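Reading the formula from the inside out gives four steps: the per-sample error, its square, the mean of the squares, and the square root. The sketch below walks through them on made-up RUL values; the arrays are illustrative assumptions, not C-MAPSS data.

```python
import numpy as np

# Hypothetical true and predicted RUL values, in cycles (illustrative only).
y_true = np.array([100.0, 80.0, 60.0, 40.0])
y_pred = np.array([90.0, 85.0, 60.0, 60.0])

# 1. Error per sample: [10, -5, 0, -20]
errors = y_true - y_pred

# 2. Squaring removes the sign and amplifies large errors: [100, 25, 0, 400]
squared_errors = errors ** 2

# 3. Mean of the squared errors: 525 / 4 = 131.25 (units: cycles squared)
mean_squared_error = float(squared_errors.mean())

# 4. The square root brings the result back to cycles.
rmse = float(np.sqrt(mean_squared_error))

print(f"RMSE = {rmse:.2f} cycles")  # RMSE = 11.46 cycles
```

Note how the single 20-cycle miss contributes 400 of the 525 total squared error, which is why the RMSE (11.46) sits above the mean absolute error of these samples (8.75).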

Interpretation

Understanding what RMSE values mean in the context of predictive maintenance.

RMSE in Cycles

For turbofan engines in NASA C-MAPSS, one cycle represents one flight, so an RMSE value can be read directly as a typical prediction error in flights. The bands below are rough guidelines:

RMSE	Interpretation	Practical Impact
< 10	Excellent accuracy	Precise maintenance scheduling
10-15	Good accuracy	Reliable early warning
15-20	Moderate accuracy	Useful but with safety margin
20-30	Fair accuracy	Significant uncertainty
> 30	Poor accuracy	Not reliable for scheduling
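For reporting, the bands in the table above can be encoded as a small lookup. This is only a sketch: the function name `interpret_rmse` is hypothetical, and because the table's ranges share their endpoints (10, 15, 20), the choice to treat each lower bound as inclusive is an assumption.

```python
def interpret_rmse(rmse_cycles: float) -> str:
    """Map an RMSE value (in cycles) to the qualitative band from the table.

    Assumption: shared endpoints belong to the higher band
    (e.g. exactly 10 is "Good"), except 30, which stays "Fair".
    """
    if rmse_cycles < 0:
        raise ValueError("RMSE cannot be negative")
    if rmse_cycles < 10:
        return "Excellent accuracy"
    if rmse_cycles < 15:
        return "Good accuracy"
    if rmse_cycles < 20:
        return "Moderate accuracy"
    if rmse_cycles <= 30:
        return "Fair accuracy"
    return "Poor accuracy"


print(interpret_rmse(12.3))  # Good accuracy
```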

Why RMSE Over MAE?

Mean Absolute Error (MAE) is an alternative:

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$

Metric	Property	Implication
RMSE	Penalizes large errors more	Preferred when large errors are costly
MAE	Equal penalty for all errors	More robust to outliers
RMSE	Standard in RUL literature	Enables comparison with published work
RMSE	≥ MAE always	RMSE = MAE only when all absolute errors are equal

Literature Standard

RMSE is the standard metric in the RUL prediction literature. Published state-of-the-art comparisons on C-MAPSS typically report RMSE (specifically last-cycle RMSE), which makes it essential for benchmarking against published results.

Relationship Between RMSE and MAE

$$\text{MAE} \leq \text{RMSE} \leq \sqrt{N} \cdot \text{MAE}$$
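Both bounds follow from elementary inequalities. A short derivation, writing $e_i = |y_i - \hat{y}_i|$ for the absolute errors (a symbol introduced here only for convenience):

```latex
\begin{aligned}
\text{MAE}^2 = \left(\frac{1}{N}\sum_{i=1}^{N} e_i\right)^2
  &\leq \frac{1}{N}\sum_{i=1}^{N} e_i^2 = \text{RMSE}^2
  && \text{(Cauchy--Schwarz; equality iff all } e_i \text{ are equal)} \\
\text{RMSE}^2 = \frac{1}{N}\sum_{i=1}^{N} e_i^2
  &\leq \frac{1}{N}\left(\sum_{i=1}^{N} e_i\right)^2 = N \cdot \text{MAE}^2
  && \text{(since } e_i \geq 0\text{; equality iff at most one } e_i \neq 0\text{)}
\end{aligned}
```

Taking square roots of both lines gives the stated chain. The upper bound is reached only in the extreme case where a single prediction carries the entire error.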

The ratio RMSE/MAE indicates how the errors are distributed. If RMSE ≈ MAE, the errors have similar magnitudes. If RMSE ≫ MAE, a few large outlier errors dominate.
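The effect of outliers on the ratio can be checked numerically. The sketch below compares two hypothetical error vectors with the same MAE; the helper name `rmse_mae_ratio` and the error values are assumptions made for illustration.

```python
from typing import Tuple

import numpy as np


def rmse_mae_ratio(errors: np.ndarray) -> Tuple[float, float, float]:
    """Return (RMSE, MAE, RMSE/MAE) for an array of prediction errors.

    Assumes at least one error is nonzero, otherwise the ratio is undefined.
    """
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    return rmse, mae, rmse / mae


# Hypothetical errors in cycles; both vectors have MAE = 10.
consistent = np.array([10.0, -10.0, 10.0, -10.0])  # same magnitude everywhere
with_outlier = np.array([1.0, -1.0, 1.0, -37.0])   # one large miss

for name, errs in [("consistent", consistent), ("with outlier", with_outlier)]:
    rmse, mae, ratio = rmse_mae_ratio(errs)
    # The bounds MAE <= RMSE <= sqrt(N) * MAE hold for any error vector.
    assert mae <= rmse <= np.sqrt(len(errs)) * mae
    print(f"{name}: RMSE={rmse:.2f}, MAE={mae:.2f}, ratio={ratio:.2f}")
```

The first vector gives a ratio of 1.00; the second gives about 1.85, close to the upper limit of $\sqrt{4} = 2$ for four samples.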

RMSE Variants for RUL

NASA C-MAPSS evaluation uses two distinct RMSE computations.

All-Cycles RMSE

Computed over every prediction in the test set:

$$\text{RMSE}_{\text{all}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$

Where N includes all cycles from all test engines. This measures overall prediction consistency throughout the degradation process.

Last-Cycle RMSE (Primary Benchmark)

Computed using only the final prediction for each engine:

$$\text{RMSE}_{\text{last}} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (y_j^{\text{last}} - \hat{y}_j^{\text{last}})^2}$$

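Where M is the number of test engines, and "last" denotes the final recorded cycle of each test engine. In the C-MAPSS test sets each trajectory is truncated some time before failure, so the true RUL at that cycle is generally greater than zero.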

Variant	What It Measures	Number of Samples (FD001)
All-cycles	Consistency across all predictions	~13,000
Last-cycle	Final prediction accuracy	100 (one per engine)

Primary Benchmark Metric

Last-cycle RMSE is the standard benchmark metric. Published state-of-the-art results on C-MAPSS typically report last-cycle RMSE, so use it whenever you compare against the literature.

Implementation

The following Python functions compute RMSE for RUL evaluation.

All-Cycles RMSE

```python
import numpy as np


def compute_rmse_all_cycles(
    predictions: np.ndarray,
    targets: np.ndarray
) -> float:
    """
    Compute RMSE over all predictions.

    Args:
        predictions: Predicted RUL values
        targets: True RUL values

    Returns:
        RMSE in cycles
    """
    # Cap both arrays at 125 cycles so they follow the same RUL convention
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Compute RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    return float(rmse_all)
```
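The capping step matters more than it might appear. The sketch below, using hypothetical values, shows how much the reported RMSE can change depending on whether targets above the 125-cycle cap are clipped; the arrays and the `rmse` helper are illustrative assumptions.

```python
import numpy as np

RUL_CAP = 125.0  # same cap value as in this section's functions


def rmse(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Plain RMSE with no capping applied."""
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))


# Hypothetical values in cycles: two engines are still far from failure.
targets = np.array([200.0, 150.0, 100.0, 50.0])
predictions = np.array([125.0, 125.0, 95.0, 55.0])

rmse_uncapped = rmse(predictions, targets)
rmse_capped = rmse(
    np.minimum(predictions, RUL_CAP),
    np.minimum(targets, RUL_CAP),
)

print(f"uncapped targets: {rmse_uncapped:.2f} cycles")  # 39.69
print(f"capped targets:   {rmse_capped:.2f} cycles")    # 3.54
```

Because the two conventions give very different numbers for the same predictions, an RMSE is only comparable to another RMSE computed with the same capping rule.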

Last-Cycle RMSE

```python
import numpy as np


def compute_rmse_last_cycle(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray
) -> float:
    """
    Compute RMSE using only the last cycle of each engine.

    This is the primary benchmark metric for C-MAPSS.

    Args:
        predictions: All predicted RUL values
        targets: All true RUL values
        unit_ids: Engine ID for each prediction

    Returns:
        Last-cycle RMSE in cycles
    """
    # Cap both arrays at 125 cycles so they follow the same RUL convention
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Extract last prediction for each engine
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]

        if len(unit_preds) > 0:
            # Take the last row for this engine; this assumes rows are
            # already ordered by cycle within each engine
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)

    # Compute RMSE on last cycles only
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    return float(rmse_last)
```
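Taking the final row per engine gives the last cycle only if the rows are already sorted by cycle within each engine. A sketch of a variant that makes the ordering explicit is shown below. It assumes an extra `cycles` array holding the cycle index of each row, which is not part of the function signature above, and the helper name `last_cycle_indices` is hypothetical.

```python
import numpy as np


def last_cycle_indices(unit_ids: np.ndarray, cycles: np.ndarray) -> np.ndarray:
    """Return, for each engine, the row index of its highest cycle number.

    `cycles` is an assumed extra input: the cycle index of each row.
    Indices are returned in ascending engine-ID order.
    """
    # lexsort uses the LAST key as the primary key: engine first, then cycle.
    order = np.lexsort((cycles, unit_ids))
    sorted_units = unit_ids[order]

    # A sorted row is the last of its engine if the next row is another engine.
    is_last = np.ones(len(order), dtype=bool)
    is_last[:-1] = sorted_units[1:] != sorted_units[:-1]
    return order[is_last]


# Hypothetical, deliberately shuffled rows for two engines.
unit_ids = np.array([2, 1, 1, 2, 1])
cycles = np.array([2, 3, 1, 1, 2])
targets = np.array([10.0, 20.0, 22.0, 11.0, 21.0])
predictions = np.array([14.0, 17.0, 25.0, 11.0, 20.0])

idx = last_cycle_indices(unit_ids, cycles)
rmse_last = float(np.sqrt(np.mean((predictions[idx] - targets[idx]) ** 2)))
print(idx, f"{rmse_last:.2f}")  # [1 0] 3.54
```

This also avoids the Python-level loop over engines, which may help on larger test sets, though for the 100 engines of FD001 either approach is fast enough.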

Complete RMSE Evaluation

```python
import numpy as np


def evaluate_rmse_comprehensive(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray
) -> dict:
    """
    Comprehensive RMSE evaluation.

    Returns both all-cycles and last-cycle RMSE along with
    supporting metrics.
    """
    # Cap RUL values consistently (125 cycles) on both arrays
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # All-cycles RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    # Last-cycle RMSE (assumes rows are ordered by cycle within each engine)
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]
        if len(unit_preds) > 0:
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    # Additional metrics
    mae_all = np.mean(np.abs(predictions - targets))
    mae_last = np.mean(np.abs(last_predictions - last_targets))

    return {
        'RMSE_all_cycles': float(rmse_all),
        'RMSE_last_cycle': float(rmse_last),
        'MAE_all_cycles': float(mae_all),
        'MAE_last_cycle': float(mae_last),
        'n_total_predictions': len(predictions),
        'n_units_evaluated': len(unique_units),
    }
```

Summary

In this section, we covered RMSE for RUL prediction:

Definition: Square root of mean squared errors, in cycles
Interpretation: Average prediction error magnitude (large errors weighted more)
All-cycles RMSE: Measures consistency across all predictions
Last-cycle RMSE: Primary benchmark metric for comparison
Standard metric: Reported in most published RUL prediction research

Metric	Formula	Use Case
RMSE (all)	√(mean((y - ŷ)²))	Overall consistency
RMSE (last)	√(mean((y_last - ŷ_last)²))	Benchmark comparison
MAE	mean(|y - ŷ|)	Robust to outliers

Looking Ahead: RMSE treats early and late predictions symmetrically, yet in RUL prediction a late prediction (overestimating remaining life) is more dangerous than an early one (underestimating it). The next section covers the NASA asymmetric scoring function, a metric that penalizes late predictions more heavily.
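To make the symmetry of RMSE concrete: because each error is squared, a prediction that is 15 cycles too low and one that is 15 cycles too high contribute identically. A small illustration with hypothetical values:

```python
import numpy as np


def rmse(predictions: np.ndarray, targets: np.ndarray) -> float:
    """Plain RMSE in the units of the inputs."""
    return float(np.sqrt(np.mean((predictions - targets) ** 2)))


targets = np.array([50.0, 30.0, 20.0])  # hypothetical true RUL, in cycles

early = targets - 15.0  # predicted RUL too low: failure expected too soon
late = targets + 15.0   # predicted RUL too high: failure expected too late

rmse_early = rmse(early, targets)
rmse_late = rmse(late, targets)

# RMSE cannot tell the two cases apart, although the late case is riskier.
print(rmse_early, rmse_late)  # 15.0 15.0
```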

With RMSE understood, we examine the asymmetric NASA scoring function.