Chapter 15

RMSE: Root Mean Square Error

Evaluation Metrics

Learning Objectives

By the end of this section, you will:

  1. Understand RMSE as the primary RUL evaluation metric
  2. Interpret RMSE values in the context of predictive maintenance
  3. Distinguish between RMSE variants (all-cycles vs. last-cycle)
  4. Implement RMSE computation in Python
  5. Understand why RMSE is preferred over other metrics
Why This Matters: RMSE is the standard benchmark metric for RUL prediction. Understanding its properties—sensitivity to outliers, interpretable units, and comparison across studies—is essential for meaningful model evaluation and comparison with published state-of-the-art results.

RMSE Definition

Root Mean Square Error measures the average magnitude of prediction errors in the same units as the target variable.

Mathematical Formulation

For N predictions, RMSE is defined as:

\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}

Where:

  • $y_i$ is the true RUL value for sample $i$
  • $\hat{y}_i$ is the predicted RUL value for sample $i$
  • $N$ is the total number of predictions
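As a quick sanity check, the formula can be evaluated by hand on a tiny set of hypothetical predictions (the numbers below are purely illustrative, not C-MAPSS data):

```python
import numpy as np

# Hypothetical toy data: true vs. predicted RUL for three samples (cycles)
y_true = np.array([112.0, 98.0, 69.0])
y_pred = np.array([105.0, 103.0, 60.0])

errors = y_true - y_pred              # [7, -5, 9]
rmse = np.sqrt(np.mean(errors ** 2))  # sqrt((49 + 25 + 81) / 3)
print(round(float(rmse), 2))          # ≈ 7.19 cycles
```

Note that squaring means the sign of each error is irrelevant; only its magnitude contributes.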

Interpretation

Understanding what RMSE values mean in the context of predictive maintenance.

RMSE in Cycles

For turbofan engines in NASA C-MAPSS, one cycle represents one flight. RMSE values translate directly to prediction accuracy:

| RMSE | Interpretation | Practical Impact |
|------|----------------|------------------|
| < 10 | Excellent accuracy | Precise maintenance scheduling |
| 10–15 | Good accuracy | Reliable early warning |
| 15–20 | Moderate accuracy | Useful but with safety margin |
| 20–30 | Fair accuracy | Significant uncertainty |
| > 30 | Poor accuracy | Not reliable for scheduling |
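For reporting purposes, these bands can be encoded as a small helper. The function below is a hypothetical convenience, not a standard API; its thresholds and labels simply mirror the table above:

```python
def interpret_rmse(rmse: float) -> str:
    """Map an RMSE value (in cycles) to the qualitative bands above."""
    if rmse < 10:
        return "Excellent accuracy"
    elif rmse < 15:
        return "Good accuracy"
    elif rmse < 20:
        return "Moderate accuracy"
    elif rmse < 30:
        return "Fair accuracy"
    return "Poor accuracy"
```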

Why RMSE Over MAE?

Mean Absolute Error (MAE) is an alternative:

\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|

| Metric | Property | Implication |
|--------|----------|-------------|
| RMSE | Penalizes large errors more | Preferred when large errors are costly |
| MAE | Equal penalty for all errors | More robust to outliers |
| RMSE | Standard in RUL literature | Enables comparison with published work |
| RMSE | ≥ MAE always | RMSE = MAE only when all errors are equal |

Literature Standard

RMSE is the standard metric in RUL prediction literature. All state-of-the-art comparisons use RMSE (specifically last-cycle RMSE), making it essential for benchmarking against published results.

Relationship Between RMSE and MAE

\text{MAE} \leq \text{RMSE} \leq \sqrt{N} \cdot \text{MAE}

The ratio RMSE/MAE indicates the shape of the error distribution: if RMSE ≈ MAE, errors are consistent in magnitude; if RMSE ≫ MAE, a few large outlier errors dominate.
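This diagnostic is easy to see on synthetic error distributions (illustrative numbers only, not model results):

```python
import numpy as np

def rmse(errors: np.ndarray) -> float:
    return float(np.sqrt(np.mean(errors ** 2)))

def mae(errors: np.ndarray) -> float:
    return float(np.mean(np.abs(errors)))

# Uniform errors: every prediction is off by exactly 5 cycles
consistent = np.full(100, 5.0)
ratio_consistent = rmse(consistent) / mae(consistent)  # exactly 1.0

# Mostly small errors plus a single 50-cycle outlier
outliers = np.append(np.full(99, 1.0), 50.0)
ratio_outliers = rmse(outliers) / mae(outliers)        # well above 1
```

The outlier-heavy case yields a ratio above 3 even though 99 of its 100 errors are tiny, which is exactly the sensitivity that makes RMSE attractive when large errors are costly.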


RMSE Variants for RUL

NASA C-MAPSS evaluation uses two distinct RMSE computations.

All-Cycles RMSE

Computed over every prediction in the test set:

\text{RMSE}_{\text{all}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}

Where N includes all cycles from all test engines. This measures overall prediction consistency throughout the degradation process.

Last-Cycle RMSE (Primary Benchmark)

Computed using only the final prediction for each engine:

\text{RMSE}_{\text{last}} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} (y_j^{\text{last}} - \hat{y}_j^{\text{last}})^2}

Where M is the number of test engines, and "last" denotes each engine's final recorded cycle in the test set (C-MAPSS test trajectories are truncated before failure, so this is the point at which RUL must be predicted).

| Variant | What It Measures | N for FD001 |
|---------|------------------|-------------|
| All-cycles | Consistency across all predictions | ~13,000 |
| Last-cycle | Final prediction accuracy | 100 (one per engine) |

Primary Benchmark Metric

Last-cycle RMSE is the standard benchmark metric. All published state-of-the-art results report last-cycle RMSE. When comparing with literature, always use last-cycle RMSE.


Implementation

Python implementation of RMSE computation for RUL evaluation.

All-Cycles RMSE

```python
import numpy as np


def compute_rmse_all_cycles(
    predictions: np.ndarray,
    targets: np.ndarray,
) -> float:
    """
    Compute RMSE over all predictions.

    Args:
        predictions: Predicted RUL values
        targets: True RUL values

    Returns:
        RMSE in cycles
    """
    # Ensure consistent RUL capping
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Compute RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    return float(rmse_all)
```

Last-Cycle RMSE

```python
import numpy as np


def compute_rmse_last_cycle(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray,
) -> float:
    """
    Compute RMSE using only the last cycle of each engine.

    This is the primary benchmark metric for C-MAPSS.

    Args:
        predictions: All predicted RUL values
        targets: All true RUL values
        unit_ids: Engine ID for each prediction

    Returns:
        Last-cycle RMSE in cycles
    """
    # Ensure consistent RUL capping
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Extract the last prediction for each engine
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]

        if len(unit_preds) > 0:
            # Take the last cycle for this engine
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)

    # Compute RMSE on last cycles only
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    return float(rmse_last)
```
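When per-engine rows are stored contiguously and in time order (as in the C-MAPSS test files), the per-unit loop can be replaced by a vectorized last-row lookup. This is a sketch that assumes that ordering holds; the toy arrays below are illustrative:

```python
import numpy as np

# Toy layout: contiguous, time-ordered rows for three engines
unit_ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
predictions = np.array([90.0, 80.0, 70.0, 60.0, 50.0, 40.0, 30.0, 20.0, 10.0])

# A row is the last for its engine when the next row belongs to a
# different engine; the final row is always a last row.
is_last = np.append(unit_ids[1:] != unit_ids[:-1], True)
last_predictions = predictions[is_last]  # one value per engine
```

For C-MAPSS-sized data the loop is already fast enough; the vectorized form mainly matters when evaluating many models repeatedly.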

Complete RMSE Evaluation

```python
import numpy as np


def evaluate_rmse_comprehensive(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray,
) -> dict:
    """
    Comprehensive RMSE evaluation.

    Returns both all-cycles and last-cycle RMSE along with
    supporting metrics.
    """
    # Cap RUL values consistently
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # All-cycles RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    # Last-cycle RMSE
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]
        if len(unit_preds) > 0:
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    # Additional metrics
    mae_all = np.mean(np.abs(predictions - targets))
    mae_last = np.mean(np.abs(last_predictions - last_targets))

    return {
        'RMSE_all_cycles': float(rmse_all),
        'RMSE_last_cycle': float(rmse_last),
        'MAE_all_cycles': float(mae_all),
        'MAE_last_cycle': float(mae_last),
        'n_total_predictions': len(predictions),
        'n_units_evaluated': len(unique_units),
    }
```

Summary

In this section, we covered RMSE for RUL prediction:

  1. Definition: Square root of mean squared errors, in cycles
  2. Interpretation: Average prediction error magnitude (large errors weighted more)
  3. All-cycles RMSE: Measures consistency across all predictions
  4. Last-cycle RMSE: Primary benchmark metric for comparison
  5. Standard metric: Used in all published RUL prediction research
| Metric | Formula | Use Case |
|--------|---------|----------|
| RMSE (all) | √(mean((y − ŷ)²)) | Overall consistency |
| RMSE (last) | √(mean((y_last − ŷ_last)²)) | Benchmark comparison |
| MAE | mean(\|y − ŷ\|) | Robust to outliers |
Looking Ahead: While RMSE treats all errors equally, late predictions in RUL (underestimating remaining life) are more dangerous than early predictions. The next section covers the NASA asymmetric scoring function—a metric that penalizes late predictions more heavily.

With RMSE understood, we examine the asymmetric NASA scoring function.