Learning Objectives
By the end of this section, you will:
- Understand RMSE as the primary RUL evaluation metric
- Interpret RMSE values in the context of predictive maintenance
- Distinguish between RMSE variants (all-cycles vs. last-cycle)
- Implement RMSE computation in Python
- Understand why RMSE is preferred over other metrics
Why This Matters: RMSE is the standard benchmark metric for RUL prediction. Understanding its properties—sensitivity to outliers, interpretable units, and comparison across studies—is essential for meaningful model evaluation and comparison with published state-of-the-art results.
RMSE Definition
Root Mean Square Error measures the average magnitude of prediction errors in the same units as the target variable.
Mathematical Formulation
For N predictions, RMSE is defined as:

RMSE = √( (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)² )

Where:
- yᵢ is the true RUL value for sample i
- ŷᵢ is the predicted RUL value for sample i
- N is the total number of predictions
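As a quick worked example (toy numbers, not C-MAPSS data), the formula can be checked by hand:

```python
import numpy as np

# Toy example: three predictions with errors of +3, -4, and 0 cycles
y_true = np.array([112.0, 98.0, 20.0])   # true RUL (cycles)
y_pred = np.array([115.0, 94.0, 20.0])   # predicted RUL (cycles)

# Squared errors: 9, 16, 0 -> mean = 25/3 -> sqrt ≈ 2.887 cycles
rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
print(round(rmse, 3))  # 2.887
```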
Interpretation
Understanding what RMSE values mean in the context of predictive maintenance.
RMSE in Cycles
For turbofan engines in NASA C-MAPSS, one cycle represents one flight. RMSE values translate directly to prediction accuracy:
| RMSE | Interpretation | Practical Impact |
|---|---|---|
| < 10 | Excellent accuracy | Precise maintenance scheduling |
| 10-15 | Good accuracy | Reliable early warning |
| 15-20 | Moderate accuracy | Useful but with safety margin |
| 20-30 | Fair accuracy | Significant uncertainty |
| > 30 | Poor accuracy | Not reliable for scheduling |
Why RMSE Over MAE?
Mean Absolute Error (MAE) is an alternative:
| Metric | Property | Implication |
|---|---|---|
| RMSE | Penalizes large errors more | Preferred when large errors are costly |
| MAE | Equal penalty for all errors | More robust to outliers |
| RMSE | Standard in RUL literature | Enables comparison with published work |
| RMSE | ≥ MAE always | RMSE = MAE only when all errors are equal |
Literature Standard
RMSE is the standard metric in RUL prediction literature. All state-of-the-art comparisons use RMSE (specifically last-cycle RMSE), making it essential for benchmarking against published results.
Relationship Between RMSE and MAE
The ratio RMSE/MAE indicates the shape of the error distribution: RMSE ≈ MAE means errors are consistent in magnitude, while RMSE ≫ MAE signals that a few large outlier errors dominate.
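A small sketch (with made-up error values) illustrating how a single outlier inflates RMSE relative to MAE:

```python
import numpy as np

errors_uniform = np.array([5.0, 5.0, 5.0, 5.0])   # consistent errors
errors_outlier = np.array([1.0, 1.0, 1.0, 17.0])  # one large outlier

def rmse(e: np.ndarray) -> float:
    return float(np.sqrt(np.mean(e ** 2)))

def mae(e: np.ndarray) -> float:
    return float(np.mean(np.abs(e)))

# Uniform errors: RMSE equals MAE, ratio is exactly 1.0
print(rmse(errors_uniform) / mae(errors_uniform))  # 1.0

# Outlier-dominated: RMSE well above MAE (≈ 1.71 here)
print(rmse(errors_outlier) / mae(errors_outlier))
```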
RMSE Variants for RUL
NASA C-MAPSS evaluation uses two distinct RMSE computations.
All-Cycles RMSE
Computed over every prediction in the test set:

RMSE_all = √( (1/N) Σᵢ₌₁ᴺ (yᵢ − ŷᵢ)² )

Where N includes all cycles from all test engines. This measures overall prediction consistency throughout the degradation process.
Last-Cycle RMSE (Primary Benchmark)
Computed using only the final prediction for each engine:

RMSE_last = √( (1/M) Σⱼ₌₁ᴹ (y⁽ʲ⁾_last − ŷ⁽ʲ⁾_last)² )

Where M is the number of test engines, and "last" denotes the final operating cycle in each engine's test trajectory.
| Variant | What It Measures | N for FD001 |
|---|---|---|
| All-cycles | Consistency across all predictions | ~13,000 |
| Last-cycle | Final prediction accuracy | 100 (one per engine) |
Primary Benchmark Metric
Last-cycle RMSE is the standard benchmark metric. All published state-of-the-art results report last-cycle RMSE. When comparing with literature, always use last-cycle RMSE.
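For reference, a vectorized sketch of picking the last cycle per engine without an explicit loop. This assumes `unit_ids` is sorted in ascending order with each engine's cycles grouped contiguously (the usual C-MAPSS layout); the engine IDs and values below are made up for illustration:

```python
import numpy as np

# Hypothetical mini test set: engine 1 has 3 cycles, engine 2 has 2
unit_ids = np.array([1, 1, 1, 2, 2])
preds    = np.array([50.0, 40.0, 31.0, 80.0, 72.0])
targets  = np.array([48.0, 39.0, 30.0, 79.0, 70.0])

# np.unique(return_index=True) gives the FIRST row of each engine;
# the last row of an engine is one position before the next engine's
# first row (or the end of the array for the final engine).
_, first_idx = np.unique(unit_ids, return_index=True)
last_idx = np.append(first_idx[1:], len(unit_ids)) - 1

rmse_last = np.sqrt(np.mean((preds[last_idx] - targets[last_idx]) ** 2))
print(last_idx)               # [2 4]
print(round(rmse_last, 3))    # errors 1 and 2 -> sqrt(2.5) ≈ 1.581
```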
Implementation
Python implementation of RMSE computation for RUL evaluation.
All-Cycles RMSE
```python
import numpy as np

def compute_rmse_all_cycles(
    predictions: np.ndarray,
    targets: np.ndarray
) -> float:
    """
    Compute RMSE over all predictions.

    Args:
        predictions: Predicted RUL values
        targets: True RUL values

    Returns:
        RMSE in cycles
    """
    # Ensure consistent RUL capping
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Compute RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    return float(rmse_all)
```
Last-Cycle RMSE
```python
def compute_rmse_last_cycle(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray
) -> float:
    """
    Compute RMSE using only the last cycle of each engine.

    This is the primary benchmark metric for C-MAPSS.

    Args:
        predictions: All predicted RUL values
        targets: All true RUL values
        unit_ids: Engine ID for each prediction

    Returns:
        Last-cycle RMSE in cycles
    """
    # Ensure consistent RUL capping
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # Extract last prediction for each engine
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]

        if len(unit_preds) > 0:
            # Take the last cycle for this engine
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)

    # Compute RMSE on last cycles only
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    return float(rmse_last)
```
Complete RMSE Evaluation
```python
def evaluate_rmse_comprehensive(
    predictions: np.ndarray,
    targets: np.ndarray,
    unit_ids: np.ndarray
) -> dict:
    """
    Comprehensive RMSE evaluation.

    Returns both all-cycles and last-cycle RMSE along with
    supporting metrics.
    """
    # Cap RUL values consistently
    predictions = np.minimum(predictions, 125.0)
    targets = np.minimum(targets, 125.0)

    # All-cycles RMSE
    rmse_all = np.sqrt(np.mean((predictions - targets) ** 2))

    # Last-cycle RMSE
    unique_units = np.unique(unit_ids)
    last_predictions = []
    last_targets = []

    for unit_id in unique_units:
        mask = unit_ids == unit_id
        unit_preds = predictions[mask]
        unit_targets = targets[mask]
        if len(unit_preds) > 0:
            last_predictions.append(unit_preds[-1])
            last_targets.append(unit_targets[-1])

    last_predictions = np.array(last_predictions)
    last_targets = np.array(last_targets)
    rmse_last = np.sqrt(np.mean((last_predictions - last_targets) ** 2))

    # Additional metrics
    mae_all = np.mean(np.abs(predictions - targets))
    mae_last = np.mean(np.abs(last_predictions - last_targets))

    return {
        'RMSE_all_cycles': float(rmse_all),
        'RMSE_last_cycle': float(rmse_last),
        'MAE_all_cycles': float(mae_all),
        'MAE_last_cycle': float(mae_last),
        'n_total_predictions': len(predictions),
        'n_units_evaluated': len(unique_units),
    }
```
Summary
In this section, we covered RMSE for RUL prediction:
- Definition: Square root of mean squared errors, in cycles
- Interpretation: Average prediction error magnitude (large errors weighted more)
- All-cycles RMSE: Measures consistency across all predictions
- Last-cycle RMSE: Primary benchmark metric for comparison
- Standard metric: Used in all published RUL prediction research
| Metric | Formula | Use Case |
|---|---|---|
| RMSE (all) | √(mean((y - ŷ)²)) | Overall consistency |
| RMSE (last) | √(mean((y_last - ŷ_last)²)) | Benchmark comparison |
| MAE | mean(abs(y - ŷ)) | Robust to outliers |
Looking Ahead: While RMSE treats over- and under-estimates symmetrically, late predictions in RUL (overestimating the remaining life, so failure arrives before maintenance is scheduled) are more dangerous than early predictions. The next section covers the NASA asymmetric scoring function, a metric that penalizes late predictions more heavily.