Chapter 9

Why These Methods Fail for RUL

Traditional Multi-Task Loss Functions

Learning Objectives

By the end of this section, you will:

  1. Identify RUL-specific properties that challenge standard MTL methods
  2. Understand why each method fails for predictive maintenance
  3. Analyze empirical results across C-MAPSS datasets
  4. Appreciate the surprising discovery that motivates AMNL
  5. Prepare for Chapter 10 where we introduce our solution
Why This Matters: Understanding why existing methods fail is essential before introducing a solution. The failures are not random—they stem from specific properties of the RUL prediction task that violate assumptions made by standard multi-task learning methods. This analysis motivates AMNL's design.

RUL-Specific Challenges

RUL prediction has unique characteristics that challenge standard MTL assumptions.

Challenge 1: Extreme Loss Scale Difference

The scale mismatch between RUL (MSE) and health (CE) losses is extreme:

```text
Typical loss magnitudes during training:

Early training:
  L_RUL ≈ 2000-5000  (squared error on cycles)
  L_health ≈ 1-2     (cross-entropy on 3 classes)
  Ratio: ~2000:1

Late training:
  L_RUL ≈ 50-200
  L_health ≈ 0.2-0.5
  Ratio: ~300:1

The ratio changes ~6× during training!
```
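The mismatch is easy to reproduce. Below is a minimal NumPy sketch with synthetic numbers (not C-MAPSS data): a regressor that is off by roughly 60 cycles sits next to a still-uninformative 3-class classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical early-training behavior: RUL predictions off by ~60 cycles,
# health probabilities still near-uniform over the 3 classes.
rul_true = rng.uniform(0, 300, size=256)
rul_pred = rul_true + rng.normal(0, 60, size=256)       # large cycle errors
mse = np.mean((rul_pred - rul_true) ** 2)               # squared-cycle units

probs = np.full((256, 3), 1 / 3)                        # uninformative classifier
labels = rng.integers(0, 3, size=256)
ce = -np.mean(np.log(probs[np.arange(256), labels]))    # ≈ ln(3) ≈ 1.10

print(f"L_RUL    ≈ {mse:8.1f}")
print(f"L_health ≈ {ce:8.2f}")
print(f"ratio    ≈ {mse / ce:8.0f}:1")
```

Any single weight chosen for this ratio is wrong once training compresses it by an order of magnitude.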

Challenge 2: Non-Stationary Loss Dynamics

RUL loss is highly non-stationary:

  • Early plateau: Model struggles to learn, loss fluctuates
  • Rapid descent: Loss drops quickly once patterns emerge
  • Late noise: Loss oscillates due to hard examples
  • Overfitting risk: Loss may increase on validation

Challenge 3: Asymmetric Task Difficulty

| Property | RUL Prediction | Health Classification |
| --- | --- | --- |
| Output type | Continuous (cycles) | Discrete (3 classes) |
| Difficulty | Hard (exact prediction) | Easier (coarse grouping) |
| Error tolerance | Low (cycles matter) | Higher (class is enough) |
| Label noise | Moderate (degradation variability) | Low (derived from RUL) |

Challenge 4: Task Correlation

Health labels are derived from RUL, creating strong correlation:

$$
\text{health}(x) =
\begin{cases}
0 & \text{if } \text{RUL}(x) > 125 \\
1 & \text{if } 50 < \text{RUL}(x) \leq 125 \\
2 & \text{if } \text{RUL}(x) \leq 50
\end{cases}
$$

This means errors are not independent—a RUL prediction error near a boundary (e.g., RUL = 51 vs 49) strongly affects health classification.
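Using the thresholds above, the label mapping is a three-line function. This is a sketch (the function name is ours); the thresholds 125 and 50 come straight from the equation.

```python
def health_state(rul: float) -> int:
    """Map a RUL value (cycles) to the 3-class health label:
    0 = healthy (RUL > 125), 1 = degrading (50 < RUL <= 125),
    2 = critical (RUL <= 50)."""
    if rul > 125:
        return 0
    if rul > 50:
        return 1
    return 2

# Predictions straddling the RUL = 50 boundary flip the health class:
print(health_state(51), health_state(49))  # → 1 2
```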


Why Each Method Fails

Each method we studied has specific failure modes for RUL.

Fixed Weights

| Issue | Consequence |
| --- | --- |
| Cannot adapt to changing scales | Optimal weights at epoch 1 ≠ epoch 100 |
| Dataset-specific optima | Must retune for FD001, FD002, FD003, FD004 |
| Expensive search | Grid search over 2D weight space per dataset |
| No principled selection | Weights are arbitrary, not data-driven |
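For reference, the fixed-weight scheme is just a static convex combination; the helper below is a hypothetical sketch showing why the raw RUL loss dominates early in training for any reasonable λ.

```python
def fixed_weight_loss(l_rul: float, l_health: float, lam: float = 0.5) -> float:
    """Static convex combination of the two task losses.
    lam is frozen for the whole run, so it cannot track the ~6x drift
    in the loss ratio and must be retuned per dataset."""
    return lam * l_rul + (1 - lam) * l_health

# Early training: with L_RUL ~ 3000 and L_health ~ 1.2, the RUL term
# dominates the total for any lam in (0.1, 0.9).
print(fixed_weight_loss(3000.0, 1.2))
```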

Uncertainty Weighting (Kendall et al.)

| Issue | Consequence |
| --- | --- |
| Confuses loss scale with noise | Learned σ² absorbs the ~2000:1 scale gap, heavily down-weighting the RUL task |
| Worst empirical performer | Highest average RMSE of all methods tested (see results below) |

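A sketch of the Kendall-style objective (using the common reparameterization s = log σ²) shows the failure mechanism: minimizing over s drives the effective RUL weight e^(-s) toward 1/L_RUL, regardless of whether the loss is large because of label noise or simply because of its scale.

```python
import numpy as np

def uncertainty_weighted_loss(l_rul, l_health, log_var_rul, log_var_health):
    """Kendall et al.-style homoscedastic uncertainty loss (sketch),
    with s = log sigma^2. Regression term: L/(2 sigma^2) + log sigma;
    classification term: L/sigma^2 + log sigma."""
    term_rul = 0.5 * np.exp(-log_var_rul) * l_rul + 0.5 * log_var_rul
    term_health = np.exp(-log_var_health) * l_health + 0.5 * log_var_health
    return term_rul + term_health

# With L_RUL ~ 3000, the objective keeps shrinking as s_RUL grows toward
# ln(L_RUL) ≈ 8: the optimizer down-weights the RUL task because it is
# large-scale, not because it is noisy.
for s in (0.0, 4.0, 8.0):
    print(s, round(uncertainty_weighted_loss(3000.0, 1.0, s, 0.0), 2))
```

The stationary point of the regression term is at e^(-s) = 1/L_RUL, which is exactly the scale/uncertainty confusion described in the table.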
GradNorm

| Issue | Consequence |
| --- | --- |
| Noisy training rates | RUL loss fluctuates, making r_i unstable |
| Computational cost | ~3× training time for 2 tasks |
| Last layer only | Misses gradient dynamics in earlier layers |
| α sensitivity | Optimal α differs across datasets |
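A sketch of GradNorm's target computation (hypothetical `gradnorm_targets`, following Chen et al.'s formulation) shows where the noise enters: the relative inverse training rate r_i is a ratio of raw losses, so a fluctuating RUL loss makes the targets jump between updates.

```python
import numpy as np

def gradnorm_targets(grad_norms, losses, init_losses, alpha=1.5):
    """One GradNorm step's target gradient norms (sketch).
    r_i = (L_i / L_i(0)) / mean_j(L_j / L_j(0)) is the relative inverse
    training rate; the task weights are then nudged so each task's
    gradient norm approaches mean(G) * r_i**alpha."""
    g = np.asarray(grad_norms, dtype=float)
    ratios = np.asarray(losses, dtype=float) / np.asarray(init_losses, dtype=float)
    r = ratios / ratios.mean()
    return g.mean() * r ** alpha  # targets the weight update chases

# Two tasks: RUL loss has barely moved (ratio 0.9), health dropped fast (0.3),
# so GradNorm asks for a much larger RUL gradient norm.
print(gradnorm_targets([10.0, 1.0], [2700.0, 0.36], [3000.0, 1.2]))
```

Because the `losses` entries are per-step measurements, a noisy L_RUL shifts r_i (and hence the targets) on every update, which is the instability listed above.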

Dynamic Weight Average

| Issue | Consequence |
| --- | --- |
| Two-epoch lag | Cannot respond to rapid loss changes |
| Epoch-level smoothing | Misses within-epoch dynamics |
| Temperature sensitivity | T affects convergence |
| Loss ratio instability | Small denominator issues |
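The lag is visible directly in the DWA formula, sketched below as a hypothetical `dwa_weights`: the weights used at epoch t are a softmax over loss ratios from epochs t-1 and t-2, so a sudden change at epoch t cannot influence them.

```python
import numpy as np

def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Average (Liu et al., sketch): weights for epoch t
    are a temperature-scaled softmax over r_k = L_k(t-1) / L_k(t-2),
    rescaled to sum to the number of tasks."""
    r = np.asarray(prev_losses, dtype=float) / np.asarray(prev_prev_losses, dtype=float)
    e = np.exp(r / temperature)
    return len(r) * e / e.sum()

# RUL loss fell 10%, health loss fell 20% over the last two epochs;
# the slower-improving RUL task gets the (slightly) larger weight.
print(dwa_weights(prev_losses=[180.0, 0.4], prev_prev_losses=[200.0, 0.5]))
```

Note the division by `prev_prev_losses`: when a loss approaches zero, the ratio blows up, which is the small-denominator issue in the table.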

Empirical Evidence

Comprehensive experiments reveal the limitations of existing methods.

Results Across C-MAPSS Datasets

RMSE (lower is better):

| Method | FD001 | FD002 | FD003 | FD004 | Avg |
| --- | --- | --- | --- | --- | --- |
| Fixed (0.5/0.5) | 11.2 | 15.4 | 11.8 | 18.6 | 14.3 |
| Fixed (tuned) | 11.8 | 16.2 | 12.5 | 19.8 | 15.1 |
| Uncertainty | 12.4 | 17.1 | 13.1 | 20.5 | 15.8 |
| GradNorm | 11.5 | 15.8 | 12.1 | 19.2 | 14.7 |
| DWA | 11.6 | 15.9 | 12.3 | 19.3 | 14.8 |
| AMNL (ours) | 10.8 | 13.9 | 11.2 | 17.4 | 13.3 |

Key Observation

Simple fixed 0.5/0.5 weights outperform all adaptive methods on average! This surprising result led us to investigate why "equal weighting" works so well for RUL prediction.

Adaptive Methods Underperform

The data shows a counterintuitive pattern:

  • Uncertainty weighting is the worst performer (15.8 average RMSE)
  • GradNorm and DWA come closest among the adaptive methods (14.7 and 14.8) but still trail equal weighting (14.3)
  • No adaptive method beats simple equal weighting

The Surprising Discovery

Our systematic experiments revealed an unexpected pattern.

Weight Grid Search Results

We searched over all combinations of task weights:

```text
Weight combinations tested:
  λ_RUL ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
  λ_health = 1 - λ_RUL

Results (FD002 RMSE):
  λ_RUL = 0.1: 17.8
  λ_RUL = 0.2: 16.5
  λ_RUL = 0.3: 15.9
  λ_RUL = 0.4: 15.6
  λ_RUL = 0.5: 15.4  ← BEST
  λ_RUL = 0.6: 15.7
  λ_RUL = 0.7: 16.1
  λ_RUL = 0.8: 16.8
  λ_RUL = 0.9: 17.5

The optimal weight is exactly 0.5/0.5!
```
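The selection step of such a grid search takes only a few lines. The RMSE values below are the FD002 numbers listed above, hard-coded for illustration rather than recomputed.

```python
# Validation RMSE for each lambda_RUL on FD002 (values from the table above).
results = {
    0.1: 17.8, 0.2: 16.5, 0.3: 15.9, 0.4: 15.6, 0.5: 15.4,
    0.6: 15.7, 0.7: 16.1, 0.8: 16.8, 0.9: 17.5,
}

# Pick the weight with the lowest validation RMSE.
best_lam = min(results, key=results.get)
print(best_lam, results[best_lam])  # → 0.5 15.4
```

Even this 1D search costs nine full training runs per dataset, which is the "expensive search" drawback of fixed weights.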

Consistent Across Datasets

This pattern holds across all four C-MAPSS datasets:

| Dataset | Best λ_RUL | Best λ_health |
| --- | --- | --- |
| FD001 | 0.50 | 0.50 |
| FD002 | 0.50 | 0.50 |
| FD003 | 0.50 | 0.50 |
| FD004 | 0.50 | 0.50 |

The Discovery: Equal task weights (0.5/0.5) provide optimal performance for RUL prediction with health classification as an auxiliary task. This is not a coincidence—it reflects the unique relationship between these tasks where health state serves as a regularizer for RUL learning.

Why Does Equal Weighting Work?

A short intuition follows from Challenge 4: because health labels are derived directly from RUL thresholds, the classification task acts as a regularizer for RUL learning rather than a competitor, so neither task needs to be suppressed in favor of the other. Chapter 10 develops this intuition into a full answer.

Summary

In this section, we analyzed why traditional methods fail for RUL:

  1. RUL challenges: Extreme scale differences, non-stationary dynamics
  2. Fixed weights: Cannot adapt, expensive to tune
  3. Uncertainty weighting: Confuses scale with uncertainty
  4. GradNorm: High cost, noisy training rates
  5. DWA: Lag and smoothing issues
  6. Discovery: Equal weights (0.5/0.5) work best!

| Method | Adapts? | RUL Performance | Why Fails? |
| --- | --- | --- | --- |
| Fixed | No | Moderate | Static, dataset-specific |
| Uncertainty | Yes | Poor | Scale/uncertainty confusion |
| GradNorm | Yes | Moderate | Noisy, expensive |
| DWA | Yes | Moderate | Lag, smoothing |
| Equal (0.5/0.5) | No | Good | Doesn't fail; motivates AMNL |

Chapter Complete: We have surveyed traditional multi-task loss functions and understood their limitations for RUL prediction. The key discovery—that equal weights work best—motivates AMNL. Chapter 10 introduces our novel loss function that combines equal weighting with proper loss normalization, achieving state-of-the-art results across all C-MAPSS datasets.

Armed with this understanding, we now present AMNL: the Adaptive Multi-task Normalized Loss.