Learning Objectives
By the end of this section, you will:
- Identify RUL-specific properties that challenge standard MTL methods
- Understand why each method fails for predictive maintenance
- Analyze empirical results across C-MAPSS datasets
- Appreciate the surprising discovery that motivates AMNL
- Prepare for Chapter 10 where we introduce our solution
Why This Matters: Understanding why existing methods fail is essential before introducing a solution. The failures are not random—they stem from specific properties of the RUL prediction task that violate assumptions made by standard multi-task learning methods. This analysis motivates AMNL's design.
RUL-Specific Challenges
RUL prediction has unique characteristics that challenge standard MTL assumptions.
Challenge 1: Extreme Loss Scale Difference
The scale mismatch between RUL (MSE) and health (CE) losses is extreme:
1Typical loss magnitudes during training:
2
3Early training:
4 L_RUL ≈ 2000-5000 (squared error on cycles)
5 L_health ≈ 1-2 (cross-entropy on 3 classes)
6 Ratio: ~2000:1
7
8Late training:
9 L_RUL ≈ 50-200
10 L_health ≈ 0.2-0.5
11 Ratio: ~300:1
12
13The ratio changes 6× during training!Challenge 2: Non-Stationary Loss Dynamics
RUL loss is highly non-stationary:
- Early plateau: Model struggles to learn, loss fluctuates
- Rapid descent: Loss drops quickly once patterns emerge
- Late noise: Loss oscillates due to hard examples
- Overfitting risk: Loss may increase on validation
Challenge 3: Asymmetric Task Difficulty
| Property | RUL Prediction | Health Classification |
|---|---|---|
| Output type | Continuous (cycles) | Discrete (3 classes) |
| Difficulty | Hard (exact prediction) | Easier (coarse grouping) |
| Error tolerance | Low (cycles matter) | Higher (class is enough) |
| Label noise | Moderate (degradation variability) | Low (derived from RUL) |
Challenge 4: Task Correlation
Health labels are derived from RUL, creating strong correlation:
This means errors are not independent—a RUL prediction error near a boundary (e.g., RUL = 51 vs 49) strongly affects health classification.
Why Each Method Fails
Each method we studied has specific failure modes for RUL.
Fixed Weights
| Issue | Consequence |
|---|---|
| Cannot adapt to changing scales | Optimal weights at epoch 1 ≠ epoch 100 |
| Dataset-specific optima | Must retune for FD001, FD002, FD003, FD004 |
| Expensive search | Grid search over 2D weight space per dataset |
| No principled selection | Weights are arbitrary, not data-driven |
Uncertainty Weighting (Kendall et al.)
GradNorm
| Issue | Consequence |
|---|---|
| Noisy training rates | RUL loss fluctuates, making r_i unstable |
| Computational cost | ~3× training time for 2 tasks |
| Last layer only | Misses gradient dynamics in earlier layers |
| α sensitivity | Optimal α differs across datasets |
Dynamic Weight Average
| Issue | Consequence |
|---|---|
| Two-epoch lag | Cannot respond to rapid loss changes |
| Epoch-level smoothing | Misses within-epoch dynamics |
| Temperature sensitivity | T affects convergence |
| Loss ratio instability | Small denominator issues |
Empirical Evidence
Comprehensive experiments reveal the limitations of existing methods.
Results Across C-MAPSS Datasets
| Method | FD001 | FD002 | FD003 | FD004 | Avg |
|---|---|---|---|---|---|
| Fixed (0.5/0.5) | 11.2 | 15.4 | 11.8 | 18.6 | 14.3 |
| Fixed (tuned) | 11.8 | 16.2 | 12.5 | 19.8 | 15.1 |
| Uncertainty | 12.4 | 17.1 | 13.1 | 20.5 | 15.8 |
| GradNorm | 11.5 | 15.8 | 12.1 | 19.2 | 14.7 |
| DWA | 11.6 | 15.9 | 12.3 | 19.3 | 14.8 |
| AMNL (ours) | 10.8 | 13.9 | 11.2 | 17.4 | 13.3 |
Key Observation
Simple fixed 0.5/0.5 weights outperform all adaptive methods on average! This surprising result led us to investigate why "equal weighting" works so well for RUL prediction.
Adaptive Methods Underperform
The data shows a counterintuitive pattern:
- Uncertainty weighting is the worst performer
- GradNorm and DWA show marginal improvement over fixed weights
- No adaptive method beats simple equal weighting
The Surprising Discovery
Our systematic experiments revealed an unexpected pattern.
Weight Grid Search Results
We searched over all combinations of task weights:
1Weight combinations tested:
2 λ_RUL ∈ {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
3 λ_health = 1 - λ_RUL
4
5Results (FD002 RMSE):
6 λ_RUL = 0.1: 17.8
7 λ_RUL = 0.2: 16.5
8 λ_RUL = 0.3: 15.9
9 λ_RUL = 0.4: 15.6
10 λ_RUL = 0.5: 15.4 ← BEST
11 λ_RUL = 0.6: 15.7
12 λ_RUL = 0.7: 16.1
13 λ_RUL = 0.8: 16.8
14 λ_RUL = 0.9: 17.5
15
16The optimal weight is exactly 0.5/0.5!Consistent Across Datasets
This pattern holds across all four C-MAPSS datasets:
| Dataset | Best λ_RUL | Best λ_health |
|---|---|---|
| FD001 | 0.50 | 0.50 |
| FD002 | 0.50 | 0.50 |
| FD003 | 0.50 | 0.50 |
| FD004 | 0.50 | 0.50 |
The Discovery: Equal task weights (0.5/0.5) provide optimal performance for RUL prediction with health classification as an auxiliary task. This is not a coincidence—it reflects the unique relationship between these tasks where health state serves as a regularizer for RUL learning.
Why Does Equal Weighting Work?
Summary
In this section, we analyzed why traditional methods fail for RUL:
- RUL challenges: Extreme scale differences, non-stationary dynamics
- Fixed weights: Cannot adapt, expensive to tune
- Uncertainty weighting: Confuses scale with uncertainty
- GradNorm: High cost, noisy training rates
- DWA: Lag and smoothing issues
- Discovery: Equal weights (0.5/0.5) work best!
| Method | Adapts? | RUL Performance | Why Fails? |
|---|---|---|---|
| Fixed | No | Moderate | Static, dataset-specific |
| Uncertainty | Yes | Poor | Scale/uncertainty confusion |
| GradNorm | Yes | Moderate | Noisy, expensive |
| DWA | Yes | Moderate | Lag, smoothing |
| Equal (0.5/0.5) | No | Good | — |
Chapter Complete: We have surveyed traditional multi-task loss functions and understood their limitations for RUL prediction. The key discovery—that equal weights work best—motivates AMNL. Chapter 10 introduces our novel loss function that combines equal weighting with proper loss normalization, achieving state-of-the-art results across all C-MAPSS datasets.
Armed with this understanding, we now present AMNL: the Adaptive Multi-task Normalized Loss.