Learning Objectives
By the end of this section, you will:
- Understand the core discovery that equal task weights are optimal
- Examine the experimental evidence across all C-MAPSS datasets
- Explain why equal weighting works for RUL prediction
- Recognize the importance of loss normalization
- Appreciate how this discovery led to AMNL
Why This Matters: The discovery that equal task weights (0.5/0.5) consistently outperform adaptive methods was unexpected and counterintuitive. It challenged the prevailing wisdom that sophisticated weight adaptation is always better. This discovery is the foundation of AMNL and the key to achieving state-of-the-art performance.
The Discovery
Our research began with a comprehensive evaluation of multi-task weighting methods. What we found was surprising.
The Hypothesis
We initially hypothesized that adaptive methods (uncertainty weighting, GradNorm, DWA) would outperform fixed weights because:
- They adapt to changing loss scales during training
- They balance gradient magnitudes automatically
- They are "principled" (derived from optimization theory)
The Surprise
Extensive experiments revealed the opposite:
Hypothesis: Adaptive > Fixed

Reality:
    Fixed (0.5/0.5) > Adaptive methods

Specifically:
    Equal weights (0.5/0.5) consistently achieved the best results
    across ALL four C-MAPSS datasets.

The Core Discovery
Equal task weights (0.5 for RUL, 0.5 for health) provide optimal performance when combined with proper loss normalization.
This simple approach outperforms all sophisticated adaptive weighting methods, while being simpler, faster, and more robust.
Experimental Evidence
We conducted systematic weight sweep experiments across all datasets.
Weight Sweep Results
Testing weight combinations λ_RUL from 0.1 to 0.9 in steps of 0.1, with λ_health = 1 - λ_RUL (values are test RMSE):
| λ_RUL | FD001 | FD002 | FD003 | FD004 | Average |
|---|---|---|---|---|---|
| 0.1 | 13.2 | 17.8 | 13.9 | 21.5 | 16.6 |
| 0.2 | 12.4 | 16.5 | 13.1 | 20.3 | 15.6 |
| 0.3 | 11.8 | 15.9 | 12.4 | 19.4 | 14.9 |
| 0.4 | 11.3 | 15.6 | 11.9 | 18.8 | 14.4 |
| 0.5 | 10.8 | 13.9 | 11.2 | 17.4 | 13.3 |
| 0.6 | 11.1 | 15.4 | 11.7 | 18.2 | 14.1 |
| 0.7 | 11.6 | 15.8 | 12.2 | 18.9 | 14.6 |
| 0.8 | 12.2 | 16.4 | 12.8 | 19.6 | 15.3 |
| 0.9 | 12.9 | 17.2 | 13.5 | 20.8 | 16.1 |
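The sweep above can be sketched as a small search loop. The `train_and_eval` callable is a hypothetical placeholder standing in for the full training pipeline; only the weighting and selection logic is shown here.

```python
def combined_loss(l_rul_norm, l_health_norm, lam):
    """Weighted sum of (already normalized) task losses."""
    return lam * l_rul_norm + (1.0 - lam) * l_health_norm

def weight_sweep(train_and_eval,
                 lambdas=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Train one model per lambda_RUL and return the RMSE for each,
    plus the lambda achieving the lowest RMSE."""
    results = {lam: train_and_eval(lam) for lam in lambdas}
    best = min(results, key=results.get)
    return results, best
```

Replaying the FD001 column from the table through this loop recovers λ = 0.5 as the best setting.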
Optimal Weight Location
[Figure: test RMSE vs. λ_RUL, a U-shaped curve with its minimum at λ_RUL = 0.5 (optimal), rising toward both λ_RUL = 0.1 and λ_RUL = 0.9.]
Statistical Significance
We verified the results are statistically significant:
| Comparison | Δ RMSE | p-value | Significant? |
|---|---|---|---|
| 0.5 vs 0.4 | -1.1 | 0.023 | Yes (p < 0.05) |
| 0.5 vs 0.6 | -0.8 | 0.041 | Yes (p < 0.05) |
| 0.5 vs Uncertainty | -2.5 | 0.002 | Yes (p < 0.01) |
| 0.5 vs GradNorm | -1.4 | 0.018 | Yes (p < 0.05) |
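Paired comparisons like those in the table are typically based on a paired t-test over matched runs. The sketch below computes only the t-statistic from per-seed RMSE values (the numbers here are made up for illustration; converting t to a p-value additionally requires the t-distribution CDF, e.g. from scipy):

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t-statistic for two matched samples (e.g. RMSE per seed)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Hypothetical per-seed RMSE for lambda = 0.5 vs. lambda = 0.4:
rmse_05 = [10.7, 10.9, 10.8, 10.6, 11.0]
rmse_04 = [11.2, 11.5, 11.3, 11.0, 11.6]
t = paired_t(rmse_05, rmse_04)  # strongly negative: 0.5 beats 0.4
```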
Robust Finding
The optimality of λ = 0.5 is not due to chance. Statistical tests confirm the result is significant across multiple random seeds and dataset splits.
Why Equal Weights Work
Several factors explain why equal weighting is optimal for RUL prediction.
Factor 1: Maximum Regularization
Giving the auxiliary health-state task its full share of the loss maximizes its regularizing effect on the shared representation, discouraging the network from overfitting to noisy RUL targets.
Factor 2: Gradient Balance
With proper loss normalization, equal weights produce balanced gradients:

    0.5 ‖∇_θ L̃_RUL‖ ≈ 0.5 ‖∇_θ L̃_health‖

where L̃ denotes the normalized losses. Neither task dominates gradient updates.
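A toy scalar example (not the paper's code) shows why normalization balances gradients: dividing each loss by its own running scale rescales the gradients by the same factor, equalizing them even when the raw losses differ by orders of magnitude.

```python
# Toy losses with very different scales, sharing one parameter w:
#   L_rul(w)    = 1000 * w**2   (large, MSE-like scale)
#   L_health(w) = 1.5  * w**2   (small, cross-entropy-like scale)
w = 2.0
l_rul, g_rul = 1000 * w**2, 2 * 1000 * w        # loss and dL/dw
l_health, g_health = 1.5 * w**2, 2 * 1.5 * w

# Raw gradients differ by a factor of ~667: the RUL task dominates.
ratio_raw = g_rul / g_health

# Normalizing each loss by its current scale (a stand-in for the EMA)
# divides its gradient by the same constant; both become 2/w here.
g_rul_norm = g_rul / l_rul
g_health_norm = g_health / l_health
```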
Factor 3: Complementary Information
The two tasks provide complementary supervision:
| Task | Information Type | Benefit |
|---|---|---|
| RUL | Fine-grained (exact cycles) | Precise predictions |
| Health | Coarse-grained (3 states) | Robust features |
Equal weighting ensures both types of information contribute equally to learning.
The Key: Loss Normalization
Equal weighting only works with proper loss normalization.
The Problem Without Normalization
Raw losses have vastly different scales:
Without normalization:
    L_RUL ≈ 100-2000 (MSE on cycles)
    L_health ≈ 0.5-2 (cross-entropy)

With λ = 0.5 for both:
    0.5 × 1000 + 0.5 × 1.5 = 500.75

RUL contribution: 500 / 500.75 = 99.85%
Health contribution: 0.15%

→ Equal weights ≠ equal contribution!

The Solution: Normalize First
AMNL normalizes losses before weighting:

    L̃_task = L_task / EMA(L_task)

where EMA(·) is an exponential moving average of each task's loss, used for stable normalization.
With normalization:
    L̃_RUL = 1000 / 1000 = 1.0
    L̃_health = 1.5 / 1.5 = 1.0

With λ = 0.5 for both:
    0.5 × 1.0 + 0.5 × 1.0 = 1.0

RUL contribution: 50%
Health contribution: 50%

→ Equal weights = equal contribution!

AMNL = Normalization + Equal Weights
The key innovation of AMNL is not the equal weighting itself, but the combination of proper loss normalization with equal weights. This ensures both tasks contribute equally regardless of their raw loss scales.
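A minimal sketch of this combination follows. The class name, the EMA decay `beta`, and the `eps` guard are illustrative choices, not the paper's actual implementation; in a real framework the EMA update should also be detached from the gradient graph.

```python
class NormalizedEqualLoss:
    """Equal-weight multi-task loss with EMA-based normalization (sketch)."""

    def __init__(self, beta=0.99, eps=1e-8):
        self.beta = beta   # EMA decay for the running loss scales
        self.eps = eps     # guards against division by zero
        self.ema = {}      # task name -> running loss scale

    def __call__(self, losses):
        """losses: dict of raw task losses, e.g. {'rul': ..., 'health': ...}"""
        total = 0.0
        lam = 1.0 / len(losses)   # equal weights: 0.5/0.5 for two tasks
        for name, loss in losses.items():
            # Update this task's EMA (initialized to the first loss seen).
            prev = self.ema.get(name, loss)
            self.ema[name] = self.beta * prev + (1 - self.beta) * loss
            # Normalize by the running scale, then apply the equal weight.
            total += lam * loss / (self.ema[name] + self.eps)
        return total
```

On the first step each EMA equals the raw loss itself, so both normalized losses are ≈ 1.0 and contribute equally regardless of their raw scales.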
Summary
In this section, we presented the core discovery behind AMNL:
- Discovery: Equal weights (0.5/0.5) are optimal for RUL + health
- Evidence: Consistent across all C-MAPSS datasets
- Why it works: Maximum regularization, gradient balance
- Critical requirement: Loss normalization
- AMNL formula: Normalized losses + equal weights
| Aspect | Value |
|---|---|
| Optimal λ_RUL | 0.5 |
| Optimal λ_health | 0.5 |
| Statistical significance | p < 0.05 |
| Key enabler | Loss normalization |
| Result | State-of-the-art on all datasets |
Looking Ahead: We have established that equal weighting works when combined with normalization. The next section presents the complete mathematical formulation of AMNL, including the normalization mechanism and the full loss equation.
With the discovery explained, we now formalize AMNL mathematically.