Learning Objectives
By the end of this section, you will:
- Understand conventional multi-task learning weight selection
- Analyze weight experiments across multiple configurations
- Discover why 0.5/0.5 weighting outperforms asymmetric schemes
- Understand the regularization mechanism of equal weighting
- Implement weight ablation experiments systematically
Key Finding: Equal weighting (0.5/0.5) between RUL prediction and health classification outperforms all asymmetric weighting schemes. This contradicts conventional multi-task learning wisdom that primary tasks should receive higher weights than auxiliary tasks.
Conventional Wisdom
Multi-task learning typically assumes the primary task should be weighted more heavily than auxiliary tasks.
Traditional Approach
In standard multi-task learning, the combined loss is typically formulated as a convex combination of the task losses:

L_total = α · L_RUL + (1 − α) · L_health

where α > 0.5 (e.g., α = 0.75) is the common choice, based on the reasoning that:
- The primary task (RUL) is what we ultimately care about
- Auxiliary tasks provide support but shouldn't dominate
- Higher weight ensures the model prioritizes primary task optimization
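The weighting above enters training as a single convex combination of the two task losses. A minimal numeric sketch (the function name is illustrative; the 0.75/0.25 split matches the V7 baseline below):

```python
def combined_loss(alpha, loss_rul, loss_health):
    """Weighted multi-task loss: L = alpha * L_RUL + (1 - alpha) * L_health."""
    return alpha * loss_rul + (1 - alpha) * loss_health

# With the conventional split, the RUL term dominates the total loss.
print(combined_loss(0.75, 2.0, 4.0))  # 0.75*2.0 + 0.25*4.0 = 2.5
```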
Our V7 Baseline Configuration
| Parameter | V7 Baseline Value | Rationale |
|---|---|---|
| RUL Weight (α) | 0.75 | Primary task gets majority weight |
| Health Weight (1-α) | 0.25 | Auxiliary task supports learning |
| Weighting Strategy | Asymmetric | Follow conventional wisdom |
The Surprising Discovery
During systematic ablation studies, we discovered that equal weighting (0.5/0.5) consistently outperformed our carefully tuned asymmetric baseline. This led to the development of AMNL.
Weight Experiments
Systematic evaluation of different task weighting configurations across multiple datasets and seeds.
Experimental Design
| Configuration | RUL Weight | Health Weight | Description |
|---|---|---|---|
| V7 Baseline | 0.75 | 0.25 | Strong RUL preference |
| AMNL 0.9/0.1 | 0.90 | 0.10 | Maximum RUL preference |
| AMNL 0.7/0.3 | 0.70 | 0.30 | Moderate RUL preference |
| AMNL 0.6/0.4 | 0.60 | 0.40 | Slight RUL preference |
| AMNL 0.5/0.5 | 0.50 | 0.50 | Equal weighting (AMNL) |
Results: FD002 (6 Operating Conditions)
| Configuration | RMSE | Improvement vs V7 | NASA Score |
|---|---|---|---|
| V7 Baseline (0.75/0.25) | 9.45 | — | 498.0 |
| AMNL 0.9/0.1 | 11.23 | -18.8% | 612.4 |
| AMNL 0.7/0.3 | 8.12 | +14.1% | 421.3 |
| AMNL 0.6/0.4 | 7.45 | +21.2% | 389.7 |
| AMNL 0.5/0.5 | 6.74 | +28.7% | 356.0 |
Results: FD004 (6 Conditions, 2 Faults)
| Configuration | RMSE | Improvement vs V7 | NASA Score |
|---|---|---|---|
| V7 Baseline (0.75/0.25) | 8.41 | — | 945.0 |
| AMNL 0.9/0.1 | 10.67 | -26.9% | 1123.8 |
| AMNL 0.7/0.3 | 8.89 | -5.7% | 712.4 |
| AMNL 0.6/0.4 | 8.34 | +0.8% | 623.1 |
| AMNL 0.5/0.5 | 8.16 | +3.0% | 537.5 |
Statistical Comparison
| Comparison | FD002 Δ RMSE | FD004 Δ RMSE | p-value |
|---|---|---|---|
| 0.5/0.5 vs 0.75/0.25 | -2.71 (-28.7%) | -0.25 (-3.0%) | < 0.01 |
| 0.5/0.5 vs 0.9/0.1 | -4.49 (-40.0%) | -2.51 (-23.5%) | < 0.001 |
| 0.5/0.5 vs 0.6/0.4 | -0.71 (-9.5%) | -0.18 (-2.1%) | 0.034 |
Statistically Significant
Equal weighting (0.5/0.5) significantly outperforms all asymmetric configurations at p < 0.05. The improvement is largest compared to extreme asymmetric weighting (0.9/0.1).
Why Equal Weighting Works
Three complementary explanations for the surprising success of equal task weighting.
Hypothesis 1: Regularization Effect
Health state classification provides discrete supervision signals that anchor continuous RUL predictions to meaningful degradation stages.
By forcing the model to correctly classify these discrete states, we implicitly constrain the RUL predictions to be consistent with degradation physics:
- Healthy predictions must correspond to high RUL values
- Critical predictions must correspond to low RUL values
- Transition regions are explicitly supervised
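One way to make this constraint concrete is to derive the discrete health labels directly from RUL via thresholds. The cutoffs below are illustrative stand-ins, not the values used in our pipeline:

```python
def health_state(rul, degrading_below=80, critical_below=30):
    """Map a continuous RUL value to a discrete health class.

    Thresholds are hypothetical; the point is that each class pins
    the RUL prediction to a physically meaningful range.
    """
    if rul < critical_below:
        return "critical"
    if rul < degrading_below:
        return "degrading"
    return "healthy"

print(health_state(120))  # healthy
print(health_state(50))   # degrading
print(health_state(10))   # critical
```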
Hypothesis 2: Gradient Balance
Equal weighting maintains gradient balance in shared encoder layers, encouraging features that capture fundamental degradation physics.
| Weighting | Gradient Behavior | Effect |
|---|---|---|
| 0.9/0.1 | RUL dominates encoder updates | May overfit to RUL-specific features |
| 0.75/0.25 | RUL still dominates | Some regularization from health task |
| 0.5/0.5 | Balanced gradient flow | Learns generalizable features |
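The effect in the table can be sketched numerically. With hypothetical per-task gradients on the shared encoder parameters (the values below are made up for illustration), the RUL task's share of the combined update shrinks as α moves toward 0.5:

```python
import numpy as np

# Hypothetical per-task gradients w.r.t. the shared encoder parameters.
g_rul = np.array([2.0, 1.0])
g_health = np.array([-0.5, 0.8])

shares = {}
for alpha in (0.9, 0.75, 0.5):
    # Combined gradient on shared layers: alpha * g_RUL + (1 - alpha) * g_health
    weighted_rul = alpha * np.linalg.norm(g_rul)
    weighted_health = (1 - alpha) * np.linalg.norm(g_health)
    shares[alpha] = weighted_rul / (weighted_rul + weighted_health)
    print(f"alpha={alpha}: RUL share of gradient magnitude = {shares[alpha]:.2f}")
```

Even at 0.5/0.5 the larger RUL gradient can still carry more of the update; equal weighting balances the contributions without silencing either task.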
Hypothesis 3: Implicit Curriculum
The easier health classification task provides an implicit curriculum that stabilizes learning of the harder RUL regression task.
| Task | Difficulty | Convergence |
|---|---|---|
| Health Classification | Easier (3 classes) | Faster, more stable |
| RUL Regression | Harder (continuous) | Slower, less stable |
During early training, the health classification task converges first, providing a stable foundation for the shared encoder. This prevents early training instability that can derail RUL learning.
Evidence from Single-Task Failure
The catastrophic failure of single-task RUL prediction (+304.7% degradation, covered in the next section) provides strong evidence for the regularization hypothesis. Without the health task, the model overfits to dataset-specific patterns.
Implementation
Our ablation study uses systematic configuration management to test each weight combination under identical training conditions.
V7 Baseline Configuration
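A baseline configuration might look like the following sketch. Field names are illustrative; only the weight values come from the V7 baseline table above:

```python
# V7 baseline: conventional asymmetric weighting (alpha = 0.75).
V7_BASELINE = {
    "name": "v7_baseline",
    "rul_weight": 0.75,     # alpha: primary-task weight
    "health_weight": 0.25,  # 1 - alpha: auxiliary-task weight
}

# The two weights must form a convex combination.
assert abs(V7_BASELINE["rul_weight"] + V7_BASELINE["health_weight"] - 1.0) < 1e-9
```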
Weight Ablation Configurations
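The full ablation grid can be expressed as one configuration per row of the experimental-design table above (a sketch; the dict layout is an assumption):

```python
# One entry per row of the experimental-design table.
ABLATION_CONFIGS = [
    {"name": "v7_baseline",  "rul_weight": 0.75, "health_weight": 0.25},
    {"name": "amnl_0.9_0.1", "rul_weight": 0.90, "health_weight": 0.10},
    {"name": "amnl_0.7_0.3", "rul_weight": 0.70, "health_weight": 0.30},
    {"name": "amnl_0.6_0.4", "rul_weight": 0.60, "health_weight": 0.40},
    {"name": "amnl_0.5_0.5", "rul_weight": 0.50, "health_weight": 0.50},
]

# Sanity check: every configuration's weights form a convex combination.
for cfg in ABLATION_CONFIGS:
    assert abs(cfg["rul_weight"] + cfg["health_weight"] - 1.0) < 1e-9
```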
Ablation Training Function
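A framework-agnostic skeleton of one ablation run is sketched below. The function and parameter names are hypothetical; the loss and optimizer-step callables stand in for a real model so that only the weighting logic is shown:

```python
def train_with_weights(config, loss_rul_fn, loss_health_fn, step_fn, n_epochs=10):
    """Skeleton of one ablation run.

    loss_rul_fn / loss_health_fn return the current per-task losses;
    step_fn applies one optimizer step given the combined loss.
    """
    alpha = config["rul_weight"]
    history = []
    for epoch in range(n_epochs):
        loss_rul = loss_rul_fn()
        loss_health = loss_health_fn()
        # The only configuration-dependent line: the convex combination.
        total = alpha * loss_rul + (1 - alpha) * loss_health
        step_fn(total)  # backprop + optimizer update in a real run
        history.append(total)
    return history
```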
Running All Ablations
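The full sweep then reduces to looping over the configurations and collecting results. A minimal sketch with a dummy training function standing in for the real one:

```python
def run_all_ablations(configs, train_fn):
    """Run every weighting configuration and report the final combined loss."""
    results = {}
    for cfg in configs:
        history = train_fn(cfg)  # one full training run per configuration
        results[cfg["name"]] = history[-1]
    return results

# Dummy train_fn standing in for the real training loop.
configs = [{"name": "amnl_0.5_0.5", "rul_weight": 0.50},
           {"name": "v7_baseline",  "rul_weight": 0.75}]
results = run_all_ablations(configs, lambda cfg: [cfg["rul_weight"] * 2.0])
print(results)
```

In the real study, each configuration is additionally repeated across multiple seeds and datasets before the comparisons in the tables above are made.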
Summary
Task Weight Analysis Summary:
- Conventional wisdom fails: Giving the primary task a higher weight is not optimal for RUL prediction
- Equal weighting wins: 0.5/0.5 outperforms all asymmetric schemes
- Improvement magnitude: Up to 28.7% improvement over 0.75/0.25 baseline
- Monotonic trend: Performance improves as health weight increases (up to 0.5)
- Three hypotheses: Regularization, gradient balance, implicit curriculum
| Key Finding | Evidence |
|---|---|
| 0.5/0.5 is optimal | Best RMSE on all datasets tested |
| Asymmetric hurts | 0.9/0.1 performs 40% worse than 0.5/0.5 |
| Statistically robust | p < 0.01 for key comparisons |
| Works across complexity | Both FD002 and FD004 show same pattern |
Key Insight: The success of equal weighting challenges fundamental assumptions in multi-task learning. For predictive maintenance, the auxiliary health classification task is not merely "supportive"—it provides essential regularization that enables learning generalizable degradation features. The next section examines what happens when we remove the health task entirely.
With weight analysis complete, we examine the catastrophic failure of single-task learning.