Learning Objectives
By the end of this section, you will:
- Understand the multi-task learning problem as loss combination
- Examine the actual baseline implementation from our research
- Identify weight selection challenges in practice
- Recognize loss scale mismatch as a fundamental issue
- Appreciate why fixed weights are suboptimal
Why This Matters: Before understanding AMNL's innovation, we must understand what came before. Fixed weight multi-task learning is the simplest approach and serves as the baseline against which all advanced methods are compared. Its limitations motivated the development of adaptive weighting schemes.
The Multi-Task Learning Problem
In multi-task learning (MTL), we train a single model on multiple related tasks simultaneously. The central question is: how do we combine task losses into a single optimization objective?
The AMNL Setting
Our model has two tasks with different characteristics:
| Task | Type | Loss | Scale |
|---|---|---|---|
| RUL Prediction | Regression | MSE | ~100-10000 |
| Health Classification | Classification | Cross-Entropy | ~0.1-3 |
The fundamental challenge: these losses have vastly different scales and semantics. How do we meaningfully combine them?
The Naive Approach
The simplest solution is to just add the losses:

L_total = L_RUL + L_health

This is problematic because the MSE loss for RUL (values in the hundreds) will completely dominate the cross-entropy loss for health (values near 1).
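Plugging in illustrative magnitudes consistent with the scale table above makes the domination concrete:

```python
# Naive unweighted sum: the regression loss swamps the classification loss.
l_rul = 2500.0    # typical early-training MSE on RUL
l_health = 1.8    # typical cross-entropy on health classification

total = l_rul + l_health
health_share = l_health / total

print(f"total = {total}")                    # 2501.8
print(f"health share = {health_share:.4%}")  # well under 0.1% of the objective
```

Gradient descent on this sum effectively optimizes the RUL task alone; the health task barely registers.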
Fixed Weight Formulation
To address scale mismatch, we introduce fixed weights for each task.
Mathematical Formulation

L_total = λ₁ · L_RUL + λ₂ · L_health

Where:
- λ₁: Weight for the RUL task
- λ₂: Weight for the health task
- L_RUL: RUL regression loss (MSE or weighted MSE)
- L_health: Health classification loss (cross-entropy)
Common Weight Schemes
| Scheme | λ₁ (RUL) | λ₂ (Health) | Rationale |
|---|---|---|---|
| Equal | 1.0 | 1.0 | Treat tasks equally (ignores scale) |
| Normalized | 0.5 | 0.5 | Sum to 1 (still ignores scale) |
| Scale-balanced | 0.001 | 1.0 | Manual scale matching |
| Tuned | 0.7 | 0.3 | Grid search optimized (our default) |
Research Implementation
Our research codebase implements this fixed-weight combination as the baseline loss function used in ablation studies to compare against AMNL.
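A minimal numpy sketch of such a fixed-weight two-task loss is shown below. This is an illustrative reimplementation, not the research code; only the `rul_weight` / `health_weight` parameter names mirror the ablation configurations.

```python
import numpy as np

class FixedWeightLoss:
    """Fixed-weight two-task loss: L_total = lam1 * L_RUL + lam2 * L_health.

    Illustrative sketch, not the actual research implementation.
    """

    def __init__(self, rul_weight=0.7, health_weight=0.3):
        self.rul_weight = rul_weight
        self.health_weight = health_weight

    @staticmethod
    def rul_loss(pred, target):
        # MSE for the RUL regression task
        return float(np.mean((pred - target) ** 2))

    @staticmethod
    def health_loss(logits, labels):
        # Cross-entropy for the health classification task
        shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
        return float(-log_probs[np.arange(len(labels)), labels].mean())

    def __call__(self, rul_pred, rul_target, health_logits, health_labels):
        l_rul = self.rul_loss(rul_pred, rul_target)
        l_health = self.health_loss(health_logits, health_labels)
        return self.rul_weight * l_rul + self.health_weight * l_health
```

Note that the weights are fixed at construction time: nothing in the forward pass reacts to the actual loss magnitudes, which is precisely the limitation the rest of this section examines.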
Usage in Ablation Studies
The fixed weight loss is instantiated with different weight combinations in our ablation experiments:
```python
# Ablation study configurations (from run_ablation_studies.py)
ABLATION_CONFIGS = {
    'fixed_70_30': {
        'name': 'Fixed 0.7/0.3',
        'description': 'RUL-prioritized weighting',
        'changes': {'rul_weight': 0.7, 'health_weight': 0.3},
    },
    'fixed_50_50': {
        'name': 'Fixed 0.5/0.5',
        'description': 'Equal weighting without normalization',
        'changes': {'rul_weight': 0.5, 'health_weight': 0.5},
    },
    'fixed_90_10': {
        'name': 'Fixed 0.9/0.1',
        'description': 'Strong RUL preference',
        'changes': {'rul_weight': 0.9, 'health_weight': 0.1},
    },
}
```

Weight Selection Challenges
Choosing appropriate fixed weights is surprisingly difficult.
The Hyperparameter Search Problem
With two tasks, we have two hyperparameters to tune. Common approaches:
- Grid search: Try combinations like (0.1, 0.9), (0.5, 0.5), (0.9, 0.1), etc.
- Random search: Sample weights from distributions
- Bayesian optimization: Guided search based on validation performance
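The grid-search variant can be sketched as follows. Here `evaluate` is a hypothetical stand-in for a full train-and-validate run; in practice every call costs hours of GPU time, which is exactly why the search is expensive.

```python
# Candidate lambda_1 values; lambda_2 = 1 - lambda_1 keeps the search 1-D
lambda1_grid = [0.1, 0.3, 0.5, 0.7, 0.9]

def evaluate(rul_weight, health_weight):
    """Stand-in for one FULL training run plus validation (hypothetical).

    A toy proxy objective is used here so the sketch runs instantly;
    it happens to prefer weights near (0.7, 0.3).
    """
    return (rul_weight - 0.7) ** 2 + (health_weight - 0.3) ** 2

# Each candidate evaluated = one complete training run
best = min(
    ((lam1, 1.0 - lam1) for lam1 in lambda1_grid),
    key=lambda w: evaluate(*w),
)
print(best)
```

Five candidates already means five full trainings; a 2-D grid over both weights, repeated across four datasets, multiplies the cost further.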
Problems with Fixed Weights
- Expensive search: Each weight combination requires full training
- Dataset-specific: Optimal weights differ across FD001-FD004
- Training dynamics: Optimal weights change during training
- No principled selection: Weights are arbitrary, not derived from data
Loss Scale Mismatch
Even with careful weight tuning, fundamental issues remain.
Dynamic Scale Changes
Loss magnitudes change during training:
```
Training progression (from actual runs):

Epoch 1:   L_RUL = 2500, L_health = 1.8
Epoch 10:  L_RUL = 800,  L_health = 0.9
Epoch 50:  L_RUL = 200,  L_health = 0.4
Epoch 100: L_RUL = 100,  L_health = 0.2

Relative scale (L_RUL / L_health):
Epoch 1:   1389x
Epoch 10:  889x
Epoch 50:  500x
Epoch 100: 500x
```

A fixed weight that balances losses at epoch 1 will not balance them at epoch 100. The relative contribution of tasks shifts during training.
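This shift can be checked directly from the logged values: choose λ₁ to equalize the two terms at epoch 1 (with λ₂ fixed at 1), and by epoch 100 the weighted RUL term has shrunk to roughly a third of the health term.

```python
# Logged loss values from the training progression above
losses = {1: (2500.0, 1.8), 10: (800.0, 0.9), 50: (200.0, 0.4), 100: (100.0, 0.2)}

# Pick lam1 so that lam1 * L_RUL == L_health at epoch 1
l_rul_1, l_health_1 = losses[1]
lam1 = l_health_1 / l_rul_1  # 0.00072

for epoch, (l_rul, l_health) in losses.items():
    ratio = (lam1 * l_rul) / l_health  # 1.0 means the two terms are balanced
    print(f"epoch {epoch:3d}: weighted-RUL / health = {ratio:.2f}")
```

The ratio drifts from 1.00 at epoch 1 down to 0.36 at epoch 100: the same fixed weight now under-represents the RUL task.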
Gradient Magnitude Analysis

What matters for optimization is not loss magnitude but gradient magnitude:

∇_θ L_total = λ₁ · ∇_θ L_RUL + λ₂ · ∇_θ L_health

If λ₁‖∇_θ L_RUL‖ ≫ λ₂‖∇_θ L_health‖, the RUL task dominates parameter updates; weights tuned to balance the loss values give no guarantee of balance at the gradient level.
Gradient Domination
Even if loss values are balanced via weights, gradient magnitudes may still be imbalanced. This can cause one task to dominate learning, leading to poor performance on the other task.
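The mismatch is easy to demonstrate on toy data. The snippet below (all data and shapes are illustrative) compares analytic gradient norms taken with respect to the model outputs, as a proxy for parameter gradients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy outputs and targets for each task
rul_pred = rng.uniform(50, 150, size=32)            # RUL predictions
rul_target = rul_pred + rng.normal(0, 30, size=32)  # targets ~30 units off
logits = rng.normal(0, 1, size=(32, 3))             # health logits, 3 classes
labels = rng.integers(0, 3, size=32)

# Analytic gradients w.r.t. the model outputs
grad_mse = 2.0 * (rul_pred - rul_target) / len(rul_pred)  # d(MSE)/d(pred)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
onehot = np.eye(3)[labels]
grad_ce = (probs - onehot) / len(labels)                  # d(CE)/d(logits)

ratio = np.linalg.norm(grad_mse) / np.linalg.norm(grad_ce)
print(f"||grad MSE|| / ||grad CE|| = {ratio:.1f}")  # typically >> 1
```

The cross-entropy gradient per sample is bounded (softmax minus one-hot), while the MSE gradient grows with the prediction error, so the norm ratio reaches orders of magnitude even when the weighted loss values look comparable.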
Task Interference
Fixed weights can cause negative transfer when task gradients conflict:
When gradients point in opposing directions, fixed weights cannot resolve the conflict—they just scale the competing gradients without addressing the underlying tension.
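Conflict is usually quantified by the cosine similarity between per-task gradient vectors: a negative value means the tasks pull shared parameters in opposing directions. The sketch below (hypothetical gradient values) also shows that positive fixed weights rescale the gradients but never change that sign:

```python
import numpy as np

def cosine(g1, g2):
    """Cosine similarity between two gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Hypothetical per-task gradients on a shared parameter vector
g_rul = np.array([1.0, 2.0, -1.0])
g_health = np.array([-1.0, -1.5, 0.5])

print(cosine(g_rul, g_health))  # negative -> conflicting directions

# Any positive fixed weights leave the cosine (and the conflict) unchanged
for lam1, lam2 in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(lam1, lam2, cosine(lam1 * g_rul, lam2 * g_health))
```

Cosine similarity is invariant to positive scaling of either argument, which is the formal statement of "fixed weights just scale the competing gradients."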
Summary
In this section, we examined fixed weight multi-task loss:
- Basic formulation: L_total = λ₁ · L_RUL + λ₂ · L_health
- Implementation: ~25 lines of Python, simple but limited
- Scale mismatch: MSE (~100s) vs CE (~1) requires manual balancing
- Weight selection: Expensive grid search, no principled method
- Gradient issues: Loss balancing ≠ gradient balancing
| Issue | Impact | AMNL Solution |
|---|---|---|
| Scale mismatch | One task dominates | Dynamic normalization |
| Manual tuning | 750+ GPU hours | Zero hyperparameters |
| Static weights | Can't adapt | Per-batch adaptation |
| Gradient conflicts | Not addressed | Equal contribution |
Looking Ahead: Fixed weights are clearly inadequate. The next section introduces Uncertainty Weighting (Kendall et al.)—a principled approach that learns task weights from homoscedastic uncertainty, eliminating manual tuning.
With fixed weight limitations understood, we examine principled adaptive alternatives.