Learning Objectives
By the end of this section, you will:
- Understand homoscedastic uncertainty in multi-task learning
- Derive task weights from maximum likelihood
- Implement uncertainty weighting with learnable parameters
- Recognize the connection between uncertainty and loss scale
- Identify limitations of this approach
Why This Matters: Kendall et al. (2018) introduced a principled approach to multi-task weighting based on task uncertainty. Instead of manually tuning weights, the model learns them by estimating how "noisy" each task is. This was a major advance over fixed weights and influenced many subsequent methods.
Uncertainty-Based Motivation
The key insight is that task weights should relate to task uncertainty.
Types of Uncertainty
| Type | Also Called | Source | Reducible? |
|---|---|---|---|
| Aleatoric | Data uncertainty | Inherent noise in data | No |
| Epistemic | Model uncertainty | Limited training data | Yes |
| Homoscedastic | Task uncertainty | Constant across inputs | No |
| Heteroscedastic | Input-dependent | Varies with input | No |
Homoscedastic Uncertainty
Kendall et al. focus on homoscedastic uncertainty—task-level uncertainty that is constant across all inputs.
- For RUL: The inherent unpredictability of exact failure time
- For Health: The inherent ambiguity at state boundaries
The intuition: if a task has high intrinsic uncertainty, we should weight it less (its predictions are inherently noisy). If uncertainty is low, we should weight it more (its predictions are reliable).
Probabilistic Framing
Instead of directly predicting outputs, we model the likelihood of observations with a Gaussian centered on the network output:

$$p(y \mid f^{W}(x)) = \mathcal{N}\left(f^{W}(x),\; \sigma^2\right)$$

Where $\sigma^2$ is the homoscedastic variance—a learnable parameter representing task uncertainty.
Mathematical Derivation
The uncertainty-weighted loss emerges from maximum likelihood estimation.
Single Task (Regression)
For a regression task with output $f^{W}(x)$ and Gaussian likelihood:

$$p(y \mid f^{W}(x)) = \mathcal{N}\left(f^{W}(x),\; \sigma^2\right)$$

The negative log-likelihood is:

$$-\log p(y \mid f^{W}(x)) = \frac{1}{2\sigma^2}\lVert y - f^{W}(x)\rVert^2 + \log \sigma + \text{const}$$
Multi-Task Likelihood
For two tasks with independent observations $y_1, y_2$:

$$p(y_1, y_2 \mid f^{W}(x)) = p(y_1 \mid f^{W}(x)) \cdot p(y_2 \mid f^{W}(x))$$

Taking the negative log gives the uncertainty-weighted multi-task loss:

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2}\mathcal{L}_1(W) + \frac{1}{2\sigma_2^2}\mathcal{L}_2(W) + \log \sigma_1 + \log \sigma_2$$

where $\mathcal{L}_i(W) = \lVert y_i - f^{W}(x)\rVert^2$ is the loss of task $i$.
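As a quick numeric sanity check of the formula above (the loss and sigma values are purely illustrative, not from any experiment):

```python
import math

def multitask_nll(l1, l2, sigma1, sigma2):
    """Negative log-likelihood of two independent Gaussian tasks,
    dropping additive constants:
    L1/(2*sigma1^2) + L2/(2*sigma2^2) + log(sigma1) + log(sigma2)."""
    return (l1 / (2 * sigma1 ** 2)
            + l2 / (2 * sigma2 ** 2)
            + math.log(sigma1) + math.log(sigma2))

# A noisy task (sigma = 10) is down-weighted by 1/200, while a reliable
# task (sigma = 1) keeps weight 1/2 -- but the log(sigma) penalty stops
# the optimizer from simply inflating sigma to zero out every loss.
print(multitask_nll(100.0, 1.0, 10.0, 1.0))
```

Note how the two terms pull in opposite directions: larger $\sigma$ shrinks the weighted loss but grows the $\log \sigma$ penalty.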
Practical Formulation
For numerical stability, we parameterize using the log-variance $s_i = \log \sigma_i^2$ (absorbing the constant factors of $\tfrac{1}{2}$ into the task losses):

$$\mathcal{L}_{\text{total}} = e^{-s_1}\mathcal{L}_1(W) + e^{-s_2}\mathcal{L}_2(W) + \frac{s_1 + s_2}{2}$$

Where:
- $s_i = \log \sigma_i^2$: Log-variance (learnable parameter)
- $e^{-s_i}$: Precision (inverse variance), acting as the effective task weight
- $\tfrac{s_i}{2} = \log \sigma_i$: Regularization term that prevents the trivial solution $\sigma_i \to \infty$
Implementation
The uncertainty-weighted loss is straightforward to implement.
PyTorch Implementation
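A minimal sketch of the practical formulation above (the class name `UncertaintyWeightedLoss` is illustrative, not a library API):

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Uncertainty weighting in the style of Kendall et al. (2018).

    Learns one log-variance s_i = log(sigma_i^2) per task. Each task loss
    is scaled by the precision exp(-s_i), and s_i/2 regularizes against
    inflating the variance to zero out the loss.
    """

    def __init__(self, num_tasks: int = 2):
        super().__init__()
        # s_i = 0 at init, i.e. sigma_i^2 = 1 and effective weight 1.0
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        # task_losses: sequence of scalar per-task losses, e.g. [L_rul, L_health]
        total = 0.0
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + 0.5 * s
        return total
```

The log-variances must be optimized jointly with the network, e.g. `optim.Adam(list(model.parameters()) + list(criterion.parameters()))`.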
Training Dynamics
During training, the log-variances adjust automatically:
```
Training progression:

Epoch 1:  s_rul = 0.0, s_health = 0.0
          weights: (1.0, 1.0)

Epoch 50: s_rul = 5.2, s_health = -0.3
          weights: (0.005, 1.35)

Interpretation:
  - RUL has high uncertainty (σ²_rul ≈ 180) → low weight
  - Health has low uncertainty (σ²_health ≈ 0.74) → high weight
  - Model learned that health predictions are more reliable
```
No Manual Tuning
The key advantage: we did not manually set λ₁ = 0.005 and λ₂ = 1.35. These values emerged from optimizing the likelihood, automatically adapting to the loss scales.
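The quoted weights and variances follow directly from the learned log-variances; a quick check using the values from the progression above:

```python
import math

def effective_weight(s):
    # effective task weight = precision = exp(-s) = 1 / sigma^2
    return math.exp(-s)

def variance(s):
    # sigma^2 = exp(s)
    return math.exp(s)

for name, s in [("rul", 5.2), ("health", -0.3)]:
    print(f"{name}: weight = {effective_weight(s):.4f}, sigma^2 = {variance(s):.2f}")
```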
Limitations
Despite its principled foundation, uncertainty weighting has limitations.
Assumption Violations
- Gaussian assumption: RUL errors may not be Gaussian (heavy-tailed, asymmetric)
- Homoscedasticity: Uncertainty may actually vary with RUL level (heteroscedastic)
- Independence: Task errors may be correlated (both depend on degradation)
Optimization Issues
| Issue | Description |
|---|---|
| Local minima | Log-variances can get stuck at poor values |
| Initialization sensitivity | Starting values affect convergence |
| Slow adaptation | Log-variances change slowly via gradient descent |
| Regularization trade-off | The log(σ) term may be too weak or too strong relative to the task losses |
RUL-Specific Problems
Empirical Results on C-MAPSS
Our experiments on C-MAPSS show that uncertainty weighting underperforms well-tuned fixed weights:
| Method | FD001 RMSE | FD002 RMSE |
|---|---|---|
| Fixed weights (tuned) | 11.8 | 16.2 |
| Uncertainty weighting | 12.4 | 17.1 |
| AMNL (our method) | 10.8 | 13.9 |
Summary
In this section, we examined uncertainty weighting:
- Core idea: Weight tasks by inverse uncertainty
- Derivation: Maximum likelihood with learnable variance
- Formula: $\mathcal{L}_{\text{total}} = \sum_i \frac{1}{2\sigma_i^2}\mathcal{L}_i + \log \sigma_i$
- Advantage: No manual weight tuning
- Limitation: Confuses loss scale with uncertainty
| Aspect | Value |
|---|---|
| Learnable parameters | 2 (log-variances) |
| Tuning required | None (learns from data) |
| Assumption | Gaussian, homoscedastic errors |
| RUL suitability | Limited (scale/uncertainty confusion) |
Looking Ahead: Uncertainty weighting addresses loss scale automatically but has issues with changing loss dynamics. The next section introduces GradNorm—an approach that directly balances gradient magnitudes rather than loss values.
With uncertainty weighting understood, we examine gradient-based balancing methods.