Chapter 9

Fixed Weight Multi-Task Loss

Traditional Multi-Task Loss Functions

Learning Objectives

By the end of this section, you will:

  1. Understand the multi-task learning problem as loss combination
  2. Examine the actual baseline implementation from our research
  3. Identify weight selection challenges in practice
  4. Recognize loss scale mismatch as a fundamental issue
  5. Appreciate why fixed weights are suboptimal
Why This Matters: Before understanding AMNL's innovation, we must understand what came before. Fixed weight multi-task learning is the simplest approach and serves as the baseline against which all advanced methods are compared. Its limitations motivated the development of adaptive weighting schemes.

The Multi-Task Learning Problem

In multi-task learning (MTL), we train a single model on multiple related tasks simultaneously. The central question is: how do we combine task losses into a single optimization objective?

The AMNL Setting

Our model has two tasks with different characteristics:

Task                  | Type           | Loss          | Scale
RUL Prediction        | Regression     | MSE           | ~100-10000
Health Classification | Classification | Cross-Entropy | ~0.1-3

The fundamental challenge: these losses have vastly different scales and semantics. How do we meaningfully combine them?

The Naive Approach

The simplest solution is to just add the losses:

\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{RUL}} + \mathcal{L}_{\text{health}}

This is problematic because MSE loss for RUL (values in hundreds) will completely dominate cross-entropy loss for health (values near 1).
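A quick numeric check makes the dominance concrete. A minimal sketch, using illustrative magnitudes matching the scale table above:

```python
# Illustrative loss magnitudes (matching the scales quoted above)
rul_loss = 2500.0     # MSE on RUL: hundreds to thousands
health_loss = 1.8     # cross-entropy on health: order 1

naive_total = rul_loss + health_loss

# Fraction of the combined objective each task contributes
rul_share = rul_loss / naive_total
health_share = health_loss / naive_total

print(f"RUL share:    {rul_share:.4%}")     # ~99.93%
print(f"health share: {health_share:.4%}")  # ~0.07%
```

With the naive sum, the health task contributes well under 0.1% of the objective, so its gradient signal is effectively drowned out.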


Fixed Weight Formulation

To address scale mismatch, we introduce fixed weights for each task.

Mathematical Formulation

\mathcal{L}_{\text{total}} = \lambda_1 \mathcal{L}_{\text{RUL}} + \lambda_2 \mathcal{L}_{\text{health}}

Where:

  • \lambda_1: Weight for the RUL task
  • \lambda_2: Weight for the health task
  • \mathcal{L}_{\text{RUL}}: RUL regression loss (MSE or weighted MSE)
  • \mathcal{L}_{\text{health}}: Health classification loss (cross-entropy)

Common Weight Schemes

Scheme         | λ₁ (RUL) | λ₂ (Health) | Rationale
Equal          | 1.0      | 1.0         | Treat tasks equally (ignores scale)
Normalized     | 0.5      | 0.5         | Sum to 1 (still ignores scale)
Scale-balanced | 0.001    | 1.0         | Manual scale matching
Tuned          | 0.7      | 0.3         | Grid search optimized (our default)
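The scale-balanced row can be derived rather than guessed: choose λ₁ so that λ₁ · L_RUL lands on the same order of magnitude as L_health. A minimal sketch, assuming typical magnitudes of ~1000 (MSE) and ~1 (cross-entropy):

```python
# Typical loss magnitudes observed early in training (illustrative)
typical_rul_loss = 1000.0    # MSE scale
typical_health_loss = 1.0    # cross-entropy scale

# Pick lambda_1 so the weighted RUL term matches the health term's scale
lambda_1 = typical_health_loss / typical_rul_loss   # 0.001
lambda_2 = 1.0

weighted_rul = lambda_1 * typical_rul_loss
weighted_health = lambda_2 * typical_health_loss
print(lambda_1, weighted_rul, weighted_health)   # 0.001 1.0 1.0
```

This is exactly the 0.001/1.0 row in the table. The catch, as the later subsections show, is that the "typical magnitude" the weight is derived from keeps changing during training.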

Research Implementation

Below is the actual implementation from our research codebase. This is the baseline loss function used in ablation studies to compare against AMNL.

Fixed Weight Loss: Baseline Implementation
losses/multitask_losses.py

Line 1: Class Definition

FixedWeightLoss inherits from nn.Module for PyTorch integration. This is the baseline against which AMNL is compared.

Line 3: Baseline Method

This represents the simplest multi-task approach: fixed weights are set manually and remain constant throughout training.

Line 6: Default Weights

The default values of 0.7/0.3 prioritize RUL prediction over health classification. These are hyperparameters that require tuning.

Example: weight ratios tested include 0.9/0.1, 0.7/0.3, and 0.5/0.5.

Lines 8-9: Weight Storage

The weights are stored as plain instance attributes. Unlike AMNL's weights, they never change during training, which is a key limitation.

Line 11: Forward Method

Receives pre-computed losses from each task. The **kwargs keeps the signature compatible with other loss functions that need additional parameters.

Line 20: Weighted Sum

The core computation is a simple weighted addition. This is the fundamental limitation: the weights are static regardless of the loss magnitudes.

Example: if rul_loss = 1000 and health_loss = 1, then 0.7 × 1000 + 0.3 × 1 = 700.3.

Line 22: Name Method

Returns a descriptive string for logging and experiment tracking, used in the ablation study result tables.
```python
 1  class FixedWeightLoss(nn.Module):
 2      """
 3      Baseline: Traditional fixed-weight multi-task loss
 4      Reference: Standard practice in multi-task learning
 5      """
 6      def __init__(self, rul_weight=0.7, health_weight=0.3):
 7          super().__init__()
 8          self.rul_weight = rul_weight
 9          self.health_weight = health_weight
10
11      def forward(self, rul_loss, health_loss, **kwargs):
12          """
13          Args:
14              rul_loss: RUL regression loss
15              health_loss: Health classification loss
16
17          Returns:
18              Combined loss
19          """
20          return self.rul_weight * rul_loss + self.health_weight * health_loss
21
22      def get_name(self):
23          return f"Fixed({self.rul_weight:.1f}/{self.health_weight:.1f})"
```
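For completeness, a standalone usage sketch. The class is repeated from the listing above so the snippet runs on its own, and the tensor values mirror the worked example:

```python
import torch
import torch.nn as nn

class FixedWeightLoss(nn.Module):
    """Baseline fixed-weight multi-task loss (repeated from the listing above)."""
    def __init__(self, rul_weight=0.7, health_weight=0.3):
        super().__init__()
        self.rul_weight = rul_weight
        self.health_weight = health_weight

    def forward(self, rul_loss, health_loss, **kwargs):
        return self.rul_weight * rul_loss + self.health_weight * health_loss

    def get_name(self):
        return f"Fixed({self.rul_weight:.1f}/{self.health_weight:.1f})"

criterion = FixedWeightLoss(rul_weight=0.7, health_weight=0.3)
rul_loss = torch.tensor(1000.0)   # e.g. batch MSE on RUL
health_loss = torch.tensor(1.0)   # e.g. batch cross-entropy on health

total = criterion(rul_loss, health_loss)
print(f"{criterion.get_name()} -> {total.item():.1f}")   # Fixed(0.7/0.3) -> 700.3
```

In a real training loop, `total` is the tensor passed to `backward()`; the per-task losses are computed by their own criteria beforehand.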

Usage in Ablation Studies

The fixed weight loss is instantiated with different weight combinations in our ablation experiments:

```python
# Ablation study configurations (from run_ablation_studies.py)
ABLATION_CONFIGS = {
    'fixed_70_30': {
        'name': 'Fixed 0.7/0.3',
        'description': 'RUL-prioritized weighting',
        'changes': {'rul_weight': 0.7, 'health_weight': 0.3},
    },
    'fixed_50_50': {
        'name': 'Fixed 0.5/0.5',
        'description': 'Equal weighting without normalization',
        'changes': {'rul_weight': 0.5, 'health_weight': 0.5},
    },
    'fixed_90_10': {
        'name': 'Fixed 0.9/0.1',
        'description': 'Strong RUL preference',
        'changes': {'rul_weight': 0.9, 'health_weight': 0.1},
    },
}
```
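Each entry's `changes` dict maps directly onto the constructor arguments. A hypothetical expansion step (the real driver in run_ablation_studies.py likely differs in detail; FixedWeightLoss is reduced to plain Python here so the snippet is standalone):

```python
# Hypothetical expansion of one ablation config into a loss instance.
class FixedWeightLoss:
    """Minimal stand-in for the nn.Module version shown earlier."""
    def __init__(self, rul_weight=0.7, health_weight=0.3):
        self.rul_weight = rul_weight
        self.health_weight = health_weight

    def get_name(self):
        return f"Fixed({self.rul_weight:.1f}/{self.health_weight:.1f})"

config = {
    'name': 'Fixed 0.9/0.1',
    'changes': {'rul_weight': 0.9, 'health_weight': 0.1},
}

# The 'changes' dict unpacks straight into the constructor
criterion = FixedWeightLoss(**config['changes'])
print(criterion.get_name())   # Fixed(0.9/0.1)
```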

Weight Selection Challenges

Choosing appropriate fixed weights is surprisingly difficult.

The Hyperparameter Search Problem

With two tasks, we have two hyperparameters to tune. Common approaches:

  • Grid search: Try combinations like (0.1, 0.9), (0.5, 0.5), (0.9, 0.1), etc.
  • Random search: Sample weights from distributions
  • Bayesian optimization: Guided search based on validation performance

Problems with Fixed Weights

  1. Expensive search: Each weight combination requires full training
  2. Dataset-specific: Optimal weights differ across FD001-FD004
  3. Training dynamics: Optimal weights change during training
  4. No principled selection: Weights are arbitrary, not derived from data
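The cost of the grid search is easy to underestimate. In this hypothetical sketch, `train_and_validate` is a toy surrogate, not the real training loop; in practice, each call in the loop is a complete, hours-long training run:

```python
# Hypothetical grid search over fixed weight pairs (weights sum to 1).
def train_and_validate(rul_weight, health_weight):
    # Toy surrogate: pretend validation error is minimized at 0.7/0.3.
    # In reality this function trains the full model and returns a
    # validation score, costing hours of GPU time per call.
    return (rul_weight - 0.7) ** 2 + (health_weight - 0.3) ** 2

grid = [(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3), (0.9, 0.1)]

scores = {pair: train_and_validate(*pair) for pair in grid}
best = min(scores, key=scores.get)
print(best)   # (0.7, 0.3) under the toy surrogate
```

Five grid points already means five full trainings per dataset, and nothing guarantees the optimum lies on the grid at all.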

Loss Scale Mismatch

Even with careful weight tuning, fundamental issues remain.

Dynamic Scale Changes

Loss magnitudes change during training:

```text
Training progression (from actual runs):

Epoch 1:   L_RUL = 2500,  L_health = 1.8
Epoch 10:  L_RUL = 800,   L_health = 0.9
Epoch 50:  L_RUL = 200,   L_health = 0.4
Epoch 100: L_RUL = 100,   L_health = 0.2

Relative scale (L_RUL / L_health):
Epoch 1:   1389x
Epoch 10:  889x
Epoch 50:  500x
Epoch 100: 500x
```

A fixed weight that balances losses at epoch 1 will not balance them at epoch 100. The relative contribution of tasks shifts during training.
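Recomputing the scale-balancing weight λ₁ = L_health / L_RUL at each logged epoch makes the drift explicit (loss values copied from the progression above):

```python
# Logged loss values from the training progression above
history = {
    1:   (2500.0, 1.8),
    10:  (800.0,  0.9),
    50:  (200.0,  0.4),
    100: (100.0,  0.2),
}

for epoch, (rul, health) in history.items():
    ratio = rul / health           # relative scale L_RUL / L_health
    lambda_1 = health / rul        # weight that would equalize the two terms
    print(f"epoch {epoch:>3}: ratio {ratio:6.0f}x  balancing lambda_1 = {lambda_1:.5f}")

# The lambda_1 that balances epoch 1 (0.00072) is ~2.8x too small
# by epoch 100, where 0.00200 would be needed.
```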

Gradient Magnitude Analysis

What matters for optimization is not loss magnitude but gradient magnitude:

\frac{\partial \mathcal{L}_{\text{total}}}{\partial \theta} = \lambda_1 \frac{\partial \mathcal{L}_{\text{RUL}}}{\partial \theta} + \lambda_2 \frac{\partial \mathcal{L}_{\text{health}}}{\partial \theta}

If \left\|\frac{\partial \mathcal{L}_{\text{RUL}}}{\partial \theta}\right\| \gg \left\|\frac{\partial \mathcal{L}_{\text{health}}}{\partial \theta}\right\|, the RUL task dominates parameter updates regardless of the loss weights.

Gradient Domination

Even if loss values are balanced via weights, gradient magnitudes may still be imbalanced. This can cause one task to dominate learning, leading to poor performance on the other task.
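This imbalance is directly measurable. A sketch on a toy shared trunk with two heads (standing in for the real model, with assumed shapes and RUL-scale targets), comparing per-task gradient norms on the shared parameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy shared trunk with two task heads, standing in for the real model
shared = nn.Linear(8, 8)
rul_head = nn.Linear(8, 1)
health_head = nn.Linear(8, 3)

x = torch.randn(16, 8)
rul_target = torch.randn(16, 1) * 100       # RUL-scale regression targets
health_target = torch.randint(0, 3, (16,))  # 3-class health labels

features = shared(x)
rul_loss = F.mse_loss(rul_head(features), rul_target)
health_loss = F.cross_entropy(health_head(features), health_target)

# Per-task gradient norms w.r.t. the *shared* parameters only
g_rul = torch.autograd.grad(rul_loss, list(shared.parameters()), retain_graph=True)
g_health = torch.autograd.grad(health_loss, list(shared.parameters()), retain_graph=True)

def grad_norm(grads):
    return torch.sqrt(sum(g.pow(2).sum() for g in grads))

print("||grad_RUL||   :", grad_norm(g_rul).item())
print("||grad_health||:", grad_norm(g_health).item())
# The RUL gradients dwarf the health gradients on the shared trunk
```

Monitoring these norms per task, rather than the loss values, is the diagnostic that reveals domination; adaptive methods such as AMNL act on exactly this signal.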

Task Interference

Fixed weights can cause negative transfer when task gradients conflict:

\cos\left(\nabla_\theta \mathcal{L}_1, \nabla_\theta \mathcal{L}_2\right) < 0

When gradients point in opposing directions, fixed weights cannot resolve the conflict—they just scale the competing gradients without addressing the underlying tension.
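Detecting such a conflict is straightforward: flatten each task's gradient with respect to the shared parameters and take the cosine. A toy sketch with two deliberately opposed objectives (assumed shapes; not the research model):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy shared layer; two deliberately opposed objectives on its output
shared = nn.Linear(4, 4)
x = torch.randn(8, 4)
out = shared(x)

loss_a = out.sum()                            # task 1: push outputs up
loss_b = -out.sum() + 0.1 * out.pow(2).sum()  # task 2: mostly the opposite

def flat_grad(loss):
    grads = torch.autograd.grad(loss, list(shared.parameters()), retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

g_a = flat_grad(loss_a)
g_b = flat_grad(loss_b)

cos = F.cosine_similarity(g_a, g_b, dim=0)
print("cosine(grad_a, grad_b) =", cos.item())  # negative: the tasks conflict
```

Rescaling either loss by a fixed λ leaves this cosine unchanged, which is precisely why fixed weights cannot resolve gradient conflict.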


Summary

In this section, we examined fixed weight multi-task loss:

  1. Basic formulation: \mathcal{L} = \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2
  2. Implementation: ~25 lines of Python, simple but limited
  3. Scale mismatch: MSE (~100s) vs CE (~1) requires manual balancing
  4. Weight selection: Expensive grid search, no principled method
  5. Gradient issues: Loss balancing ≠ gradient balancing
Issue              | Impact             | AMNL Solution
Scale mismatch     | One task dominates | Dynamic normalization
Manual tuning      | 750+ GPU hours     | Zero hyperparameters
Static weights     | Can't adapt        | Per-batch adaptation
Gradient conflicts | Not addressed      | Equal contribution
Looking Ahead: Fixed weights are clearly inadequate. The next section introduces Uncertainty Weighting (Kendall et al.)—a principled approach that learns task weights from homoscedastic uncertainty, eliminating manual tuning.

With fixed weight limitations understood, we examine principled adaptive alternatives.