Chapter 11

Combined RUL Loss

Advanced Loss Components

Learning Objectives

By the end of this section, you will:

  1. Review all RUL loss components developed in this chapter
  2. Design a combination strategy for multiple objectives
  3. Balance competing loss terms with appropriate weights
  4. Implement a unified RUL loss module
  5. Understand when to use each component
Why This Matters: We have developed multiple loss components—each addressing a specific aspect of RUL prediction. This section shows how to combine them into a unified, modular loss function that captures sample importance, error asymmetry, and magnitude sensitivity.

Loss Component Overview

Let us review the loss components we have developed.

Component Summary

| Component | Purpose | Key Formula |
|---|---|---|
| Base MSE | Regression baseline | (ŷ − y)² |
| Sample weights | Emphasize low RUL | w = 2 − y/125 |
| Asymmetric coefficients | Penalize late predictions | α_late = 1.3 |
| Gradient scaling | Smooth large errors | Huber/smooth L1 |

When Each Component Helps

| Scenario | Recommended Components |
|---|---|
| Balanced dataset, symmetric costs | Base MSE only |
| Critical samples important | Base MSE + sample weights |
| Late predictions costly | Base MSE + asymmetric coefficients |
| Large outliers in data | Huber loss |
| Full predictive maintenance | All components |

Combination Strategy

There are multiple valid ways to combine loss components.

Strategy 1: Multiplicative

$$\mathcal{L} = \frac{1}{N} \sum_{i} w_i \cdot \alpha_i \cdot (y_i - \hat{y}_i)^2$$

Sample weight and asymmetric coefficient multiply the base error.

  • Pros: Intuitive, single loss term
  • Cons: Combined effect can be too strong
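To see how strong the stacked multipliers can get, consider the worst case under this chapter's defaults (a minimal numeric sketch):

```python
# Worst-case multiplicative stacking: a critical sample (w = w_max = 2.0)
# that is predicted late (α = α_late = 1.3) has its squared error
# amplified by a factor of 2.6.
w_max, alpha_late = 2.0, 1.3
base_error = 100.0                       # (ŷ - y)² for a 10-cycle error
combined = w_max * alpha_late * base_error
print(combined)                          # 260.0 — a 2.6× amplification
```

If that amplification is too aggressive for a given dataset, the additive strategy below offers finer control.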

Strategy 2: Additive

$$\mathcal{L} = \lambda_1 \mathcal{L}_{\text{MSE}} + \lambda_2 \mathcal{L}_{\text{weighted}} + \lambda_3 \mathcal{L}_{\text{asym}}$$

Separate loss terms with balancing weights.

  • Pros: Fine-grained control, easy to ablate
  • Cons: More hyperparameters (λ₁, λ₂, λ₃)
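The additive strategy can be sketched as follows. The function name and λ values are illustrative, not tuned; the weight and asymmetry schemes follow this chapter's conventions (w = 2 − y/125 clamped to [1, 2], α_late = 1.3):

```python
import torch

def additive_rul_loss(pred, target, lambdas=(1.0, 0.5, 0.5)):
    """Sketch of the additive strategy: λ1·MSE + λ2·weighted MSE + λ3·asym MSE."""
    sq_err = (pred - target) ** 2
    mse = sq_err.mean()
    # Sample-weighted term: w = 2 - y/125, clamped to [1, 2]
    w = torch.clamp(2.0 - target / 125.0, min=1.0, max=2.0)
    weighted = (w * sq_err).sum() / w.sum()
    # Asymmetric term: 1.3 for late predictions, 1.0 for early
    alpha = torch.where(pred >= target,
                        torch.full_like(pred, 1.3),
                        torch.full_like(pred, 1.0))
    asym = (alpha * sq_err).mean()
    l1, l2, l3 = lambdas
    return l1 * mse + l2 * weighted + l3 * asym
```

Setting any λ to zero ablates that term, which is the main practical advantage over the multiplicative form.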

Strategy 3: Primary + Regularization

$$\mathcal{L} = \mathcal{L}_{\text{primary}} + \lambda \mathcal{L}_{\text{regularizer}}$$

One main loss with additional regularization terms.
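A sketch of this strategy, using Huber as the primary loss and a small low-RUL-weighted MSE penalty as the regularizer (the function name, λ, and δ values are illustrative):

```python
import torch
import torch.nn.functional as F

def primary_plus_regularizer(pred, target, lam=0.1, delta=10.0):
    """Primary Huber loss plus a λ-scaled low-RUL-weighted MSE regularizer."""
    primary = F.huber_loss(pred, target, delta=delta)
    # Regularizer: the chapter's linear-decay sample weights applied to MSE
    w = torch.clamp(2.0 - target / 125.0, min=1.0, max=2.0)
    regularizer = (w * (pred - target) ** 2).mean()
    return primary + lam * regularizer
```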


Unified RUL Loss

We define a unified loss that incorporates all components.

Complete Formulation

$$\mathcal{L}_{\text{RUL}} = \frac{1}{\sum_i w_i} \sum_{i=1}^{N} w_i \cdot \alpha_i \cdot \ell(y_i, \hat{y}_i)$$

Where:

  • $w_i = \max(w_{\min},\ 2 - y_i/R_{\max})$: Sample weight (clamped)
  • $\alpha_i$: Asymmetric coefficient (1.0 early, 1.3 late)
  • $\ell(y_i, \hat{y}_i)$: Base error function (MSE or Huber)
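A worked single-sample example of the formula, using the default parameter values (the sample values are chosen for illustration):

```python
# y = 25 (fairly critical), ŷ = 30 (a late prediction)
y, y_hat = 25.0, 30.0
w = max(1.0, 2.0 - y / 125.0)        # sample weight: 1.8
alpha = 1.3 if y_hat >= y else 1.0   # late prediction → α_late = 1.3
err = (y_hat - y) ** 2               # squared error: 25.0
loss = (w * alpha * err) / w         # normalized by Σw (one sample, so just w)
print(loss)                          # ≈ 32.5, vs. a plain MSE of 25.0
```

The weight normalization cancels for a single sample; across a batch it keeps the loss scale stable regardless of how many critical samples the batch contains.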

Base Error Options

| Function | Formula (d = ŷ − y) | Property |
|---|---|---|
| MSE | d² | Standard, sensitive to outliers |
| Huber | ½d² if \|d\| ≤ δ, else δ(\|d\| − ½δ) | Robust to outliers |
| Smooth L1 | ½d² if \|d\| ≤ 1, else \|d\| − ½ | Linear for large errors |

Recommended Defaults

| Parameter | Value | Rationale |
|---|---|---|
| R_max | 125 | Standard RUL cap |
| w_min | 1.0 | Base weight |
| w_max | 2.0 | 2× emphasis for critical samples |
| α_early | 1.0 | Baseline |
| α_late | 1.3 | 30% late penalty |
| Base error | MSE | Simple, works well |

Implementation

Below is a complete, modular implementation of the combined RUL loss.

Unified RUL Loss Class

```python
import torch
import torch.nn as nn
from typing import Dict


class CombinedRULLoss(nn.Module):
    """
    Combined RUL loss with sample weighting and asymmetric penalties.

    Unifies:
    - Linear decay sample weights (emphasize low RUL)
    - Asymmetric error coefficients (penalize late predictions)
    - Optional Huber loss for robustness

    Args:
        r_max: Maximum RUL for weight computation
        w_min: Minimum sample weight
        w_max: Maximum sample weight
        alpha_early: Coefficient for early predictions
        alpha_late: Coefficient for late predictions
        use_huber: Use Huber loss instead of MSE
        huber_delta: Huber loss delta parameter
    """

    def __init__(
        self,
        r_max: float = 125.0,
        w_min: float = 1.0,
        w_max: float = 2.0,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3,
        use_huber: bool = False,
        huber_delta: float = 10.0
    ):
        super().__init__()
        self.r_max = r_max
        self.w_min = w_min
        self.w_max = w_max
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late
        self.use_huber = use_huber
        self.huber_delta = huber_delta

    def compute_sample_weights(self, target: torch.Tensor) -> torch.Tensor:
        """Compute linear decay sample weights."""
        capped = torch.clamp(target, min=0, max=self.r_max)
        weights = self.w_max - (self.w_max - self.w_min) * capped / self.r_max
        return torch.clamp(weights, min=self.w_min)

    def compute_asymmetric_coeffs(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """Compute asymmetric coefficients based on error direction."""
        errors = pred - target
        coeffs = torch.where(
            errors >= 0,
            torch.full_like(errors, self.alpha_late),
            torch.full_like(errors, self.alpha_early)
        )
        return coeffs

    def compute_base_error(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """Compute base error (MSE or Huber)."""
        if self.use_huber:
            # Huber loss: quadratic within delta, linear beyond
            abs_error = torch.abs(pred - target)
            quadratic = torch.clamp(abs_error, max=self.huber_delta)
            linear = abs_error - quadratic
            return 0.5 * quadratic ** 2 + self.huber_delta * linear
        else:
            # Standard MSE
            return (pred - target) ** 2

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute combined RUL loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            Combined loss (scalar)
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Compute components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        base_errors = self.compute_base_error(pred, target)

        # Combine weighted errors
        weighted_errors = sample_weights * asym_coeffs * base_errors

        # Normalize by weight sum
        loss = weighted_errors.sum() / (sample_weights.sum() + 1e-8)

        return loss

    def forward_with_components(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> Dict[str, torch.Tensor]:
        """
        Compute loss with component breakdown for logging.

        Returns dictionary with:
        - total: Combined loss
        - base_mse: Unweighted MSE
        - weighted_mse: Sample-weighted MSE
        - sample_weight_mean / asym_coeff_mean: Component diagnostics
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        base_errors = (pred - target) ** 2

        # Individual losses
        base_mse = base_errors.mean()
        weighted_mse = (sample_weights * base_errors).sum() / sample_weights.sum()
        combined = (sample_weights * asym_coeffs * base_errors).sum() / sample_weights.sum()

        return {
            "total": combined,
            "base_mse": base_mse,
            "weighted_mse": weighted_mse,
            "sample_weight_mean": sample_weights.mean(),
            "asym_coeff_mean": asym_coeffs.mean(),
        }
```

Usage Example

```python
# Initialize loss
rul_loss = CombinedRULLoss(
    r_max=125.0,
    w_min=1.0,
    w_max=2.0,
    alpha_early=1.0,
    alpha_late=1.3,
    use_huber=False
)

# Training loop
for batch in dataloader:
    pred = model(batch.x)
    target = batch.rul

    # Simple usage
    loss = rul_loss(pred, target)

    # Or with component logging
    losses = rul_loss.forward_with_components(pred, target)
    logger.log({
        "loss/total": losses["total"].item(),
        "loss/base_mse": losses["base_mse"].item(),
        "loss/weighted_mse": losses["weighted_mse"].item(),
    })
```

Summary

In this section, we unified the RUL loss components:

  1. Components: Sample weights, asymmetric coefficients, base error
  2. Strategy: Multiplicative combination with clamping
  3. Formula: $\mathcal{L} = \sum_i w_i \cdot \alpha_i \cdot \ell_i \,/\, \sum_i w_i$
  4. Implementation: Modular class with component logging
  5. Flexibility: Optional Huber loss for robustness

| Component | Default | Effect |
|---|---|---|
| Sample weights | [1.0, 2.0] | 2× emphasis on critical samples |
| Asymmetry | [1.0, 1.3] | 30% late penalty |
| Base error | MSE | Standard regression |
| Normalization | Weight sum | Balanced contribution |

Looking Ahead: We have a complete RUL loss. The final section presents EMA-based adaptive scaling—the technique AMNL uses to dynamically normalize loss magnitudes for stable multi-task training.

With the unified RUL loss complete, we examine EMA-based adaptive scaling.