Learning Objectives
By the end of this section, you will:
- Review all RUL loss components developed in this chapter
- Design a combination strategy for multiple objectives
- Balance competing loss terms with appropriate weights
- Implement a unified RUL loss module
- Understand when to use each component
Why This Matters: We have developed multiple loss components—each addressing a specific aspect of RUL prediction. This section shows how to combine them into a unified, modular loss function that captures sample importance, error asymmetry, and magnitude sensitivity.
Loss Component Overview
Let us review the loss components we have developed.
Component Summary
| Component | Purpose | Key Formula |
|---|---|---|
| Base MSE | Regression baseline | (ŷ - y)² |
| Sample weights | Emphasize low-RUL | w = 2 - y/125 |
| Asymmetric coefficients | Penalize late predictions | α_late = 1.3 |
| Gradient scaling | Smooth large errors | Huber/smooth L1 |
When Each Component Helps
| Scenario | Recommended Components |
|---|---|
| Balanced dataset, symmetric costs | Base MSE only |
| Critical samples important | Base MSE + Sample weights |
| Late predictions costly | Base MSE + Asymmetric |
| Large outliers in data | Huber loss |
| Full predictive maintenance | All components |
Combination Strategy
There are multiple valid ways to combine loss components.
Strategy 1: Multiplicative
Sample weight and asymmetric coefficient multiply the base error.
- Pros: Intuitive, single loss term
- Cons: Combined effect can be too strong
Strategy 2: Additive
Separate loss terms with balancing weights.
- Pros: Fine-grained control, easy to ablate
- Cons: More hyperparameters (λ₁, λ₂, λ₃)
Strategy 3: Primary + Regularization
One main loss with additional regularization terms.
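The contrast between Strategies 1 and 2 can be sketched with toy tensors. This is a minimal illustration; all values (errors, weights, the λ balancing coefficients) are hypothetical:

```python
import torch

# Hypothetical per-sample quantities
base = torch.tensor([4.0, 9.0, 1.0])    # base squared errors e_i
w = torch.tensor([1.8, 1.2, 1.0])       # sample weights w_i
alpha = torch.tensor([1.3, 1.0, 1.3])   # asymmetric coefficients α_i

# Strategy 1: multiplicative -- one term, factors compound per sample
loss_mult = (w * alpha * base).sum() / w.sum()

# Strategy 2: additive -- separate terms balanced by lambdas (hypothetical values)
lam1, lam2 = 0.5, 0.3
loss_add = base.mean() + lam1 * (w * base).mean() + lam2 * (alpha * base).mean()

print(loss_mult.item(), loss_add.item())
```

Note how the multiplicative form compounds weight and asymmetry on the same sample, while the additive form keeps each effect in its own tunable term, which is what makes ablation straightforward.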
Unified RUL Loss
We define a unified loss that incorporates all components.
Complete Formulation
The combined loss multiplies the components per sample, then normalizes by the sum of sample weights:

L = Σᵢ wᵢ · αᵢ · e(ŷᵢ, yᵢ) / Σᵢ wᵢ

Where:
- wᵢ: Sample weight (clamped to [w_min, w_max])
- αᵢ: Asymmetric coefficient (1.0 early, 1.3 late)
- e(ŷᵢ, yᵢ): Base error function (MSE or Huber)
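Plugging in the recommended defaults, a single critical, late sample works out as follows (the RUL values are illustrative):

```python
import torch

# One critical, late sample: true RUL 10, predicted 15
y, y_hat = torch.tensor(10.0), torch.tensor(15.0)

w = torch.clamp(2.0 - y / 125.0, min=1.0, max=2.0)   # sample weight -> 1.92
alpha = 1.3 if (y_hat - y) >= 0 else 1.0             # late prediction -> 1.3
e = (y_hat - y) ** 2                                  # base MSE -> 25.0

# With a single sample the weight cancels in the normalization,
# leaving the 1.3x late penalty on top of the plain MSE of 25
loss = (w * alpha * e) / w                            # -> 32.5
print(w.item(), loss.item())
```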
Base Error Options
| Function | Formula | Property |
|---|---|---|
| MSE | (ŷ - y)² | Standard, sensitive to outliers |
| Huber | 0.5(ŷ−y)² if \|ŷ−y\| ≤ δ, else δ(\|ŷ−y\| − 0.5δ) | Robust to outliers |
| Smooth L1 | \|ŷ−y\| − 0.5 for \|ŷ−y\| > 1, else 0.5(ŷ−y)² | Linear for large errors |
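The robustness difference is easiest to see on an outlier. A minimal sketch comparing the three base errors on a small error and a large one (δ = 10, error values illustrative):

```python
import torch
import torch.nn.functional as F

delta = 10.0
err = torch.tensor([2.0, 50.0])   # a typical error vs. an outlier

mse = err ** 2                                        # quadratic everywhere
# Standard Huber: quadratic up to delta, linear beyond
quad = torch.clamp(err.abs(), max=delta)
huber = 0.5 * quad ** 2 + delta * (err.abs() - quad)
# PyTorch's built-in smooth L1 (beta=1): linear for |err| > 1
smooth_l1 = F.smooth_l1_loss(err, torch.zeros_like(err), reduction="none")

print(mse.tolist())        # [4.0, 2500.0]
print(huber.tolist())      # [2.0, 450.0]
print(smooth_l1.tolist())  # [1.5, 49.5]
```

The outlier dominates under MSE (2500) but grows only linearly under Huber and smooth L1, which is why they stabilize training on noisy RUL labels.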
Recommended Configuration
| Parameter | Value | Rationale |
|---|---|---|
| R_max | 125 | Standard RUL cap |
| w_min | 1.0 | Base weight |
| w_max | 2.0 | 2× emphasis for critical |
| α_early | 1.0 | Baseline |
| α_late | 1.3 | 30% late penalty |
| Base error | MSE | Simple, works well |
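Under this configuration, the sample-weight curve can be checked directly. A short sketch evaluating w = clamp(2 − y/125, 1, 2) at a few illustrative RUL values:

```python
import torch

r_max, w_min, w_max = 125.0, 1.0, 2.0
rul = torch.tensor([0.0, 25.0, 62.5, 125.0, 150.0])

# Linear decay from w_max at RUL 0 to w_min at r_max; values above r_max are capped
capped = torch.clamp(rul, min=0.0, max=r_max)
weights = torch.clamp(w_max - (w_max - w_min) * capped / r_max, min=w_min)
print(weights.tolist())  # [2.0, 1.8, 1.5, 1.0, 1.0]
```

Imminent failures (RUL 0) get the full 2× emphasis, healthy engines at or beyond the cap get the baseline weight of 1.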
Implementation
Complete modular implementation of the combined RUL loss.
Unified RUL Loss Class
```python
import torch
import torch.nn as nn
from typing import Dict


class CombinedRULLoss(nn.Module):
    """
    Combined RUL loss with sample weighting and asymmetric penalties.

    Unifies:
    - Linear decay sample weights (emphasize low RUL)
    - Asymmetric error coefficients (penalize late predictions)
    - Optional Huber loss for robustness

    Args:
        r_max: Maximum RUL for weight computation
        w_min: Minimum sample weight
        w_max: Maximum sample weight
        alpha_early: Coefficient for early predictions
        alpha_late: Coefficient for late predictions
        use_huber: Use Huber loss instead of MSE
        huber_delta: Huber loss delta parameter
    """

    def __init__(
        self,
        r_max: float = 125.0,
        w_min: float = 1.0,
        w_max: float = 2.0,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3,
        use_huber: bool = False,
        huber_delta: float = 10.0
    ):
        super().__init__()
        self.r_max = r_max
        self.w_min = w_min
        self.w_max = w_max
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late
        self.use_huber = use_huber
        self.huber_delta = huber_delta

    def compute_sample_weights(self, target: torch.Tensor) -> torch.Tensor:
        """Compute linear decay sample weights: w_max at RUL 0, w_min at r_max."""
        capped = torch.clamp(target, min=0, max=self.r_max)
        weights = self.w_max - (self.w_max - self.w_min) * capped / self.r_max
        return torch.clamp(weights, min=self.w_min)

    def compute_asymmetric_coeffs(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """Compute asymmetric coefficients: alpha_late when pred >= target (late)."""
        errors = pred - target
        coeffs = torch.where(
            errors >= 0,
            torch.full_like(errors, self.alpha_late),
            torch.full_like(errors, self.alpha_early)
        )
        return coeffs

    def compute_base_error(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """Compute base error (MSE or Huber)."""
        if self.use_huber:
            # Huber: quadratic up to delta, linear beyond
            abs_error = torch.abs(pred - target)
            quadratic = torch.clamp(abs_error, max=self.huber_delta)
            linear = abs_error - quadratic
            return 0.5 * quadratic ** 2 + self.huber_delta * linear
        else:
            # Standard MSE
            return (pred - target) ** 2

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute combined RUL loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            Combined loss (scalar)
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Compute components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        base_errors = self.compute_base_error(pred, target)

        # Combined weighted errors
        weighted_errors = sample_weights * asym_coeffs * base_errors

        # Normalize by weight sum so the loss scale is batch-independent
        loss = weighted_errors.sum() / (sample_weights.sum() + 1e-8)

        return loss

    def forward_with_components(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> Dict[str, torch.Tensor]:
        """
        Compute loss with component breakdown for logging.

        Returns dictionary with:
        - total: Combined loss
        - base_mse: Unweighted MSE
        - weighted_mse: Sample-weighted MSE
        - sample_weight_mean: Mean sample weight in the batch
        - asym_coeff_mean: Mean asymmetric coefficient in the batch
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        base_errors = (pred - target) ** 2

        # Individual losses
        weight_sum = sample_weights.sum() + 1e-8
        base_mse = base_errors.mean()
        weighted_mse = (sample_weights * base_errors).sum() / weight_sum
        combined = (sample_weights * asym_coeffs * base_errors).sum() / weight_sum

        return {
            "total": combined,
            "base_mse": base_mse,
            "weighted_mse": weighted_mse,
            "sample_weight_mean": sample_weights.mean(),
            "asym_coeff_mean": asym_coeffs.mean(),
        }
```

Usage Example
```python
# Initialize loss
rul_loss = CombinedRULLoss(
    r_max=125.0,
    w_min=1.0,
    w_max=2.0,
    alpha_early=1.0,
    alpha_late=1.3,
    use_huber=False
)

# Training loop
for batch in dataloader:
    pred = model(batch.x)
    target = batch.rul

    # Simple usage
    loss = rul_loss(pred, target)

    # Or with component logging
    losses = rul_loss.forward_with_components(pred, target)
    logger.log({
        "loss/total": losses["total"].item(),
        "loss/base_mse": losses["base_mse"].item(),
        "loss/weighted_mse": losses["weighted_mse"].item(),
    })
```

Summary
In this section, we unified the RUL loss components:
- Components: Sample weights, asymmetric coefficients, base error
- Strategy: Multiplicative combination with clamping
- Formula: L = Σᵢ wᵢ · αᵢ · e(ŷᵢ, yᵢ) / Σᵢ wᵢ
- Implementation: Modular class with component logging
- Flexibility: Optional Huber loss for robustness
| Component | Default | Effect |
|---|---|---|
| Sample weights | [1.0, 2.0] | 2× emphasis on critical |
| Asymmetry | [1.0, 1.3] | 30% late penalty |
| Base error | MSE | Standard regression |
| Normalization | Weight sum | Balanced contribution |
Looking Ahead: We have a complete RUL loss. The final section presents EMA-based adaptive scaling—the technique AMNL uses to dynamically normalize loss magnitudes for stable multi-task training.