AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Review all RUL loss components developed in this chapter
Design a combination strategy for multiple objectives
Balance competing loss terms with appropriate weights
Implement a unified RUL loss module
Understand when to use each component

Why This Matters: We have developed multiple loss components—each addressing a specific aspect of RUL prediction. This section shows how to combine them into a unified, modular loss function that captures sample importance, error asymmetry, and magnitude sensitivity.

Loss Component Overview

Let us review the loss components we have developed.

Component Summary

Component	Purpose	Key Formula
Base MSE	Regression baseline	(ŷ - y)²
Sample weights	Emphasize low-RUL	w = 2 - y/125
Asymmetric coefficients	Penalize late predictions	α_late = 1.3
Gradient scaling	Smooth large errors	Huber/smooth L1

When Each Component Helps

Scenario	Recommended Components
Balanced dataset, symmetric costs	Base MSE only
Critical samples important	Base MSE + Sample weights
Late predictions costly	Base MSE + Asymmetric
Large outliers in data	Huber loss
Full predictive maintenance	All components

Combination Strategy

There are multiple valid ways to combine loss components.

Strategy 1: Multiplicative

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} w_i \cdot \alpha_i \cdot (y_i - \hat{y}_i)^2$$

Sample weight and asymmetric coefficient multiply the base error.

Pros: Intuitive, single loss term
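Cons: Combined effect can be too strong (with the defaults, a late prediction on a near-failure sample is scaled by up to 2.0 × 1.3 = 2.6 relative to a plain squared error)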
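To make the multiplicative strategy concrete, here is a minimal sketch in plain Python (no PyTorch, so the arithmetic is easy to check by hand). The function name `multiplicative_loss` and the two-sample batch are illustrative assumptions; the weight rule w = 2 - y/125 and the coefficients 1.0 and 1.3 come from the component summary above.

```python
from typing import Sequence


def multiplicative_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    r_max: float = 125.0,
    alpha_early: float = 1.0,
    alpha_late: float = 1.3,
) -> float:
    """Strategy 1: batch mean of w_i * alpha_i * (y_i - y_hat_i)^2."""
    total = 0.0
    for pred, target in zip(preds, targets):
        capped = min(max(target, 0.0), r_max)
        weight = 2.0 - capped / r_max  # sample weight w = 2 - y / R_max
        # "Late" means the predicted RUL is at or above the true RUL.
        alpha = alpha_late if pred >= target else alpha_early
        total += weight * alpha * (target - pred) ** 2
    return total / len(preds)


# Hypothetical batch: one late prediction at low RUL, one early at high RUL.
targets = [25.0, 125.0]
preds = [35.0, 115.0]

loss = multiplicative_loss(preds, targets)
plain_mse = sum((t - p) ** 2 for p, t in zip(preds, targets)) / len(preds)
print(f"multiplicative: {loss:.2f}, plain MSE: {plain_mse:.2f}")
```

Both samples have the same squared error (100), yet the late, low-RUL sample contributes 1.8 × 1.3 × 100 = 234 while the other contributes 100, which is the compounding effect listed under Cons.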

Strategy 2: Additive

$$\mathcal{L} = \lambda_1 \mathcal{L}_{\text{MSE}} + \lambda_2 \mathcal{L}_{\text{weighted}} + \lambda_3 \mathcal{L}_{\text{asym}}$$

Separate loss terms with balancing weights.

Pros: Fine-grained control, easy to ablate
Cons: More hyperparameters (λ₁, λ₂, λ₃)
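The additive strategy can be sketched the same way. This is an illustration under assumptions, not a prescribed implementation: the helper names, the λ values (1.0, 0.5, 0.5), and the sample batch are all hypothetical. It shows why ablation is easy: setting a λ to zero removes that term.

```python
from typing import Sequence, Tuple


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Plain mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def weighted_mse(preds, targets, r_max: float = 125.0) -> float:
    """Squared error weighted by w = 2 - y / R_max, normalized by sum(w)."""
    weights = [2.0 - min(max(t, 0.0), r_max) / r_max for t in targets]
    errors = [(p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


def asymmetric_mse(preds, targets, alpha_early=1.0, alpha_late=1.3) -> float:
    """Squared error scaled by alpha_late when the prediction is late."""
    terms = [
        (alpha_late if p >= t else alpha_early) * (p - t) ** 2
        for p, t in zip(preds, targets)
    ]
    return sum(terms) / len(terms)


def additive_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    lambdas: Tuple[float, float, float] = (1.0, 0.5, 0.5),  # hypothetical values
) -> float:
    """Strategy 2: lambda_1 * MSE + lambda_2 * weighted + lambda_3 * asymmetric."""
    l1, l2, l3 = lambdas
    return (
        l1 * mse(preds, targets)
        + l2 * weighted_mse(preds, targets)
        + l3 * asymmetric_mse(preds, targets)
    )


targets = [25.0, 125.0]
preds = [35.0, 120.0]
full = additive_loss(preds, targets)
mse_only = additive_loss(preds, targets, lambdas=(1.0, 0.0, 0.0))  # ablation
print(f"full: {full:.3f}, MSE term only: {mse_only:.3f}")
```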

Strategy 3: Primary + Regularization

$$\mathcal{L} = \mathcal{L}_{\text{primary}} + \lambda \mathcal{L}_{\text{regularizer}}$$

One main loss with additional regularization terms.
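As a rough sketch of this pattern, the snippet below uses plain MSE as the primary loss and an L2 penalty on a list of model parameters as the regularizer. The text does not specify which regularizer is meant, so the L2 penalty, the value λ = 0.01, and all names here are assumptions chosen only for illustration.

```python
from typing import Sequence


def primary_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Primary objective: plain mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def l2_penalty(params: Sequence[float]) -> float:
    """Assumed regularizer: sum of squared parameter values."""
    return sum(p * p for p in params)


def regularized_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    params: Sequence[float],
    lam: float = 0.01,  # hypothetical regularization strength
) -> float:
    """Strategy 3: L = L_primary + lambda * L_regularizer."""
    return primary_loss(preds, targets) + lam * l2_penalty(params)


# Toy values: MSE = 0.5, L2 penalty = 25, so the total is 0.5 + 0.01 * 25.
value = regularized_loss([1.0, 2.0], [0.0, 2.0], params=[3.0, 4.0])
print(f"regularized loss: {value:.2f}")
```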

Unified RUL Loss

We define a unified loss that incorporates all components.

Complete Formulation

$$\mathcal{L}_{\text{RUL}} = \frac{1}{\sum_i w_i} \sum_{i=1}^{N} w_i \cdot \alpha_i \cdot \ell(y_i, \hat{y}_i)$$

Where:

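$w_i = \max\left(w_{\min},\; w_{\max} - (w_{\max} - w_{\min})\, y_i / R_{\max}\right)$ : Sample weight, clamped below at $w_{\min}$; with the defaults $w_{\max} = 2$ and $w_{\min} = 1$ this is $\max(1, 2 - y_i/R_{\max})$
$\alpha_i$ : Asymmetric coefficient ($\alpha_{\text{late}} = 1.3$ when $\hat{y}_i \ge y_i$, that is, the predicted RUL overshoots the true RUL; $\alpha_{\text{early}} = 1.0$ otherwise)
$\ell(y, \hat{y})$ : Base error function (squared error or Huber)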
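A small worked example may help here. The sketch below evaluates the unified formula with squared error as the base error, using plain Python. The function name `unified_rul_loss` and the two-sample batch are assumptions made for illustration; the defaults mirror the values quoted in this section.

```python
from typing import Sequence


def unified_rul_loss(
    preds: Sequence[float],
    targets: Sequence[float],
    r_max: float = 125.0,
    w_min: float = 1.0,
    w_max: float = 2.0,
    alpha_early: float = 1.0,
    alpha_late: float = 1.3,
) -> float:
    """sum_i(w_i * alpha_i * (y_hat_i - y_i)^2) / sum_i(w_i)."""
    weighted_sum = 0.0
    weight_sum = 0.0
    for pred, target in zip(preds, targets):
        capped = min(max(target, 0.0), r_max)
        # Linear decay from w_max (RUL = 0) to w_min (RUL = r_max), clamped.
        weight = max(w_min, w_max - (w_max - w_min) * capped / r_max)
        alpha = alpha_late if pred >= target else alpha_early
        weighted_sum += weight * alpha * (pred - target) ** 2
        weight_sum += weight
    return weighted_sum / weight_sum


# Hypothetical batch: y = 10 predicted late (20), y = 100 predicted early (90).
# Weights are 1.92 and 1.2; coefficients are 1.3 and 1.0; both errors are 100.
loss = unified_rul_loss(preds=[20.0, 90.0], targets=[10.0, 100.0])
print(f"unified loss: {loss:.2f}")  # (249.6 + 120) / 3.12
```

Because the sum is divided by the total weight rather than by N, the result stays on the scale of the base error: here it lands between 100 (the early sample's term) and 130 (the late sample's term).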

Base Error Options

Function	Formula	Property
MSE	(ŷ - y)²	Standard, sensitive to outliers
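Huber	0.5·d² if |d| ≤ δ, else δ·(|d| - 0.5·δ), with d = ŷ - y	Quadratic near zero, linear beyond δ; robust to outliers
Smooth L1	0.5·d² if |d| < 1, else |d| - 0.5, with d = ŷ - y	Huber with δ = 1; linear for large errors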
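The difference between these options is easiest to see numerically. The following sketch implements the squared error, the piecewise Huber loss (the form the `compute_base_error` method in this section uses), and smooth L1 for a single error value. The function names and the chosen error values are illustrative assumptions; δ = 10 matches the class default.

```python
def squared_error(pred: float, target: float) -> float:
    """Per-sample squared error (no 0.5 factor)."""
    return (pred - target) ** 2


def huber(pred: float, target: float, delta: float = 10.0) -> float:
    """Piecewise Huber: quadratic for |d| <= delta, linear beyond it."""
    d = abs(pred - target)
    if d <= delta:
        return 0.5 * d * d
    return delta * (d - 0.5 * delta)


def smooth_l1(pred: float, target: float) -> float:
    """Smooth L1: the Huber loss with delta fixed to 1."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5


# Compare growth as the error increases from small to outlier-sized.
for err in (0.5, 5.0, 20.0, 50.0):
    print(
        f"error={err:5.1f}  squared={squared_error(err, 0.0):7.1f}  "
        f"huber={huber(err, 0.0):6.1f}  smooth_l1={smooth_l1(err, 0.0):5.2f}"
    )
```

Note that the Huber and smooth L1 forms carry a 0.5 factor in their quadratic region, so for small errors they are half the squared error; switching base errors therefore changes the loss scale as well as the outlier behavior.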

Recommended Configuration

Parameter	Value	Rationale
R_max	125	Standard RUL cap
w_min	1.0	Base weight
w_max	2.0	2× emphasis for critical
α_early	1.0	Baseline
α_late	1.3	30% late penalty
Base error	MSE	Simple, works well
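One way to keep these settings together is a small configuration object. The sketch below is an assumption about how one might organize them, not part of the loss itself: `RULLossConfig` is a hypothetical name, the field names mirror the constructor arguments of the loss class in this section, and `huber_delta = 10.0` is that class's default.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RULLossConfig:
    """Hypothetical container for the recommended loss settings."""

    r_max: float = 125.0       # standard RUL cap
    w_min: float = 1.0         # base weight
    w_max: float = 2.0         # 2x emphasis for critical samples
    alpha_early: float = 1.0   # baseline coefficient
    alpha_late: float = 1.3    # 30% late penalty
    use_huber: bool = False    # MSE as the base error
    huber_delta: float = 10.0  # only used when use_huber is True

    def __post_init__(self) -> None:
        # Basic sanity checks so a bad configuration fails early.
        if self.r_max <= 0:
            raise ValueError("r_max must be positive")
        if self.w_min > self.w_max:
            raise ValueError("w_min must not exceed w_max")


config = RULLossConfig()
kwargs = asdict(config)  # keyword arguments matching the loss constructor
print(kwargs)
```

Since the field names match the constructor arguments, the dictionary can be unpacked directly, for example `CombinedRULLoss(**kwargs)`.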

Implementation

Complete modular implementation of the combined RUL loss.

Unified RUL Loss Class

```python
from typing import Dict

import torch
import torch.nn as nn


class CombinedRULLoss(nn.Module):
    """
    Combined RUL loss with sample weighting and asymmetric penalties.

    Unifies:
    - Linear decay sample weights (emphasize low RUL)
    - Asymmetric error coefficients (penalize late predictions)
    - Optional Huber loss for robustness

    Args:
        r_max: Maximum RUL for weight computation
        w_min: Minimum sample weight
        w_max: Maximum sample weight
        alpha_early: Coefficient for early predictions
        alpha_late: Coefficient for late predictions
        use_huber: Use Huber loss instead of MSE
        huber_delta: Huber loss delta parameter
    """

    def __init__(
        self,
        r_max: float = 125.0,
        w_min: float = 1.0,
        w_max: float = 2.0,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3,
        use_huber: bool = False,
        huber_delta: float = 10.0,
    ):
        super().__init__()
        self.r_max = r_max
        self.w_min = w_min
        self.w_max = w_max
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late
        self.use_huber = use_huber
        self.huber_delta = huber_delta

    def compute_sample_weights(self, target: torch.Tensor) -> torch.Tensor:
        """Compute linear decay sample weights (w_max at RUL 0, w_min at r_max)."""
        capped = torch.clamp(target, min=0, max=self.r_max)
        weights = self.w_max - (self.w_max - self.w_min) * capped / self.r_max
        return torch.clamp(weights, min=self.w_min)

    def compute_asymmetric_coeffs(
        self,
        pred: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """Compute asymmetric coefficients based on error direction."""
        errors = pred - target
        # errors >= 0 means the predicted RUL is too high (a late prediction).
        coeffs = torch.where(
            errors >= 0,
            torch.full_like(errors, self.alpha_late),
            torch.full_like(errors, self.alpha_early),
        )
        return coeffs

    def compute_base_error(
        self,
        pred: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """Compute per-sample base error (squared error or Huber)."""
        if self.use_huber:
            # Huber loss: 0.5 * d^2 up to delta, then linear with slope delta
            abs_error = torch.abs(pred - target)
            quadratic = torch.clamp(abs_error, max=self.huber_delta)
            linear = abs_error - quadratic
            return 0.5 * quadratic ** 2 + self.huber_delta * linear
        else:
            # Standard squared error
            return (pred - target) ** 2

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor,
    ) -> torch.Tensor:
        """
        Compute combined RUL loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            Combined loss (scalar)
        """
        # reshape also handles non-contiguous inputs, unlike view
        pred = pred.reshape(-1)
        target = target.reshape(-1)

        # Compute components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        base_errors = self.compute_base_error(pred, target)

        # Combined weighted errors
        weighted_errors = sample_weights * asym_coeffs * base_errors

        # Normalize by weight sum
        loss = weighted_errors.sum() / (sample_weights.sum() + 1e-8)

        return loss

    def forward_with_components(
        self,
        pred: torch.Tensor,
        target: torch.Tensor,
    ) -> Dict[str, torch.Tensor]:
        """
        Compute loss with component breakdown for logging.

        Returns dictionary with:
        - total: Combined loss (same value as forward())
        - base_mse: Unweighted MSE
        - weighted_mse: Sample-weighted MSE
        - sample_weight_mean: Mean sample weight in the batch
        - asym_coeff_mean: Mean asymmetric coefficient in the batch
        """
        pred = pred.reshape(-1)
        target = target.reshape(-1)

        # Components
        sample_weights = self.compute_sample_weights(target)
        asym_coeffs = self.compute_asymmetric_coeffs(pred, target)
        squared_errors = (pred - target) ** 2
        # Respect use_huber so "total" matches forward()
        base_errors = self.compute_base_error(pred, target)

        # Individual losses
        weight_sum = sample_weights.sum() + 1e-8
        base_mse = squared_errors.mean()
        weighted_mse = (sample_weights * squared_errors).sum() / weight_sum
        combined = (sample_weights * asym_coeffs * base_errors).sum() / weight_sum

        return {
            "total": combined,
            "base_mse": base_mse,
            "weighted_mse": weighted_mse,
            "sample_weight_mean": sample_weights.mean(),
            "asym_coeff_mean": asym_coeffs.mean(),
        }
```

Usage Example

```python
# Assumes `model`, `dataloader`, and `logger` are defined elsewhere,
# and that each batch exposes inputs as `batch.x` and targets as `batch.rul`.

# Initialize loss
rul_loss = CombinedRULLoss(
    r_max=125.0,
    w_min=1.0,
    w_max=2.0,
    alpha_early=1.0,
    alpha_late=1.3,
    use_huber=False,
)

# Training loop
for batch in dataloader:
    pred = model(batch.x)
    target = batch.rul

    # Simple usage
    loss = rul_loss(pred, target)

    # Or with component logging (losses["total"] is the same quantity as loss)
    losses = rul_loss.forward_with_components(pred, target)
    logger.log({
        "loss/total": losses["total"].item(),
        "loss/base_mse": losses["base_mse"].item(),
        "loss/weighted_mse": losses["weighted_mse"].item(),
    })

    # The backward pass and optimizer step on the loss would follow here.
```

Summary

In this section, we unified the RUL loss components:

Components: Sample weights, asymmetric coefficients, base error
Strategy: Multiplicative combination with clamping
Formula: $\mathcal{L} = \sum w_i \cdot \alpha_i \cdot \ell_i / \sum w_i$
Implementation: Modular class with component logging
Flexibility: Optional Huber loss for robustness

Component	Default	Effect
Sample weights	[1.0, 2.0]	2× emphasis on critical
Asymmetry	[1.0, 1.3]	30% late penalty
Base error	MSE	Standard regression
Normalization	Weight sum	Balanced contribution

Looking Ahead: We have a complete RUL loss. The final section presents EMA-based adaptive scaling—the technique AMNL uses to dynamically normalize loss magnitudes for stable multi-task training.
