Chapter 11

Asymmetric RUL Loss

Advanced Loss Components

Learning Objectives

By the end of this section, you will:

  1. Understand why RUL errors are asymmetric in their consequences
  2. Analyze the NASA scoring function and its asymmetric penalties
  3. Design asymmetric loss functions for training
  4. Balance asymmetry with training stability
  5. Implement differentiable asymmetric losses in PyTorch
Why This Matters: In real maintenance operations, predicting failure too late is catastrophic (unplanned downtime, safety risks), while predicting too early is merely costly (premature replacement). Asymmetric losses encode this operational reality into the learning objective.

Asymmetric Nature of RUL Errors

RUL prediction errors have fundamentally different consequences depending on their direction.

Error Direction Analysis

Define the prediction error as:

d = \hat{y} - y = \text{predicted RUL} - \text{true RUL}
| Error Sign | Meaning | Consequence | Severity |
|---|---|---|---|
| d < 0 | Predicted RUL < True RUL | Early prediction (premature action) | Cost inefficiency |
| d = 0 | Perfect prediction | Optimal maintenance timing | Ideal |
| d > 0 | Predicted RUL > True RUL | Late prediction (delayed action) | Safety risk, failure |

Operational Consequences

A late prediction means the asset fails before maintenance is scheduled, causing unplanned downtime and safety exposure. An early prediction means a still-healthy component is replaced prematurely, wasting remaining useful life but avoiding failure. This asymmetry in consequences motivates an asymmetric training objective.

NASA Scoring Function

NASA introduced an asymmetric scoring function for C-MAPSS evaluation.

Scoring Function Definition

S = \sum_{i=1}^{N} s_i, \quad \text{where } s_i = \begin{cases} \exp(-d_i/a_1) - 1 & \text{if } d_i < 0 \text{ (early)} \\ \exp(d_i/a_2) - 1 & \text{if } d_i \geq 0 \text{ (late)} \end{cases}

Where:

  • d_i = \hat{y}_i - y_i: Prediction error for sample i
  • a_1 = 13: Early prediction decay constant
  • a_2 = 10: Late prediction decay constant
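
For offline evaluation, the score is summed over all test samples (lower is better; a perfect predictor scores 0). A minimal NumPy sketch of this formula (the function name `nasa_score` is ours, not a library API):

```python
import numpy as np

def nasa_score(pred, target, a1: float = 13.0, a2: float = 10.0) -> float:
    """NASA C-MAPSS score: lower is better, 0 is a perfect prediction."""
    d = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    # Early predictions (d < 0) decay with a1; late (d >= 0) with a2
    scores = np.where(d < 0, np.exp(-d / a1) - 1.0, np.exp(d / a2) - 1.0)
    return float(scores.sum())
```

Note that a 10-cycle late error contributes more to the sum than a 10-cycle early error, which is exactly the asymmetry the metric is designed to enforce.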

Asymmetry Ratio

Because the late branch uses the smaller decay constant (a_2 = 10 versus a_1 = 13, a ratio of 1.3), its exponential grows faster, so late predictions are penalized more severely at every error magnitude:

| Error \|d\| | Early Score (d < 0) | Late Score (d ≥ 0) | Ratio |
|---|---|---|---|
| 5 | 0.47 | 0.65 | 1.4× |
| 10 | 1.16 | 1.72 | 1.5× |
| 15 | 2.17 | 3.48 | 1.6× |
| 20 | 3.66 | 6.39 | 1.7× |
| 30 | 9.05 | 19.09 | 2.1× |

Exponential Asymmetry

The exponential form means the asymmetry grows with error magnitude: a 30-cycle late prediction scores roughly 2.1× worse than a 30-cycle early prediction, and the gap keeps widening as errors grow.
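
The per-error scores and their ratio can be checked numerically straight from the scoring formula (a quick standalone sketch):

```python
import math

def early_score(d_abs: float, a1: float = 13.0) -> float:
    """Score for an early prediction of magnitude |d| (d < 0 branch)."""
    return math.exp(d_abs / a1) - 1.0

def late_score(d_abs: float, a2: float = 10.0) -> float:
    """Score for a late prediction of magnitude |d| (d >= 0 branch)."""
    return math.exp(d_abs / a2) - 1.0

for d in (5, 10, 15, 20, 30):
    print(d, round(early_score(d), 2), round(late_score(d), 2),
          round(late_score(d) / early_score(d), 1))
```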

Score Visualization

📝text
NASA Score vs. Prediction Error:

Score
  20 ─┤                          ╱
     │                        ╱
  15 ─┤                      ╱
     │                    ╱
  10 ─┤                  ╱
     │                ╱
   5 ─┤        ______╱
     │   ____─
   0 ─┼──────●──────────────────────
     │      │
  -5 ─┴──┬──┼──┬──┬──┬──┬──┬──┬──┬
       -30 -20 -10  0  10 20 30 40
            Prediction Error (d = ŷ - y)

Key:
  d < 0: Early (gradual penalty)
  d > 0: Late (steep penalty)
Asymmetric Loss Formulation

We design a differentiable asymmetric loss for training.

Smooth Asymmetric MSE

A simple approach uses different coefficients for positive and negative errors:

\mathcal{L}_{\text{asym}} = \frac{1}{N}\sum_{i=1}^{N} \alpha_i \cdot (y_i - \hat{y}_i)^2

Where:

\alpha_i = \begin{cases} \alpha_{\text{early}} & \text{if } \hat{y}_i < y_i \\ \alpha_{\text{late}} & \text{if } \hat{y}_i \geq y_i \end{cases}

Differentiable NASA-Style Loss

For direct optimization toward the NASA score:

\mathcal{L}_{\text{NASA}} = \frac{1}{N}\sum_{i=1}^{N} \begin{cases} \exp(-d_i/13) - 1 & \text{if } d_i < 0 \\ \exp(d_i/10) - 1 & \text{if } d_i \geq 0 \end{cases}

Training Instability

The exponential form can cause gradient explosion for large errors. In practice, we clip errors or use a hybrid approach that switches to linear penalty beyond a threshold.

Hybrid Asymmetric Loss

Combine MSE base with asymmetric adjustment:

\mathcal{L}_{\text{hybrid}} = \underbrace{\frac{1}{N}\sum_i (y_i - \hat{y}_i)^2}_{\text{base MSE}} + \lambda_{\text{asym}} \cdot \underbrace{\frac{1}{N}\sum_{i: d_i > 0} d_i^2}_{\text{late penalty}}

This adds extra penalty only for late predictions while keeping the base MSE for all samples.
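
Since the implementation section focuses on the asymmetric MSE, NASA-style, and combined weighted losses, here is one possible PyTorch sketch of the hybrid loss for completeness (the class name `HybridAsymmetricLoss` and the λ_asym default of 0.5 are our choices):

```python
import torch
import torch.nn as nn

class HybridAsymmetricLoss(nn.Module):
    """MSE base plus an extra squared penalty on late predictions (d > 0)."""

    def __init__(self, lambda_asym: float = 0.5):
        super().__init__()
        self.lambda_asym = lambda_asym

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        pred, target = pred.view(-1), target.view(-1)
        errors = pred - target                  # d = y_hat - y
        base_mse = (errors ** 2).mean()         # symmetric base term
        late = torch.clamp(errors, min=0.0)     # zero out early errors (d < 0)
        late_penalty = (late ** 2).mean()       # extra term, late samples only
        return base_mse + self.lambda_asym * late_penalty
```

Because the extra term is bounded by the base MSE, this formulation avoids the gradient explosion of the exponential loss while still tilting the optimum toward earlier predictions.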


Implementation

Complete PyTorch implementation of asymmetric RUL losses.

Asymmetric MSE

🐍python
import torch
import torch.nn as nn


class AsymmetricMSELoss(nn.Module):
    """
    Asymmetric Mean Squared Error loss.

    Penalizes late predictions (over-estimation of RUL) more severely
    than early predictions (under-estimation).

    Args:
        alpha_early: Coefficient for early predictions (d < 0)
        alpha_late: Coefficient for late predictions (d >= 0)
    """

    def __init__(
        self,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3
    ):
        super().__init__()
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute asymmetric MSE loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            Asymmetric MSE loss (scalar)
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Compute errors: d = pred - target
        errors = pred - target
        squared_errors = errors ** 2

        # Asymmetric coefficients
        # Late: d >= 0 (predicted >= actual, over-estimation)
        # Early: d < 0 (predicted < actual, under-estimation)
        coefficients = torch.where(
            errors >= 0,
            torch.full_like(errors, self.alpha_late),
            torch.full_like(errors, self.alpha_early)
        )

        # Weighted loss
        weighted_errors = coefficients * squared_errors
        loss = weighted_errors.mean()

        return loss
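
As a quick sanity check of the asymmetric weighting, a standalone functional version (equivalent to the class above, with the default coefficients) behaves as expected on errors of equal magnitude:

```python
import torch

def asymmetric_mse(pred, target, alpha_early=1.0, alpha_late=1.3):
    """Functional form of the asymmetric MSE: weight by error direction."""
    d = pred.view(-1) - target.view(-1)
    coeff = torch.where(d >= 0,
                        torch.full_like(d, alpha_late),
                        torch.full_like(d, alpha_early))
    return (coeff * d ** 2).mean()

target = torch.tensor([50.0])
late = asymmetric_mse(torch.tensor([60.0]), target)   # d = +10, over-estimate
early = asymmetric_mse(torch.tensor([40.0]), target)  # d = -10, under-estimate
# Same error magnitude, but the late prediction costs 1.3x more
```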

NASA-Style Exponential Loss

🐍python
class NASAScoreLoss(nn.Module):
    """
    Differentiable approximation of NASA scoring function.

    Uses exponential penalties with different decay constants
    for early vs. late predictions.

    Args:
        a1: Decay constant for early predictions (default 13)
        a2: Decay constant for late predictions (default 10)
        clip_error: Maximum error magnitude to prevent explosion
    """

    def __init__(
        self,
        a1: float = 13.0,
        a2: float = 10.0,
        clip_error: float = 50.0
    ):
        super().__init__()
        self.a1 = a1
        self.a2 = a2
        self.clip_error = clip_error

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute NASA-style exponential loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            NASA score loss (scalar)
        """
        pred = pred.view(-1)
        target = target.view(-1)

        # Compute errors with clipping to bound the exponential
        errors = pred - target
        errors = torch.clamp(errors, -self.clip_error, self.clip_error)

        # Compute scores per branch
        early_mask = errors < 0
        late_mask = ~early_mask

        scores = torch.zeros_like(errors)
        scores[early_mask] = torch.exp(-errors[early_mask] / self.a1) - 1
        scores[late_mask] = torch.exp(errors[late_mask] / self.a2) - 1

        # Mean score
        loss = scores.mean()

        return loss

Combined Weighted Asymmetric Loss

🐍python
class WeightedAsymmetricMSE(nn.Module):
    """
    Combines sample weighting (linear decay) with asymmetric penalties.

    This is the recommended loss for RUL prediction when both
    sample importance and error direction matter.

    Args:
        r_max: Maximum RUL for weight computation
        w_min: Minimum sample weight
        w_max: Maximum sample weight
        alpha_early: Asymmetry coefficient for early predictions
        alpha_late: Asymmetry coefficient for late predictions
    """

    def __init__(
        self,
        r_max: float = 125.0,
        w_min: float = 1.0,
        w_max: float = 2.0,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3
    ):
        super().__init__()
        self.r_max = r_max
        self.w_min = w_min
        self.w_max = w_max
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        pred = pred.view(-1)
        target = target.view(-1)

        # Sample weights (linear decay: low RUL -> high weight)
        capped_target = torch.clamp(target, max=self.r_max)
        sample_weights = self.w_max - (self.w_max - self.w_min) * capped_target / self.r_max

        # Asymmetric coefficients by error direction
        errors = pred - target
        asym_coeffs = torch.where(
            errors >= 0,
            torch.full_like(errors, self.alpha_late),
            torch.full_like(errors, self.alpha_early)
        )

        # Combined weighted loss, normalized by total sample weight
        squared_errors = errors ** 2
        weighted_errors = sample_weights * asym_coeffs * squared_errors

        loss = weighted_errors.sum() / sample_weights.sum()

        return loss

Summary

In this section, we explored asymmetric RUL loss:

  1. Motivation: Late predictions (failure miss) are far more costly than early predictions
  2. NASA score: Exponential asymmetry with decay constants a1 = 13 (early) and a2 = 10 (late)
  3. Asymmetric MSE: Simple coefficient-based approach
  4. Hybrid loss: MSE base + extra late penalty
  5. Combined: Sample weighting + asymmetric coefficients
| Loss Type | α_early | α_late | Use Case |
|---|---|---|---|
| Symmetric MSE | 1.0 | 1.0 | Baseline |
| Mild asymmetry | 1.0 | 1.3 | General RUL (recommended) |
| Strong asymmetry | 1.0 | 2.0 | Safety-critical systems |
| NASA-style | exp(−d/13) | exp(d/10) | Match evaluation metric |
Looking Ahead: We have addressed RUL-specific losses. The next section introduces focal loss for health classification—a technique for handling the imbalanced distribution of health states in training data.

With asymmetric RUL loss understood, we address class imbalance in health classification.