AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand why RUL errors are asymmetric in their consequences
Analyze the NASA scoring function and its asymmetric penalties
Design asymmetric loss functions for training
Balance asymmetry with training stability
Implement differentiable asymmetric losses in PyTorch

Why This Matters: In real maintenance operations, predicting failure too late is catastrophic (unplanned downtime, safety risks), while predicting too early is merely costly (premature replacement). Asymmetric losses encode this operational reality into the learning objective.

Asymmetric Nature of RUL Errors

RUL prediction errors have fundamentally different consequences depending on their direction.

Error Direction Analysis

Define the prediction error as:

d = \hat{y} - y = \text{predicted RUL} - \text{true RUL}

Error Sign	Meaning	Consequence	Severity
d < 0	Predicted RUL < True RUL	Early prediction (premature action)	Cost inefficiency
d = 0	Perfect prediction	Optimal maintenance timing	Ideal
d > 0	Predicted RUL > True RUL	Late prediction (delayed action)	Safety risk, failure

Operational Consequences

NASA Scoring Function

NASA introduced an asymmetric scoring function for C-MAPSS evaluation.

Scoring Function Definition

S = \sum_{i=1}^{N} s_i, \quad \text{where } s_i = \begin{cases} \exp(-d_i/13) - 1 & \text{if } d_i < 0 \text{ (early)} \\ \exp(d_i/10) - 1 & \text{if } d_i \geq 0 \text{ (late)} \end{cases}

Where:

$d_i = \hat{y}_i - y_i$ : Prediction error for sample i
$a_1 = 13$ : Early prediction decay constant
$a_2 = 10$ : Late prediction decay constant
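
The definition above can be evaluated directly. The following is a minimal pure-Python sketch of the scoring function as written; the function name `nasa_score` and the keyword names `a_early` and `a_late` are illustrative choices, not part of any official evaluation toolkit.

```python
import math


def nasa_score(predicted, actual, a_early=13.0, a_late=10.0):
    """Sum of per-sample scores, with d = predicted - actual."""
    total = 0.0
    for y_hat, y in zip(predicted, actual):
        d = y_hat - y
        if d < 0:
            # Early prediction: slower-growing exponential
            total += math.exp(-d / a_early) - 1.0
        else:
            # Late (or exact) prediction: faster-growing exponential
            total += math.exp(d / a_late) - 1.0
    return total


if __name__ == "__main__":
    true_rul = [50.0, 50.0]
    print(nasa_score([40.0, 50.0], true_rul))  # one early error of 10 cycles
    print(nasa_score([60.0, 50.0], true_rul))  # one late error of 10 cycles
```

Note that the score is a sum rather than a mean, so it grows with the number of test samples and a single large late error can dominate the total.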

Asymmetry Ratio

Because the late branch divides by the smaller constant ($a_2 = 10$ versus $a_1 = 13$, a ratio of $a_1/a_2 = 1.3$), its exponential grows faster, so a late prediction is penalized more severely than an early prediction of the same magnitude:

Error magnitude |d|	Early Score (d<0)	Late Score (d>0)	Ratio (late/early)
5	0.47	0.65	1.4×
10	1.16	1.72	1.5×
15	2.17	3.48	1.6×
20	3.66	6.39	1.7×
30	9.05	19.09	2.1×

Exponential Asymmetry

Both branches grow exponentially, and the gap between them widens with error magnitude. A 30-cycle late prediction is penalized about 2.1× more than a 30-cycle early prediction.
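
To see how the two branches compare at a given error magnitude, the scoring definition can be evaluated on both sides. This is a small sketch, assuming the constants 13 and 10 from the definition; the helper names `early_score` and `late_score` are hypothetical.

```python
import math


def early_score(magnitude):
    """Score for d = -magnitude (prediction is too early)."""
    return math.exp(magnitude / 13.0) - 1.0


def late_score(magnitude):
    """Score for d = +magnitude (prediction is too late)."""
    return math.exp(magnitude / 10.0) - 1.0


for m in (5, 10, 15, 20, 30):
    e, l = early_score(m), late_score(m)
    print(f"|d|={m:>2}  early={e:6.2f}  late={l:6.2f}  ratio={l / e:4.2f}")
```

The printed ratio increases with the error magnitude, which is the sense in which the asymmetry is exponential rather than a fixed multiplier.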

Score Visualization

📝text

NASA Score vs. Prediction Error (scores rounded to the nearest 2):

Score
  20 ┤                                       ●
  18 ┤
  16 ┤
  14 ┤
  12 ┤
  10 ┤   ●
   8 ┤
   6 ┤                                 ●
   4 ┤         ●
   2 ┤               ●           ●
   0 ┼─────────────────────●─────────────────────
        -30   -20   -10    0    10    20    30
             Prediction Error (d = ŷ - y)

Key:
  d < 0: Early (gradual penalty)
  d > 0: Late (steep penalty)

Asymmetric Loss Formulation

We design a differentiable asymmetric loss for training.

Smooth Asymmetric MSE

A simple approach uses different coefficients for positive and negative errors:

\mathcal{L}_{\text{asym}} = \frac{1}{N}\sum_{i=1}^{N} \alpha_i \cdot (y_i - \hat{y}_i)^2

Where:

\alpha_i = \begin{cases} \alpha_{\text{early}} & \text{if } \hat{y}_i < y_i \\ \alpha_{\text{late}} & \text{if } \hat{y}_i \geq y_i \end{cases}
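
As a worked example of the formula, the sketch below evaluates the asymmetric MSE in plain Python for one early and one late error of the same size. The coefficient values 1.0 and 1.3 are illustrative assumptions.

```python
def asymmetric_mse(predicted, actual, alpha_early=1.0, alpha_late=1.3):
    """Mean of alpha_i * (y_i - y_hat_i)^2 with a direction-dependent alpha_i."""
    total = 0.0
    for y_hat, y in zip(predicted, actual):
        # Early if the prediction is below the true RUL, late otherwise
        alpha = alpha_early if y_hat < y else alpha_late
        total += alpha * (y - y_hat) ** 2
    return total / len(predicted)


early = asymmetric_mse([40.0], [50.0])  # 1.0 * 10^2
late = asymmetric_mse([60.0], [50.0])   # 1.3 * 10^2
print(early, late)
```

Unlike the exponential score, the penalty ratio here is constant (alpha_late / alpha_early) at every error magnitude, and the gradient with respect to the prediction stays linear in the error.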

Differentiable NASA-Style Loss

For direct optimization toward the NASA score:

\mathcal{L}_{\text{NASA}} = \frac{1}{N}\sum_{i=1}^{N} \begin{cases} \exp(-d_i/13) - 1 & \text{if } d_i < 0 \\ \exp(d_i/10) - 1 & \text{if } d_i \geq 0 \end{cases}

Training Instability

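The exponential form can cause exploding gradients for large errors, because the gradient itself grows exponentially with the error magnitude. In practice, we clip errors (which also zeroes the gradient for the clipped samples) or switch to a linear penalty beyond a threshold.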
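
One possible way to realize the linear switch is to follow the exponential up to a threshold and then continue along its tangent line, so that the gradient stops growing. The sketch below does this for the late branch only; the threshold `tau = 30` and the function names are assumptions made for illustration.

```python
import math


def late_penalty(d, a=10.0, tau=30.0):
    """Late-branch penalty for d >= 0: exponential up to tau, linear beyond."""
    if d <= tau:
        return math.exp(d / a) - 1.0
    value_at_tau = math.exp(tau / a) - 1.0
    slope_at_tau = math.exp(tau / a) / a  # derivative of exp(d/a) - 1 at tau
    return value_at_tau + slope_at_tau * (d - tau)


def late_gradient(d, a=10.0, tau=30.0):
    """Derivative of late_penalty with respect to d; constant beyond tau."""
    return math.exp(min(d, tau) / a) / a


print(late_gradient(30.0), late_gradient(100.0))  # identical: gradient is capped
```

Compared with clipping the error, the linear extension keeps a non-zero gradient for very late predictions, so those samples still push the model in the right direction.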

Hybrid Asymmetric Loss

Combine MSE base with asymmetric adjustment:

\mathcal{L}_{\text{hybrid}} = \underbrace{\frac{1}{N}\sum_i (y_i - \hat{y}_i)^2}_{\text{base MSE}} + \lambda_{\text{asym}} \cdot \underbrace{\frac{1}{N}\sum_{i: d_i > 0} d_i^2}_{\text{late penalty}}

This adds extra penalty only for late predictions while keeping the base MSE for all samples.
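
The hybrid form is closely related to the asymmetric MSE: a late sample contributes $(1 + \lambda_{\text{asym}}) d_i^2$ and an early sample contributes $d_i^2$. The sketch below checks this numerically in plain Python; the value `lam = 0.3` and the sample data are arbitrary assumptions.

```python
def hybrid_loss(predicted, actual, lam=0.3):
    """Base MSE over all samples plus lam times the mean squared late error."""
    n = len(predicted)
    errors = [y_hat - y for y_hat, y in zip(predicted, actual)]
    base = sum(d * d for d in errors) / n
    late = sum(d * d for d in errors if d > 0) / n
    return base + lam * late


def asymmetric_mse(predicted, actual, alpha_early, alpha_late):
    """Direction-weighted MSE, with d >= 0 treated as late."""
    n = len(predicted)
    total = 0.0
    for y_hat, y in zip(predicted, actual):
        d = y_hat - y
        total += (alpha_late if d >= 0 else alpha_early) * d * d
    return total / n


pred = [40.0, 60.0, 55.0, 50.0]
true = [50.0, 50.0, 50.0, 50.0]
print(hybrid_loss(pred, true, lam=0.3))
print(asymmetric_mse(pred, true, alpha_early=1.0, alpha_late=1.3))
```

Under these assumptions the two printed values agree up to floating-point rounding, so choosing $\lambda_{\text{asym}}$ amounts to choosing $\alpha_{\text{late}} = 1 + \lambda_{\text{asym}}$ with $\alpha_{\text{early}} = 1$.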

Implementation

Complete PyTorch implementation of asymmetric RUL losses.

Asymmetric MSE

🐍python

import torch
import torch.nn as nn


class AsymmetricMSELoss(nn.Module):
    """
    Asymmetric Mean Squared Error loss.

    Penalizes late predictions (over-estimation of RUL) more severely
    than early predictions (under-estimation).

    Args:
        alpha_early: Coefficient for early predictions (d < 0)
        alpha_late: Coefficient for late predictions (d >= 0)
    """

    def __init__(
        self,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3
    ):
        super().__init__()
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute asymmetric MSE loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            Asymmetric MSE loss (scalar)
        """
        # reshape (rather than view) also accepts non-contiguous inputs
        pred = pred.reshape(-1)
        target = target.reshape(-1)

        # Signed errors: d = pred - target
        errors = pred - target
        squared_errors = errors ** 2

        # Asymmetric coefficients
        # Late: d >= 0 (predicted >= actual, over-estimation)
        # Early: d < 0 (predicted < actual, under-estimation)
        # new_tensor keeps the dtype and device of `errors`
        coefficients = torch.where(
            errors >= 0,
            errors.new_tensor(self.alpha_late),
            errors.new_tensor(self.alpha_early)
        )

        # Weighted loss, averaged over the batch
        weighted_errors = coefficients * squared_errors
        loss = weighted_errors.mean()

        return loss

NASA-Style Exponential Loss

🐍python

class NASAScoreLoss(nn.Module):
    """
    Differentiable approximation of NASA scoring function.

    Uses exponential penalties with different decay constants
    for early vs. late predictions.

    Args:
        a1: Decay constant for early predictions (default 13)
        a2: Decay constant for late predictions (default 10)
        clip_error: Maximum error magnitude to prevent explosion
    """

    def __init__(
        self,
        a1: float = 13.0,
        a2: float = 10.0,
        clip_error: float = 50.0
    ):
        super().__init__()
        self.a1 = a1
        self.a2 = a2
        self.clip_error = clip_error

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute NASA-style exponential loss.

        Args:
            pred: Predicted RUL, shape (batch,)
            target: True RUL, shape (batch,)

        Returns:
            NASA score loss (scalar)
        """
        pred = pred.reshape(-1)
        target = target.reshape(-1)

        # Compute errors with clipping.
        # Note: samples with |d| > clip_error receive zero gradient.
        errors = pred - target
        errors = torch.clamp(errors, -self.clip_error, self.clip_error)

        # Per-sample scores. Both branches are evaluated for every sample,
        # which is safe here because the clipped errors keep exp() finite.
        scores = torch.where(
            errors < 0,
            torch.exp(-errors / self.a1) - 1,  # early
            torch.exp(errors / self.a2) - 1    # late
        )

        # Mean score (the evaluation metric itself uses the sum)
        loss = scores.mean()

        return loss

Combined Weighted Asymmetric Loss

🐍python

class WeightedAsymmetricMSE(nn.Module):
    """
    Combines sample weighting (linear decay) with asymmetric penalties.

    This is the recommended loss for RUL prediction when both
    sample importance and error direction matter.

    Args:
        r_max: Maximum RUL for weight computation
        w_min: Minimum sample weight
        w_max: Maximum sample weight
        alpha_early: Asymmetry coefficient for early predictions
        alpha_late: Asymmetry coefficient for late predictions
    """

    def __init__(
        self,
        r_max: float = 125.0,
        w_min: float = 1.0,
        w_max: float = 2.0,
        alpha_early: float = 1.0,
        alpha_late: float = 1.3
    ):
        super().__init__()
        self.r_max = r_max
        self.w_min = w_min
        self.w_max = w_max
        self.alpha_early = alpha_early
        self.alpha_late = alpha_late

    def forward(
        self,
        pred: torch.Tensor,
        target: torch.Tensor
    ) -> torch.Tensor:
        pred = pred.reshape(-1)
        target = target.reshape(-1)

        # Sample weights (linear decay): w_max at RUL = 0, w_min at RUL >= r_max
        capped_target = torch.clamp(target, max=self.r_max)
        sample_weights = self.w_max - (self.w_max - self.w_min) * capped_target / self.r_max

        # Asymmetric coefficients (d >= 0 is late, d < 0 is early)
        errors = pred - target
        asym_coeffs = torch.where(
            errors >= 0,
            errors.new_tensor(self.alpha_late),
            errors.new_tensor(self.alpha_early)
        )

        # Combined weighted loss
        squared_errors = errors ** 2
        weighted_errors = sample_weights * asym_coeffs * squared_errors

        # Normalize by the total sample weight (not by the batch size)
        loss = weighted_errors.sum() / sample_weights.sum()

        return loss
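
The linear-decay sample weight used in the combined loss can be checked in isolation. The sketch below mirrors that weight formula in plain Python with the same default values; the standalone function `sample_weight` is a hypothetical helper for illustration only.

```python
def sample_weight(rul, r_max=125.0, w_min=1.0, w_max=2.0):
    """Weight that decays linearly from w_max at RUL = 0 to w_min at RUL >= r_max."""
    capped = min(rul, r_max)
    return w_max - (w_max - w_min) * capped / r_max


for rul in (0.0, 62.5, 125.0, 200.0):
    print(rul, sample_weight(rul))
```

Samples close to failure receive the largest weight, and every sample at or beyond `r_max` receives `w_min`, so the weighting emphasizes the region where a late prediction matters most.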

Summary

In this section, we explored asymmetric RUL loss:

Motivation: Late predictions (failure miss) are far more costly than early predictions
NASA score: Exponential asymmetry with constants 13 (early) and 10 (late)
Asymmetric MSE: Simple coefficient-based approach
Hybrid loss: MSE base + extra late penalty
Combined: Sample weighting + asymmetric coefficients

Loss Type	α_early	α_late	Use Case
Symmetric MSE	1.0	1.0	Baseline
Mild asymmetry	1.0	1.3	General RUL (recommended)
Strong asymmetry	1.0	2.0	Safety-critical systems
NASA-style	exp(-d/13) - 1	exp(d/10) - 1	Match evaluation metric

Looking Ahead: We have addressed RUL-specific losses. The next section introduces focal loss for health classification—a technique for handling the imbalanced distribution of health states in training data.
