AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand why asymmetric scoring is needed for RUL prediction
Master the NASA scoring formula with its exponential penalties
Analyze the penalty structure for early vs. late predictions
Implement the scoring function with proper variants
Interpret NASA scores in the context of published results

Why This Matters: In predictive maintenance, late predictions are dangerous—predicting an engine has 30 cycles left when it actually has 10 could lead to catastrophic failure. The NASA scoring function captures this asymmetry with exponential penalties that more heavily penalize late predictions.

Motivation for Asymmetry

RMSE penalizes an error of a given magnitude identically whether the prediction is early or late, but the two directions do not have equal consequences.

The Asymmetry of RUL Errors

Error Type	Prediction	Consequence	Safety Impact
Early	ŷ < y (predict failure sooner)	Premature maintenance	Safe but costly
Late	ŷ > y (predict failure later)	Missed maintenance window	Potentially catastrophic
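To make the symmetry of RMSE concrete, the following minimal sketch compares a set of predictions that are all 10 cycles early with a set that are all 10 cycles late. The RUL values and the helper name `rmse` are illustrative assumptions, not taken from a real dataset.

```python
import numpy as np


def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error between targets and predictions."""
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Illustrative true RUL values (made up for this example)
y_true = np.array([50.0, 40.0, 30.0, 20.0])

# Same error magnitude (10 cycles) in opposite directions
y_early = y_true - 10.0  # predicts failure sooner than reality
y_late = y_true + 10.0   # predicts failure later than reality

rmse_early = rmse(y_true, y_early)
rmse_late = rmse(y_true, y_late)

print(f"RMSE (all early): {rmse_early:.1f}")  # 10.0
print(f"RMSE (all late):  {rmse_late:.1f}")   # 10.0
```

Both cases score the same under RMSE, even though only the late case risks a missed maintenance window.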

Real-World Impact

Aviation: Late prediction → unscheduled engine shutdown in flight
Manufacturing: Late prediction → production line failure, costly downtime
Power generation: Late prediction → turbine damage, outage

Scoring Formula

The NASA asymmetric scoring function uses exponential penalties with different growth rates for early and late predictions.

Definition

For each prediction, the individual score is:

$$s_i = \begin{cases} e^{-d_i/13} - 1 & \text{if } d_i < 0 \text{ (early prediction)} \\ e^{d_i/10} - 1 & \text{if } d_i \geq 0 \text{ (late prediction)} \end{cases}$$

Where $d_i = \hat{y}_i - y_i$ is the prediction error.

The total NASA score is the sum over all predictions:

$$S = \sum_{i=1}^{N} s_i$$

Understanding the Error Direction

Condition	Relation	Meaning	Scale Constant
d < 0	ŷ < y	Early prediction	13 (gentler)
d ≥ 0	ŷ ≥ y	Late (or exact) prediction	10 (harsher)
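The piecewise definition above can be sketched for a single prediction using only the standard library. The function names `nasa_score_single` and `nasa_score_total` are hypothetical helpers introduced for illustration.

```python
import math
from typing import Iterable


def nasa_score_single(y_pred: float, y_true: float) -> float:
    """Penalty s_i for one prediction, with d = y_pred - y_true."""
    d = y_pred - y_true
    if d < 0:
        # Early prediction: gentler growth (divide by 13)
        return math.exp(-d / 13.0) - 1.0
    # Late or exact prediction: harsher growth (divide by 10)
    return math.exp(d / 10.0) - 1.0


def nasa_score_total(y_pred: Iterable[float], y_true: Iterable[float]) -> float:
    """Total score S: the sum of the individual penalties."""
    return sum(nasa_score_single(p, t) for p, t in zip(y_pred, y_true))


early = nasa_score_single(y_pred=90.0, y_true=100.0)   # d = -10
late = nasa_score_single(y_pred=110.0, y_true=100.0)   # d = +10
total = nasa_score_total([90.0, 110.0], [100.0, 100.0])

print(f"early: {early:.2f}, late: {late:.2f}, total: {total:.2f}")
# early: 1.16, late: 1.72, total: 2.88
```

The same 10-cycle miss costs more when it is late than when it is early, which is the asymmetry the formula is designed to express.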

Key Parameters

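The asymmetry is encoded in the scale constants that divide the error in the exponent: 13 for early predictions (gentler penalty) and 10 for late predictions (harsher penalty). The base of the exponential is e in both cases; the smaller constant makes the late penalty grow faster. These values were chosen by NASA to reflect the relative costs of early vs. late maintenance.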

Exponential Growth

The exponential form means penalties grow rapidly with error magnitude:

Error magnitude |d| (cycles)	Early Score (d < 0)	Late Score (d > 0)
5	0.47	0.65
10	1.16	1.72
15	2.17	3.48
20	3.66	6.39
25	5.84	11.18
30	9.05	19.09

A 30-cycle late prediction scores 19.09, while a 30-cycle early prediction scores only 9.05: the late penalty is more than 2× the early penalty for the same error magnitude.
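As a quick check, the penalties for these error magnitudes can be recomputed directly from the formula. This is a minimal sketch; `early_penalty` and `late_penalty` are hypothetical helper names.

```python
import math


def early_penalty(magnitude: float) -> float:
    """Penalty for a prediction that is `magnitude` cycles early."""
    return math.exp(magnitude / 13.0) - 1.0


def late_penalty(magnitude: float) -> float:
    """Penalty for a prediction that is `magnitude` cycles late."""
    return math.exp(magnitude / 10.0) - 1.0


# (magnitude, early score, late score) for each row of the table
rows = [(m, early_penalty(m), late_penalty(m)) for m in (5, 10, 15, 20, 25, 30)]

print("|d|   early    late")
for m, early, late in rows:
    print(f"{m:>3d}  {early:6.2f}  {late:6.2f}")
```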

Penalty Analysis

Detailed analysis of the penalty structure reveals important properties.

Perfect Prediction

When $d_i = 0$ (perfect prediction):

$$s_i = e^{0/10} - 1 = e^0 - 1 = 1 - 1 = 0$$

Perfect predictions contribute zero to the score.

Penalty Ratio Analysis
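One way to quantify the asymmetry, sketched here under the assumption that the ratio of interest is the late penalty divided by the early penalty at the same error magnitude, is to evaluate $(e^{m/10} - 1)/(e^{m/13} - 1)$ for a range of magnitudes $m$. The helper name `penalty_ratio` is hypothetical.

```python
import math


def penalty_ratio(magnitude: float) -> float:
    """Late penalty divided by early penalty for the same |d| > 0."""
    late = math.expm1(magnitude / 10.0)   # e^(m/10) - 1
    early = math.expm1(magnitude / 13.0)  # e^(m/13) - 1
    return late / early


magnitudes = (1, 5, 10, 20, 30, 50)
ratios = [penalty_ratio(m) for m in magnitudes]

for m, r in zip(magnitudes, ratios):
    print(f"|d| = {m:>2d} cycles -> late/early ratio = {r:.2f}")
```

For very small errors the ratio approaches 13/10 = 1.3, and it increases as the error grows, so the asymmetry matters most for large misses.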

Score Sensitivity

The NASA score is highly sensitive to outliers due to its exponential nature:

Predictions	Summed Score	Contribution to Total (211.6)
99 predictions, d=5 (late)	99 × 0.65 ≈ 64.2	≈ 30%
1 prediction, d=50 (late)	1 × 147.4 = 147.4	≈ 70% (dominates)

Outlier Sensitivity

A single large error can dominate the NASA score due to exponential growth. This makes the metric volatile—a good model with one bad prediction may score worse than a mediocre model with consistent errors. This is why RMSE (not NASA score) is typically used for model selection.
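The effect can be illustrated with two hypothetical models. Model A is 5 cycles late on 99 predictions and 50 cycles late on one; model B is 10 cycles late on every prediction. The error arrays and helper names below are made up for illustration.

```python
import numpy as np


def nasa_score(d: np.ndarray) -> float:
    """Sum of NASA penalties for errors d = y_pred - y_true."""
    d = np.asarray(d, dtype=float)
    penalties = np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0
    return float(np.sum(penalties))


def rmse(d: np.ndarray) -> float:
    """Root mean squared error computed from the errors."""
    d = np.asarray(d, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


errors_a = np.array([5.0] * 99 + [50.0])  # mostly accurate, one outlier
errors_b = np.full(100, 10.0)             # consistently mediocre

# Fraction of model A's score that comes from the single outlier
outlier_share = nasa_score(errors_a[-1:]) / nasa_score(errors_a)

print(f"Model A: RMSE = {rmse(errors_a):.2f}, NASA = {nasa_score(errors_a):.1f}")
print(f"Model B: RMSE = {rmse(errors_b):.2f}, NASA = {nasa_score(errors_b):.1f}")
print(f"Outlier share of model A's score: {outlier_share:.0%}")
```

Model A has the lower RMSE but the higher (worse) NASA score, because one prediction accounts for roughly 70% of its total.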

Implementation

Complete implementation of the NASA scoring function with multiple variants.

Comprehensive Scoring Function

```python
from typing import Dict, Optional, Union

import numpy as np


def nasa_scoring_function_comprehensive(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    method: str = 'paper_style',
    unit_ids: Optional[np.ndarray] = None,
    normalize: bool = False
) -> Dict[str, Union[int, float]]:
    """
    NASA asymmetric scoring function with multiple variants.

    Args:
        y_true: True RUL values
        y_pred: Predicted RUL values
        method: Scoring method ('paper_style', 'raw_sum', 'both')
        unit_ids: Unit IDs for per-unit last-cycle scoring. Required for
            'paper_style'; without them no paper-style score is returned.
        normalize: Whether to normalize by number of predictions

    Returns:
        Dictionary with comprehensive scoring results
    """
    # Cast to float so integer inputs cannot truncate the scores, then
    # cap predictions and targets at 125 consistently
    y_pred_capped = np.minimum(np.asarray(y_pred, dtype=float), 125.0)
    y_true_capped = np.minimum(np.asarray(y_true, dtype=float), 125.0)

    # Compute prediction error
    d = y_pred_capped - y_true_capped

    # Initialize individual scores
    scores_individual = np.zeros_like(d)

    # Early prediction (d < 0): gentler penalty with scale constant 13
    early_mask = d < 0
    scores_individual[early_mask] = np.exp(-d[early_mask] / 13) - 1

    # Late prediction (d >= 0): harsher penalty with scale constant 10
    late_mask = d >= 0
    scores_individual[late_mask] = np.exp(d[late_mask] / 10) - 1

    results = {}

    if method in ['raw_sum', 'both']:
        # Raw aggregate score (sum over all predictions)
        raw_score = np.sum(scores_individual)
        if normalize:
            raw_score = raw_score / len(scores_individual)
        results['nasa_score_raw'] = float(raw_score)

    if method in ['paper_style', 'both'] and unit_ids is not None:
        # Per-unit last-cycle scoring (literature comparable).
        # Assumes each unit's rows appear in chronological order.
        unit_ids = np.asarray(unit_ids)
        unique_units = np.unique(unit_ids)
        unit_scores = []

        for unit_id in unique_units:
            unit_mask = unit_ids == unit_id
            unit_scores_all = scores_individual[unit_mask]
            # Take only the last cycle score for this unit
            if len(unit_scores_all) > 0:
                unit_scores.append(unit_scores_all[-1])

        paper_score = np.sum(unit_scores)
        if normalize:
            paper_score = paper_score / len(unit_scores)
        results['nasa_score_paper'] = float(paper_score)
        results['n_units_evaluated'] = len(unique_units)

    # Diagnostic information
    results['early_predictions'] = int(np.sum(early_mask))
    results['late_predictions'] = int(np.sum(late_mask))
    results['mean_error'] = float(np.mean(d))
    results['std_error'] = float(np.std(d))
    results['early_penalty_avg'] = float(np.mean(scores_individual[early_mask])) if np.any(early_mask) else 0.0
    results['late_penalty_avg'] = float(np.mean(scores_individual[late_mask])) if np.any(late_mask) else 0.0

    return results
```

Simple Scoring Function

```python
# Assumes numpy (as np) and nasa_scoring_function_comprehensive
# from the previous listing are in scope.
def nasa_scoring_function(
    y_true: np.ndarray,
    y_pred: np.ndarray
) -> float:
    """
    Simple NASA scoring function for backward compatibility.

    Computes raw sum over all predictions without per-unit
    last-cycle extraction.

    Args:
        y_true: True RUL values
        y_pred: Predicted RUL values

    Returns:
        Total NASA score
    """
    scores = nasa_scoring_function_comprehensive(
        y_true, y_pred,
        method='raw_sum'
    )
    return scores.get('nasa_score_raw', 0.0)
```

Usage Example

```python
import numpy as np

# Example evaluation
predictions = np.array([110, 85, 45, 25, 8])
targets = np.array([100, 90, 50, 20, 10])
unit_ids = np.array([1, 1, 1, 1, 1])  # All same engine

# Compute comprehensive scores
scores = nasa_scoring_function_comprehensive(
    y_true=targets,
    y_pred=predictions,
    method='both',
    unit_ids=unit_ids
)

print(f"NASA Score (paper): {scores['nasa_score_paper']:.2f}")
print(f"NASA Score (raw): {scores['nasa_score_raw']:.2f}")
print(f"Early predictions: {scores['early_predictions']}")
print(f"Late predictions: {scores['late_predictions']}")
print(f"Mean error: {scores['mean_error']:.2f}")

# Output (errors d = [10, -5, -5, 5, -2]):
# NASA Score (paper): 0.17  (only last cycle: 8-10=-2)
# NASA Score (raw): 3.47    (sum over all cycles)
# Early predictions: 3
# Late predictions: 2
# Mean error: 0.60
```

Paper-Style vs. Raw

The "paper-style" NASA score uses only the last prediction per engine (matching published benchmarks). The "raw" score sums over all predictions. For fair comparison with literature, always use paper-style scoring.
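Last-cycle extraction depends on knowing which row is the final one for each engine. The sketch below selects it from an explicit cycle counter instead of relying on row order. The `cycles` array and the helper `last_cycle_indices` are assumptions introduced for illustration, as are the example values.

```python
import numpy as np


def last_cycle_indices(unit_ids: np.ndarray, cycles: np.ndarray) -> np.ndarray:
    """Index of the row with the largest cycle number for each unit."""
    unit_ids = np.asarray(unit_ids)
    cycles = np.asarray(cycles)
    if unit_ids.size == 0:
        return np.array([], dtype=int)
    # lexsort treats the LAST key as primary: sort by unit, then by cycle
    order = np.lexsort((cycles, unit_ids))
    sorted_units = unit_ids[order]
    # A row is the last of its unit when the next row belongs to another unit
    is_last = np.append(sorted_units[1:] != sorted_units[:-1], True)
    return order[is_last]


# Two engines with rows deliberately out of chronological order
unit_ids = np.array([1, 1, 2, 2, 2, 1])
cycles = np.array([1, 3, 1, 2, 3, 2])
y_true = np.array([30.0, 10.0, 50.0, 40.0, 30.0, 20.0])
y_pred = np.array([28.0, 15.0, 55.0, 38.0, 17.0, 22.0])

idx = last_cycle_indices(unit_ids, cycles)

# Paper-style score: NASA penalty on the last cycle of each engine only
d = y_pred[idx] - y_true[idx]
paper_score = float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))

print("Last-cycle rows:", idx.tolist())
print(f"Paper-style score: {paper_score:.2f}")
```

Selecting by cycle number guards against shuffled or concatenated data, where taking the last array element per unit could silently pick the wrong row.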

Summary

In this section, we covered the NASA asymmetric scoring function:

Asymmetry rationale: Late predictions are more dangerous than early
Exponential penalties: Scale constant 13 for early, 10 for late
Formula: $s_i = e^{|d_i|/a} - 1$ where a=13 (early) or 10 (late)
High sensitivity: Single large errors dominate the score
Paper-style scoring: Uses last-cycle predictions only

Metric	Use Case	Sensitivity	Error Treatment
RMSE	Model selection	Moderate (quadratic)	Symmetric: equal penalty for early and late
NASA Score	Final reporting, safety-critical evaluation	High (exponential)	Asymmetric: late penalized more

Looking Ahead: Both RMSE and NASA score can be computed over all cycles or only the last cycle. The next section examines last-cycle vs. all-cycles evaluation, and when and why to use each approach.

With the NASA scoring function understood, we explore evaluation protocols.