Learning Objectives
By the end of this section, you will:
- Understand why asymmetric scoring is needed for RUL prediction
- Master the NASA scoring formula with its exponential penalties
- Analyze the penalty structure for early vs. late predictions
- Implement the scoring function with proper variants
- Interpret NASA scores in the context of published results
Why This Matters: In predictive maintenance, late predictions are dangerous—predicting an engine has 30 cycles left when it actually has 10 could lead to catastrophic failure. The NASA scoring function captures this asymmetry with exponential penalties that more heavily penalize late predictions.
Motivation for Asymmetry
RMSE treats all errors equally, but not all errors have equal consequences.
The Asymmetry of RUL Errors
| Error Type | Prediction | Consequence | Safety Impact |
|---|---|---|---|
| Early | ŷ < y (predict failure sooner) | Premature maintenance | Safe but costly |
| Late | ŷ > y (predict failure later) | Missed maintenance window | Potentially catastrophic |
Real-World Impact
- Aviation: Late prediction → unscheduled engine shutdown in flight
- Manufacturing: Late prediction → production line failure, costly downtime
- Power generation: Late prediction → turbine damage, outage
Scoring Formula
The NASA asymmetric scoring function uses exponential penalties with different decay rates for early and late predictions.
Definition
For each prediction i, the individual score is:

s_i = exp(-d_i / 13) - 1   if d_i < 0 (early)
s_i = exp(d_i / 10) - 1    if d_i ≥ 0 (late)

where d_i = ŷ_i - y_i is the prediction error.

The total NASA score is the sum over all predictions: S = Σ s_i.
Understanding the Error Direction
| Condition | Error d | Meaning | Exponent Divisor |
|---|---|---|---|
| d < 0 | ŷ < y | Early prediction | 13 (gentler) |
| d ≥ 0 | ŷ ≥ y | Late prediction | 10 (harsher) |
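The piecewise definition can be written directly as a scalar helper (a minimal sketch; `nasa_score_single` is a name introduced here for illustration, and a vectorized implementation appears later in this section):

```python
import math

def nasa_score_single(y_true: float, y_pred: float) -> float:
    """Per-prediction NASA score for error d = y_pred - y_true."""
    d = y_pred - y_true
    if d < 0:                           # early prediction: gentler penalty
        return math.exp(-d / 13) - 1
    return math.exp(d / 10) - 1         # late prediction: harsher penalty

# Same 10-cycle miss, opposite directions:
print(nasa_score_single(100, 90))    # early by 10 -> ~1.16
print(nasa_score_single(100, 110))   # late by 10  -> ~1.72
```

Note that the same 10-cycle error costs roughly 50% more when it is late.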
Key Parameters
The asymmetry is encoded in the exponential divisors (time constants): the error is divided by 13 for early predictions, giving slower penalty growth, and by 10 for late predictions, giving faster growth. These values were chosen by NASA to reflect the relative costs of early vs. late maintenance.
Exponential Growth
The exponential form means penalties grow rapidly with error magnitude:
| Error (cycles) | Early Score (d = -error) | Late Score (d = +error) |
|---|---|---|
| 5 | 0.47 | 0.65 |
| 10 | 1.16 | 1.72 |
| 15 | 2.17 | 3.48 |
| 20 | 3.66 | 6.39 |
| 25 | 5.84 | 11.18 |
| 30 | 9.05 | 19.09 |
A 30-cycle late prediction scores 19.09, while a 30-cycle early prediction scores only 9.05: more than twice the penalty for the same error magnitude.
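The penalty curves can be computed directly (a short NumPy sketch; variable names are illustrative):

```python
import numpy as np

errors = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
early = np.exp(errors / 13) - 1   # d = -error: exp(-d/13) - 1
late = np.exp(errors / 10) - 1    # d = +error: exp(d/10) - 1

for e, s_e, s_l in zip(errors, early, late):
    print(f"|d|={e:4.0f}  early={s_e:6.2f}  late={s_l:6.2f}  late/early={s_l / s_e:.2f}")
```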
Penalty Analysis
Detailed analysis of the penalty structure reveals important properties.
Perfect Prediction
When d = 0 (perfect prediction), the score is exp(0 / 10) - 1 = 0.
Perfect predictions contribute zero to the score.
Penalty Ratio Analysis
For equal error magnitude |d|, the late-to-early score ratio (exp(|d|/10) - 1) / (exp(|d|/13) - 1) grows with |d|: roughly 1.5× at |d| = 10 and about 2.1× at |d| = 30, approaching exp(3|d|/130) for large errors. The asymmetry therefore becomes more pronounced as predictions get worse.
Score Sensitivity
The NASA score is highly sensitive to outliers due to its exponential nature:
| Scenario | Individual Score | Contribution to Total |
|---|---|---|
| 99 predictions, d=5 (late) | 99 × 0.65 ≈ 64.2 | Moderate total |
| 1 prediction, d=50 (late) | 1 × 147.4 = 147.4 | Dominates! |
Outlier Sensitivity
A single large error can dominate the NASA score due to exponential growth. This makes the metric volatile—a good model with one bad prediction may score worse than a mediocre model with consistent errors. This is why RMSE (not NASA score) is typically used for model selection.
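The outlier effect is easy to demonstrate (a minimal sketch; `nasa_score` is a helper defined here for the example):

```python
import numpy as np

def nasa_score(d: np.ndarray) -> float:
    """Total NASA score for an array of errors d = y_pred - y_true."""
    s = np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1)
    return float(np.sum(s))

consistent = nasa_score(np.full(99, 5.0))   # 99 predictions, each 5 cycles late
outlier = nasa_score(np.array([50.0]))      # one prediction 50 cycles late

print(f"99 x (d=5):  {consistent:.1f}")     # ~64.2
print(f"1 x (d=50): {outlier:.1f}")         # ~147.4
```

A single 50-cycle miss outweighs 99 consistent 5-cycle misses combined.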
Implementation
Complete implementation of the NASA scoring function with multiple variants.
Comprehensive Scoring Function
```python
import numpy as np
from typing import Dict, Optional


def nasa_scoring_function_comprehensive(
    y_true: np.ndarray,
    y_pred: np.ndarray,
    method: str = 'paper_style',
    unit_ids: Optional[np.ndarray] = None,
    normalize: bool = False
) -> Dict[str, float]:
    """
    NASA asymmetric scoring function with multiple variants.

    Args:
        y_true: True RUL values
        y_pred: Predicted RUL values
        method: Scoring method ('paper_style', 'raw_sum', 'both')
        unit_ids: Unit IDs for per-unit last-cycle scoring
        normalize: Whether to normalize by number of predictions

    Returns:
        Dictionary with comprehensive scoring results
    """
    # Ensure predictions and targets are capped at 125 consistently
    y_pred_capped = np.minimum(y_pred, 125.0)
    y_true_capped = np.minimum(y_true, 125.0)

    # Compute prediction error
    d = y_pred_capped - y_true_capped

    # Initialize individual scores (force float dtype for integer inputs)
    scores_individual = np.zeros_like(d, dtype=float)

    # Early prediction (d < 0): gentler penalty with divisor 13
    early_mask = d < 0
    scores_individual[early_mask] = np.exp(-d[early_mask] / 13) - 1

    # Late prediction (d >= 0): harsher penalty with divisor 10
    late_mask = d >= 0
    scores_individual[late_mask] = np.exp(d[late_mask] / 10) - 1

    results = {}

    if method in ['raw_sum', 'both']:
        # Raw aggregate score (sum over all predictions)
        raw_score = np.sum(scores_individual)
        if normalize:
            raw_score = raw_score / len(scores_individual)
        results['nasa_score_raw'] = float(raw_score)

    if method in ['paper_style', 'both'] and unit_ids is not None:
        # Per-unit last-cycle scoring (comparable with the literature)
        unique_units = np.unique(unit_ids)
        unit_scores = []

        for unit_id in unique_units:
            unit_mask = unit_ids == unit_id
            unit_scores_all = scores_individual[unit_mask]
            # Take only the last-cycle score for this unit
            if len(unit_scores_all) > 0:
                unit_scores.append(unit_scores_all[-1])

        paper_score = np.sum(unit_scores)
        if normalize:
            paper_score = paper_score / len(unit_scores)
        results['nasa_score_paper'] = float(paper_score)
        results['n_units_evaluated'] = len(unique_units)

    # Diagnostic information
    results['early_predictions'] = int(np.sum(d < 0))
    results['late_predictions'] = int(np.sum(d >= 0))
    results['mean_error'] = float(np.mean(d))
    results['std_error'] = float(np.std(d))
    results['early_penalty_avg'] = float(np.mean(scores_individual[early_mask])) if np.any(early_mask) else 0.0
    results['late_penalty_avg'] = float(np.mean(scores_individual[late_mask])) if np.any(late_mask) else 0.0

    return results
```

Simple Scoring Function
```python
def nasa_scoring_function(
    y_true: np.ndarray,
    y_pred: np.ndarray
) -> float:
    """
    Simple NASA scoring function for backward compatibility.

    Computes the raw sum over all predictions without per-unit
    last-cycle extraction.

    Args:
        y_true: True RUL values
        y_pred: Predicted RUL values

    Returns:
        Total NASA score
    """
    scores = nasa_scoring_function_comprehensive(
        y_true, y_pred,
        method='raw_sum'
    )
    return scores.get('nasa_score_raw', 0.0)
```

Usage Example
```python
# Example evaluation
predictions = np.array([110, 85, 45, 25, 8])
targets = np.array([100, 90, 50, 20, 10])
unit_ids = np.array([1, 1, 1, 1, 1])  # All same engine

# Compute comprehensive scores
scores = nasa_scoring_function_comprehensive(
    y_true=targets,
    y_pred=predictions,
    method='both',
    unit_ids=unit_ids
)

print(f"NASA Score (paper): {scores['nasa_score_paper']:.2f}")
print(f"NASA Score (raw): {scores['nasa_score_raw']:.2f}")
print(f"Early predictions: {scores['early_predictions']}")
print(f"Late predictions: {scores['late_predictions']}")
print(f"Mean error: {scores['mean_error']:.2f}")

# Output:
# NASA Score (paper): 0.17 (only the last cycle: d = 8 - 10 = -2)
# NASA Score (raw): 3.47 (sum over all cycles)
# Early predictions: 3
# Late predictions: 2
# Mean error: 0.60
```

Paper-Style vs. Raw
The "paper-style" NASA score uses only the last prediction per engine (matching published benchmarks). The "raw" score sums over all predictions. For fair comparison with literature, always use paper-style scoring.
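The distinction can be sketched with two hypothetical engines, assuming predictions within each unit are ordered by cycle so the last element is that unit's final prediction (`score` is a minimal helper defined here):

```python
import numpy as np

def score(d: np.ndarray) -> np.ndarray:
    """Per-prediction NASA scores for errors d = y_pred - y_true."""
    return np.where(d < 0, np.exp(-d / 13) - 1, np.exp(d / 10) - 1)

# Errors for two engines, cycle-ordered within each unit
d = np.array([10.0, -5.0, 2.0, -20.0, 8.0, -3.0])
unit_ids = np.array([1, 1, 1, 2, 2, 2])

raw = float(np.sum(score(d)))  # sums all six predictions

# Paper-style: keep only the last error per unit (d = 2.0 and d = -3.0)
last_idx = [np.flatnonzero(unit_ids == u)[-1] for u in np.unique(unit_ids)]
paper = float(np.sum(score(d[last_idx])))

print(f"raw={raw:.2f}, paper={paper:.2f}")  # raw ≈ 7.55, paper ≈ 0.48
```

Large mid-trajectory errors inflate the raw score but leave the paper-style score untouched, which is why the two variants must never be compared with each other.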
Summary
In this section, we covered the NASA asymmetric scoring function:
- Asymmetry rationale: Late predictions are more dangerous than early
- Exponential penalties: Base 13 for early, base 10 for late
- Formula: s = exp(|d| / a) - 1, where a = 13 (early) or a = 10 (late)
- High sensitivity: Single large errors dominate the score
- Paper-style scoring: Uses last-cycle predictions only
| Metric | Use Case | Sensitivity | Error Treatment |
|---|---|---|---|
| RMSE | Model selection | Moderate (quadratic) | Symmetric (equal penalty) |
| NASA Score | Final reporting, safety-critical | High (exponential) | Asymmetric (late penalized more) |
Looking Ahead: Both RMSE and NASA score can be computed over all cycles or only the last cycle. The next section examines last-cycle vs. all-cycles evaluation, explaining when and why to use each approach.
With the NASA scoring function understood, we explore evaluation protocols.