Chapter 13

NASA Score: The Asymmetric Cost of Lateness

The Accuracy-Safety Tradeoff

Early vs Late: Two Different Failures

Predict an engine's RUL too early and you replace a part with life left in it - cost a few thousand dollars, ten or twenty cycles of wasted life, a bit of unplanned tear-down. Predict it too late and you miss the failure - cost a few hundred thousand dollars, an unplanned outage, possible cascade damage, possibly a safety incident. Same magnitude of error; very different cost.

Operational rule of thumb (legacy chapter 11.2). Late predictions can be 10-100× more costly than early predictions of the same magnitude. The loss function should encode that asymmetry, not pretend the two are equivalent.
| Error sign | Meaning | Operational consequence | Typical cost scale |
|---|---|---|---|
| d < 0 | predicted RUL < true RUL | early/premature replacement | $5K - $20K (part + labour) |
| d ≈ 0 | near-perfect prediction | ideal maintenance timing | minimal |
| d > 0 | predicted RUL > true RUL | late - failure missed | $100K - $1M+ (downtime + cascade) |

RMSE - which is the headline metric in nearly every RUL paper - treats $d = -20$ and $d = +20$ as identical. The NASA scoring function does not. This section is about that scoring function.

The NASA Scoring Function

For prediction error $d_i = \hat{y}_i - y_i$ on engine $i$, the NASA C-MAPSS score is

$$
s_i = \begin{cases} \exp(-d_i/13) - 1 & d_i < 0 \ (\text{early, mild}) \\ \exp(d_i/10) - 1 & d_i \geq 0 \ (\text{late, harsh}) \end{cases}
$$

The total score for a test set is $S = \sum_{i=1}^{N} s_i$; lower is better. The two decay constants $a_1 = 13$ (early) and $a_2 = 10$ (late) come from the original PHM 2008 challenge brief and have been the standard ever since.
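A minimal pure-Python sketch of the definition, placed next to RMSE, makes the RMSE comparison above concrete. `nasa_cost` and `rmse` are illustrative helper names, not a library API, and each "test set" here is assumed to contain a single engine so the arithmetic stays visible.

```python
import math


# Illustrative helpers (not a library API).
def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d = predicted RUL - true RUL."""
    if d < 0:
        return math.exp(-d / a1) - 1.0   # early branch: mild
    return math.exp(d / a2) - 1.0        # late branch (d >= 0): harsh


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Assumption: one engine per test set, 20 cycles early vs 20 cycles late.
for errors in ([-20.0], [+20.0]):
    score = sum(nasa_cost(d) for d in errors)
    print(errors, "RMSE =", rmse(errors), "NASA S =", round(score, 3))
```

Both sets print an RMSE of 20.0, while S comes out at 3.657 for the early engine and 6.389 for the late one.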

Asymmetry comes from the EXPONENTIAL, not just the ratio. The ratio $a_1/a_2 = 1.3$ looks mild, but exp() widens the gap as errors grow. At $|d| = 20$ the late cost is about 1.7× the early cost; at $|d| = 30$ it is about 2.1×; at $|d| = 50$ it is about 3.2×. Do not eyeball asymmetry by the decay-constant ratio; always plot the curve.
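The widening gap can be read straight off the two branches. For a symmetric error of size $|d|$, the late-to-early cost ratio and its two limits are:

$$
R(|d|) = \frac{e^{|d|/a_2} - 1}{e^{|d|/a_1} - 1},
\qquad
\lim_{|d| \to 0} R(|d|) = \frac{a_1}{a_2} = 1.3,
\qquad
R(|d|) \approx e^{|d|\,(1/a_2 - 1/a_1)} = e^{3|d|/130} \quad (|d| \gg a_1)
$$

So the ratio starts at 1.3 for tiny errors and then grows roughly exponentially in $|d|$; at $|d| = 50$ the approximation gives $e^{150/130} \approx 3.2$, close to the exact value of about 3.22.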

How Lopsided Is It?

Side-by-side cost for symmetric errors $\pm |d|$:

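| Error size (cycles) | early cost s(-d) | late cost s(+d) | late / early ratio |
|---|---|---|---|
| 5 | 0.469 | 0.649 | 1.4× |
| 10 | 1.158 | 1.718 | 1.5× |
| 15 | 2.170 | 3.482 | 1.6× |
| 20 | 3.657 | 6.389 | 1.7× |
| 25 | 5.842 | 11.182 | 1.9× |
| 30 | 9.051 | 19.086 | 2.1× |
| 40 | 20.692 | 53.598 | 2.6× |
| 50 | 45.813 | 147.413 | 3.2× |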
Read this table once, never forget it. A 20-cycle late prediction costs exactly as much as a 26-cycle early prediction, and a 30-cycle late prediction exactly as much as a 39-cycle early one: in general a late error of d cycles matches an early error of (a1/a2)·d = 1.3d cycles. The model should err early - and the loss function should make that easy.
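The table can be regenerated in a few lines of pure Python, which also pins down which early error costs exactly as much as a given late one. A sketch assuming the canonical constants; `s_early` and `s_late` are illustrative names that take the error size m = |d|.

```python
import math

A1, A2 = 13.0, 10.0  # canonical NASA decay constants: early (a1), late (a2)


def s_early(m):
    """Cost of an early error of size m >= 0, i.e. d = -m (illustrative helper)."""
    return math.exp(m / A1) - 1.0


def s_late(m):
    """Cost of a late error of size m >= 0, i.e. d = +m (illustrative helper)."""
    return math.exp(m / A2) - 1.0


print(f"{'m':>3} {'early':>8} {'late':>9} {'ratio':>6}")
for m in (5, 10, 15, 20, 25, 30, 40, 50):
    early, late = s_early(m), s_late(m)
    print(f"{m:>3} {early:8.3f} {late:9.3f} {late / early:5.1f}x")

# Setting exp(m_equiv / A1) = exp(m / A2) gives m_equiv = (A1 / A2) * m.
for m in (20, 30):
    m_equiv = A1 * m / A2
    print(f"late {m} == early {m_equiv:g}:", math.isclose(s_late(m), s_early(m_equiv)))
```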

Interactive: Cost vs Prediction Error

Drag the bias and noise knobs. Bias slides the entire prediction distribution; noise spreads it. Watch how a small POSITIVE bias (1-2 cycles late on average) pushes the score up, while a similar NEGATIVE bias (early) barely moves it.

Try this. Set bias = 0, sigma = 5. Note the score. Now set bias = +5, sigma = 0 (always 5 cycles late, no noise). The score is HIGHER than for the noisy-but-unbiased version, even though both settings have the same RMSE of 5 cycles. Lateness is the feature, not magnitude.
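The explorer widget does not survive in this text, but the "Try this" experiment is easy to reproduce. A minimal Monte Carlo sketch, assuming the errors are Gaussian, d ~ N(bias, sigma²); that is an assumption about how predictions are drawn, not something the text specifies. `simulate` is an illustrative helper.

```python
import math
import random


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d = predicted RUL - true RUL."""
    return math.exp(-d / a1) - 1.0 if d < 0 else math.exp(d / a2) - 1.0


def simulate(bias, sigma, n=100_000, seed=0):
    """Mean NASA cost and RMSE when every error is drawn from Normal(bias, sigma**2).

    Assumption: Gaussian errors (illustrative, not the explorer's actual model).
    """
    rng = random.Random(seed)
    errors = [rng.gauss(bias, sigma) for _ in range(n)]
    mean_cost = sum(nasa_cost(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    return mean_cost, rmse


for bias, sigma in [(0.0, 5.0), (5.0, 0.0), (-5.0, 0.0)]:
    cost, rmse = simulate(bias, sigma)
    print(f"bias={bias:+.0f} sigma={sigma:.0f}: mean NASA cost={cost:.3f}, RMSE={rmse:.2f}")
```

With these settings the noisy, unbiased case averages roughly 0.48 NASA points per engine, the constant +5 bias exactly exp(0.5) - 1 ≈ 0.649, and the constant -5 bias exp(5/13) - 1 ≈ 0.469, while all three have an RMSE of about 5.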

Python: NASA Score from Scratch

Pure NumPy implementation - five lines of math, the rest is bookkeeping. Worked example: five engines with errors $\pm 20, \pm 5, 0$ demonstrates the cost asymmetry.

Per-sample asymmetric cost via np.where
🐍nasa_score_numpy.py
import numpy as np

NumPy is the numerical workhorse. We use ndarray for vectorised operations on (B,) error vectors, np.exp for the exponential cost shapes, np.where for the piecewise branch, and the .sum / .mean reductions to roll up a per-batch number.

EXECUTION STATE
📚 numpy = Library: ndarray, broadcasting, math, linear algebra. Foundation of every Python ML stack.
as np = Universal alias - lets us write np.exp, np.where, np.sum as one-token names.
def nasa_score(y_pred, y_true, a1=13.0, a2=10.0) -> dict:

Compute NASA C-MAPSS asymmetric score for a batch of predictions. The asymmetry between a1=13 (early decay) and a2=10 (late decay) means a +20-cycle late prediction costs about 1.75× more than a -20-cycle early prediction (6.389 vs 3.657), and the gap widens as |d| grows.

EXECUTION STATE
⬇ input: y_pred = (B,) - predicted RUL per engine. C-MAPSS RUL is in {0, …, 125}; predictions can fall anywhere.
⬇ input: y_true = (B,) - ground-truth RUL per engine. Capped at R_max=125 from §7.2.
⬇ input: a1 = 13.0 = Early decay constant. Bigger a1 = gentler penalty for early predictions. NASA chose 13 to encourage a small safety margin without wasting too much remaining life.
⬇ input: a2 = 10.0 = Late decay constant. Smaller a2 = steeper penalty for late predictions. The ratio a1/a2 = 1.3 is the headline asymmetry, but the EXPONENTIAL form makes the gap explode at large errors.
⬆ returns = dict with four entries: per-sample errors and scores, total, and percentage of late samples. Ready for plotting / logging.
d = y_pred - y_true

Element-wise signed error. NumPy broadcasts on matching (B,) shapes. Sign convention: d < 0 means we predicted EARLIER than truth (we said the engine had less life left than it really does); d > 0 means we predicted LATER.

EXECUTION STATE
operator: - = Element-wise subtraction.
→ reading d = d = predicted RUL minus actual RUL. d < 0 ⇒ "we said it would die sooner" ⇒ early / conservative. d > 0 ⇒ "we said it would last longer" ⇒ late / dangerous.
⬆ result: d (worked example) = [-20., -5., 0., 5., 20.]
is_late = d >= 0

Boolean mask. True where d is non-negative (late or perfect). Used by np.where below to pick the right cost branch per element.

EXECUTION STATE
operator: >= = Element-wise comparison. Returns a NumPy bool array of the same shape.
→ boundary = We treat d == 0 as "late" even though both branches give 0 there: exp(0) - 1 = 0. So the branch choice does not affect the value exactly at the boundary (it does affect the gradient in the PyTorch version, which takes the late-branch slope 1/a2 at d = 0).
⬆ result: is_late (worked example) = [False, False, True, True, True]
s_early = np.exp(-d / a1) - 1.0

Cost for the EARLY branch. We compute it for every element (cheap), and use np.where below to keep it only where is_late is False. The minus sign in front of d turns negative d into positive exponents - the bigger the early error, the larger exp(-d/a1).

EXECUTION STATE
📚 np.exp(arr) = Element-wise e^x. exp(0) = 1, exp(1) ≈ 2.718, exp(-1) ≈ 0.368. Used here so that the cost grows smoothly with |d|.
operator: -d / a1 = For d = -20, a1 = 13: -d/a1 = 20/13 ≈ 1.538 ⇒ exp(1.538) ≈ 4.657 ⇒ s_early ≈ 3.657. So a 20-cycle EARLY error costs ~3.66 NASA points.
operator: - 1.0 = Subtract 1 so a perfect prediction (d=0) yields cost 0. Without it, perfect prediction would cost 1 - useless reference.
⬆ result: s_early (worked example) = [3.657, 0.469, 0.000, -0.319, -0.785]
→ note = s_early is negative for d > 0 - that branch is BOGUS for late samples. np.where below masks it out.
s_late = np.exp(d / a2) - 1.0

Cost for the LATE branch. Steeper than early because a2=10 < a1=13. This is what makes the score asymmetric.

EXECUTION STATE
operator: d / a2 = For d = +20, a2 = 10: d/a2 = 2.0 ⇒ exp(2) ≈ 7.389 ⇒ s_late ≈ 6.389. Compared to a 20-cycle early error (3.657), the late error is about 1.75× worse element-wise.
→ why a2 < a1? = Late predictions miss the failure - in production this is unplanned downtime, possible cascade damage, possible safety incident. A smaller decay constant amplifies this in the cost.
⬆ result: s_late (worked example) = [-0.865, -0.393, 0.000, 0.649, 6.389]
s = np.where(is_late, s_late, s_early)

Pick the right cost branch element-wise. np.where(cond, a, b) is the vectorised analogue of `cond ? a : b` in C-style languages.

EXECUTION STATE
📚 np.where(cond, a, b) = Element-wise ternary. Returns a[i] where cond[i] is True, else b[i]. Inputs broadcast to a common shape.
⬇ arg 1: cond = is_late = Boolean mask. True ⇒ pick s_late, False ⇒ pick s_early.
⬇ arg 2: a = s_late = The late cost array.
⬇ arg 3: b = s_early = The early cost array.
⬆ result: s (worked example) = [3.657, 0.469, 0.000, 0.649, 6.389]
→ asymmetry visible = Compare element 0 (d=-20, cost 3.66) with element 4 (d=+20, cost 6.39). Same magnitude of error, about 75% more cost on the late side. At |d|=30 the late cost is about 2.1× the early cost, and at |d|=50 about 3.2×.
total = s.sum()

Sum of per-sample costs. NASA reports this as the 'score' for an entire test set - lower is better.

EXECUTION STATE
📚 .sum() = ndarray method. With no axis, reduces over all elements to a 0-D scalar.
⬆ result: total (worked example) = 11.164
→ context = On C-MAPSS FD001 test (100 engines), the published baseline (CNN-LSTM) scores ~340. The book's GRACE model scores ~228. Lower = better.
pct_late = is_late.mean() * 100.0

Diagnostic - what fraction of predictions were late? .mean() on a Boolean mask returns the fraction True.

EXECUTION STATE
📚 .mean() on bool = Treats False as 0, True as 1. Returns a float in [0, 1] = fraction True.
⬆ result: pct_late (worked example) = 60.0
→ why track it? = If pct_late > 50% the model is dangerously biased toward late predictions. A trained AMNL/GRACE model typically lands at 30-40% late.
return { ... }

Hand back four values in a dict so callers can unpack just the parts they need.

EXECUTION STATE
⬆ return key: errors = (B,) ndarray of signed errors d.
⬆ return key: scores = (B,) ndarray of per-sample costs.
⬆ return key: total = Python float - the sum reported by NASA.
⬆ return key: pct_late = Python float - percentage of late samples.
y_true = np.array([50, 50, 50, 50, 50], dtype=np.float32)

Worked example with all five engines having true RUL = 50 cycles. Holding y_true constant lets us see the cost asymmetry purely as a function of error sign.

EXECUTION STATE
📚 np.array(seq, dtype) = Construct an ndarray from a Python sequence with an explicit dtype.
⬇ arg: dtype = np.float32 = Match the model output dtype - mixing dtypes is a silent bug source.
⬆ result: y_true = [50., 50., 50., 50., 50.] (B=5)
y_pred = np.array([30, 45, 50, 55, 70], dtype=np.float32)

Five hand-picked predictions: very early (-20), slightly early (-5), perfect (0), slightly late (+5), very late (+20). The pattern lets us read the asymmetry off the printed scores immediately.

EXECUTION STATE
⬆ result: y_pred = [30., 45., 50., 55., 70.]
→ expected errors = y_pred - y_true = [-20, -5, 0, +5, +20]. Symmetric around 0 by construction.
out = nasa_score(y_pred, y_true)

Run the scorer with default a1=13, a2=10.

EXECUTION STATE
⬇ args used = y_pred, y_true (defaults a1=13, a2=10).
⬆ result: out = dict with errors / scores / total / pct_late.
print("errors :", out["errors"].tolist())

Print the error vector. .tolist() converts an ndarray to a Python list for cleaner printing.

EXECUTION STATE
📚 .tolist() = Materialise an ndarray as a (possibly nested) Python list. Avoids NumPy's 'array(…)' wrapper in the output.
Output = errors : [-20.0, -5.0, 0.0, 5.0, 20.0]
print("scores :", out["scores"].round(3).tolist())

Print per-sample costs rounded to 3 decimals. Notice how |error|=20 maps to 3.657 on the early side but 6.389 on the late side.

EXECUTION STATE
📚 .round(decimals) = Element-wise rounding. Returns a new ndarray; the original is unchanged.
⬇ arg: decimals = 3 = Round to 3 decimal places.
Output ≈ scores : [3.657, 0.469, 0.0, 0.649, 6.389]
→ asymmetry on display = Same |error|=20: early cost 3.657, late cost 6.389. Late is about 75% worse for the same magnitude. At |error|=30 the late cost is about 2.1× the early cost.
print("total :", round(out["total"], 3))

Sum of per-sample costs.

EXECUTION STATE
📚 round(x, ndigits) = Python built-in. Returns a float rounded to ndigits decimals.
Output = total : 11.164
print("% late :", round(out["pct_late"], 1), "%")

Diagnostic. 60% late (3 of 5, counting the perfect d = 0 prediction as late) ⇒ this synthetic model leans dangerous.

EXECUTION STATE
Output = % late : 60.0 %
```python
import numpy as np


def nasa_score(y_pred: np.ndarray,
               y_true: np.ndarray,
               a1: float = 13.0,
               a2: float = 10.0) -> dict:
    """NASA C-MAPSS asymmetric scoring function.

    s(d) = exp(-d / a1) - 1   if d <  0   (early - mild penalty)
    s(d) = exp( d / a2) - 1   if d >= 0   (late  - harsh penalty)
    where d = y_pred - y_true.

    Returns the per-sample scores, the total, and a few headline stats.
    """
    d = y_pred - y_true                                 # signed errors
    is_late  = d >= 0
    s_early  = np.exp(-d / a1) - 1.0                    # valid where d < 0
    s_late   = np.exp( d / a2) - 1.0                    # valid where d >= 0
    s        = np.where(is_late, s_late, s_early)       # piecewise

    total    = s.sum()
    pct_late = is_late.mean() * 100.0
    return {
        "errors":   d,
        "scores":   s,
        "total":    float(total),
        "pct_late": float(pct_late),
    }


# ---------- Worked example: 5 engines, mixed errors ----------
y_true  = np.array([ 50,  50,  50,  50,  50], dtype=np.float32)   # all true RUL = 50
y_pred  = np.array([ 30,  45,  50,  55,  70], dtype=np.float32)   # -20, -5, 0, +5, +20
out     = nasa_score(y_pred, y_true)

print("errors :", out["errors"].tolist())               # [-20, -5,  0, +5, +20]
print("scores :", out["scores"].round(3).tolist())      # asymmetric!
print("total  :", round(out["total"], 3))
print("% late :", round(out["pct_late"], 1), "%")
```

PyTorch: Differentiable NASA Loss

Production version. NASAScoreLoss as an nn.Module with torch.where for the piecewise cost, plus a clip_error guard against gradient explosion. Same per-sample numbers as the NumPy block; the loss is their mean rather than their sum.

Drop-in nn.Module with autograd-verified gradients
🐍nasa_score_loss_torch.py
import torch

Top-level PyTorch.

EXECUTION STATE
📚 torch = Tensor library + autograd + nn modules + optim.
import torch.nn as nn

Module containers and Parameter.

EXECUTION STATE
📚 nn.Module = Base class for all PyTorch models. Provides parameter registration, .train()/.eval(), state_dict(), and the call → forward dispatch.
import torch.nn.functional as F

Stateless ops. Not used directly here but conventional in any production training file.

class NASAScoreLoss(nn.Module):

Custom loss as an nn.Module. The Module convention - rather than a plain function - keeps the hyperparameters on one object, gives the loss the same call interface as nn.MSELoss / nn.CrossEntropyLoss so higher-level code can swap it in, and leaves room to register buffers or parameters later if needed.

def __init__(self, a1=13.0, a2=10.0, clip_error=50.0):

Three hyperparameters: the two NASA decay constants plus a |d| clip used to keep gradients finite during training.

EXECUTION STATE
⬇ input: a1 = 13.0 = Early decay constant. NASA-canonical value.
⬇ input: a2 = 10.0 = Late decay constant. NASA-canonical value.
⬇ input: clip_error = 50.0 = Maximum |d| before the exponential is clipped. Without it, a single d=+125 sample produces exp(12.5) ≈ 268,337 and the gradient explodes. With clip_error=50 the per-element gradient is at most exp(5)/10 ≈ 14.84 (before the 1/B of the mean) - large but trainable. Samples with |d| > 50 receive zero gradient, because clamp's derivative is 0 outside the range.
super().__init__()

Initialise nn.Module - sets up parameter / buffer registries even though we will not register any here.

self.a1 = a1

Store as a Python float on the module. Not registered as a Parameter (we do not want it learnable) and not as a buffer (no need to ship it in state_dict).

self.a2 = a2

Same.

self.clip_error = clip_error

Same.

def forward(self, pred, target) -> torch.Tensor:

Compute mean NASA score. Same call signature as nn.MSELoss for drop-in use.

EXECUTION STATE
⬇ input: pred = (B,) predicted RUL. requires_grad=True so the surrogate's gradient flows back through the model.
⬇ input: target = (B,) ground-truth RUL. No grad - it's data.
⬆ returns = 0-D tensor (scalar) - the mean per-sample NASA cost. Connected to pred via autograd.
d = (pred - target).clamp(-self.clip_error, self.clip_error)

Compute the signed error and clamp it. Method-chained for readability. The clamp is what lets us train without gradient explosion; for evaluation we'd skip it (or equivalently set clip_error = inf).

EXECUTION STATE
operator: - = Element-wise tensor subtraction.
📚 .clamp(min, max) = Element-wise clip. Returns max(min, min(x, max)) per element. Differentiable - gradient is 1 inside the range, 0 outside.
⬇ arg 1: min = -clip = Lower bound. Errors below -50 saturate at -50.
⬇ arg 2: max = +clip = Upper bound. Errors above +50 saturate at +50.
→ why bound? = exp(50/10) = exp(5) ≈ 148. exp(125/10) ≈ 268,337. The unbounded form can blow up the optimiser within the first epoch.
⬆ result: d (worked example) = [-20., -5., 0., 5., 20.] (no clipping kicked in, |d| ≤ 50 everywhere)
s = torch.where(d >= 0, torch.exp(d / self.a2) - 1.0, torch.exp(-d / self.a1) - 1.0)

Piecewise cost via torch.where. PyTorch evaluates BOTH branches and selects per element - this is fine here because both branches are smooth and finite (after the clamp).

EXECUTION STATE
📚 torch.where(cond, a, b) = Element-wise ternary. Returns a[i] if cond[i] else b[i]. Differentiable with respect to both a and b; for each element, the gradient flows only through the branch that was selected.
⬇ arg 1: cond = (d >= 0) = Boolean tensor mask. True for late predictions, False for early.
⬇ arg 2: a = exp(d / a2) - 1 = Late branch. For d=+20, a2=10: exp(2) - 1 ≈ 6.389.
⬇ arg 3: b = exp(-d / a1) - 1 = Early branch. For d=-20, a1=13: exp(20/13) - 1 ≈ 3.657.
📚 torch.exp(t) = Element-wise e^x. Differentiable: d(exp(x))/dx = exp(x).
→ why both branches evaluated? = torch.where computes BOTH a and b and selects per element. There is no short-circuit. The wasted branch is cheap (a single exp + sub).
⬆ result: s (worked example) = tensor([3.657, 0.469, 0.000, 0.649, 6.389])
return s.mean()

Reduce to a scalar. Mean (not sum) so the gradient magnitude is independent of batch size - matches the default reduction of nn.MSELoss.

EXECUTION STATE
📚 .mean() = Tensor method. With no dim, reduces over all elements to a 0-D scalar.
⬆ return = 0-D tensor with grad_fn. Worked example: tensor(2.2328).
→ check = 11.164 / 5 ≈ 2.2328. Matches the NumPy total / 5 ✓.
torch.manual_seed(0)

Repro - irrelevant here (no random ops) but conventional.

EXECUTION STATE
📚 torch.manual_seed(s) = Sets the global PyTorch PRNG.
⬇ arg: s = 0 = Conventional canonical seed.
loss_fn = NASAScoreLoss(a1=13.0, a2=10.0, clip_error=50.0)

Instantiate. Same hyperparameters as the NumPy block.

EXECUTION STATE
⬇ args used = All three at NASA-canonical defaults.
⬆ result: loss_fn = A NASAScoreLoss instance, callable like loss_fn(pred, target).
target = torch.tensor([50., 50., 50., 50., 50.])

Same hand-picked y_true as the NumPy block.

EXECUTION STATE
📚 torch.tensor(seq) = Construct a new tensor from a Python sequence. Default float dtype (float32 unless globally overridden).
⬆ result: target = tensor([50., 50., 50., 50., 50.]) shape (5,)
pred = torch.tensor([30., 45., 50., 55., 70.], requires_grad=True)

Predictions tagged for autograd. requires_grad=True is what lets .backward() populate pred.grad below.

EXECUTION STATE
⬇ arg: requires_grad = True = Tells autograd to track operations on pred so we can call .backward() on the loss.
⬆ result: pred = tensor([30., 45., 50., 55., 70.], requires_grad=True)
loss = loss_fn(pred, target)

Calls loss_fn.__call__ which dispatches to forward(). Returns a 0-D tensor connected to pred via autograd.

EXECUTION STATE
⬆ result: loss = tensor(2.2328, grad_fn=<MeanBackward0>)
→ matches NumPy = 11.164 / 5 ≈ 2.2328 ✓
loss.backward()

Reverse-mode autograd. Populates pred.grad with d(loss)/d(pred). For the NASA score, the per-element gradient is -(1/a1) exp(-d/a1) on the early branch and +(1/a2) exp(d/a2) on the late branch, each divided by B because of the mean reduction.

EXECUTION STATE
📚 .backward(retain_graph=False) = Backprops through the autograd graph and accumulates grads into all leaves with requires_grad=True. Default frees the graph.
print("loss :", round(loss.item(), 4))

Pull the Python float out of the 0-D loss tensor.

EXECUTION STATE
📚 .item() = 0-D tensor → Python float. Crashes on multi-element tensors.
Output = loss : 2.2328
print("pred.grad :", pred.grad.round(decimals=4).tolist())

Per-sample gradient. With reduction=mean, each element gets divided by B=5.

EXECUTION STATE
📚 .round(decimals) = Element-wise rounding. PyTorch ≥ 1.12.
📚 .tolist() = Tensor → Python (nested) list.
Output ≈ pred.grad : [-0.0717, -0.0226, 0.02, 0.033, 0.1478]
→ reading the gradients = Element 0 (d=-20, early): -exp(20/13)/(13·5) ≈ -0.0717. Element 4 (d=+20, late): exp(2)/(10·5) ≈ 0.1478. Same |d|, but the late gradient is about 2.1× larger in magnitude.
→ optimiser effect = Backpropagated into the model weights, the late samples pull the model harder toward predicting smaller RUL - exactly the "be conservative" behaviour the NASA score asks for.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NASAScoreLoss(nn.Module):
    """Differentiable surrogate of the NASA C-MAPSS scoring function.

    Forward computes the mean per-sample exponential cost. clip_error
    bounds |d| to keep gradients finite at extreme errors (the true
    NASA score has no clip and explodes; clipping is standard for
    training-time use).
    """

    def __init__(self,
                 a1:          float = 13.0,
                 a2:          float = 10.0,
                 clip_error:  float = 50.0):
        super().__init__()
        self.a1         = a1
        self.a2         = a2
        self.clip_error = clip_error

    def forward(self,
                pred:   torch.Tensor,
                target: torch.Tensor) -> torch.Tensor:
        # 1) signed error and clip
        d = (pred - target).clamp(-self.clip_error, self.clip_error)

        # 2) piecewise cost via torch.where
        s = torch.where(
            d >= 0,
            torch.exp( d / self.a2) - 1.0,             # late
            torch.exp(-d / self.a1) - 1.0,             # early
        )
        return s.mean()


# ---------- Smoke test (matches the NumPy block above) ----------
torch.manual_seed(0)
loss_fn = NASAScoreLoss(a1=13.0, a2=10.0, clip_error=50.0)

target  = torch.tensor([50., 50., 50., 50., 50.])
pred    = torch.tensor([30., 45., 50., 55., 70.], requires_grad=True)

loss    = loss_fn(pred, target)
loss.backward()

print("loss        :", round(loss.item(), 4))         # mean of [3.657, 0.469, 0, 0.649, 6.389] ≈ 2.2328
print("pred.grad   :", pred.grad.round(decimals=4).tolist())
# grads (mean over B=5): early branch d/dd[exp(-d/13) - 1] = -(1/13) exp(-d/13), divided by 5
#                        late  branch d/dd[exp( d/10) - 1] = +(1/10) exp( d/10), divided by 5
# Late gradient is 1.3x bigger as |d| -> 0 and grows with |d| (~2.1x at |d| = 20).
```
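Gradient values like these are easy to get wrong by hand, so a dependency-free cross-check helps. This is a pure-Python sketch of the analytic gradient of the mean loss, compared with central finite differences on the same five engines. `mean_loss`, `analytic_grad` and `finite_diff_grad` are illustrative helpers, not part of PyTorch, and no clipping is applied because every |d| here is well below clip_error.

```python
import math

A1, A2 = 13.0, 10.0  # canonical NASA constants; no clipping needed for |d| <= 20


def nasa_cost(d):
    """Per-sample NASA cost for a signed error d = pred - target."""
    return math.exp(-d / A1) - 1.0 if d < 0 else math.exp(d / A2) - 1.0


def mean_loss(pred, target):
    """Mean NASA cost over the batch (the reduction NASAScoreLoss uses)."""
    return sum(nasa_cost(p - t) for p, t in zip(pred, target)) / len(pred)


def analytic_grad(pred, target):
    """d(mean loss)/d(pred_i); at d == 0 the late branch is used, like torch.where(d >= 0, ...)."""
    b = len(pred)
    grads = []
    for p, t in zip(pred, target):
        d = p - t
        if d < 0:
            grads.append(-math.exp(-d / A1) / (A1 * b))   # early branch: negative slope
        else:
            grads.append(math.exp(d / A2) / (A2 * b))     # late branch: positive slope
    return grads


def finite_diff_grad(pred, target, h=1e-4):
    """Central differences of mean_loss, one coordinate at a time."""
    grads = []
    for i in range(len(pred)):
        up = [p + h if j == i else p for j, p in enumerate(pred)]
        down = [p - h if j == i else p for j, p in enumerate(pred)]
        grads.append((mean_loss(up, target) - mean_loss(down, target)) / (2 * h))
    return grads


pred = [30.0, 45.0, 50.0, 55.0, 70.0]
target = [50.0] * 5
print("loss    :", round(mean_loss(pred, target), 4))
print("analytic:", [round(g, 4) for g in analytic_grad(pred, target)])
print("numeric :", [round(g, 4) for g in finite_diff_grad(pred, target)])
```

The analytic row prints [-0.0717, -0.0226, 0.02, 0.033, 0.1478]. The numeric row agrees everywhere except at d = 0 (about 0.0023): the score has a kink there, a central difference averages the early and late slopes, while torch.where reports the late-branch slope 1/(a2·B) = 0.02.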

Asymmetric Cost in Other Domains

Asymmetric loss is not a C-MAPSS-only idea - any safety-critical decision with different costs for over- vs under-prediction wants something like NASA score.

| Domain | Underestimate | Overestimate | Asymmetry constants |
|---|---|---|---|
| RUL prediction (C-MAPSS) | early replacement ($5K) | missed failure ($1M+) | a1=13, a2=10 (NASA) |
| Wildfire risk score | false alarm (annoyance) | missed wildfire ($10M+) | a1=20, a2=5 (typical agency) |
| Battery SoC for EV range | over-conservative range | stranding the driver | a1=15, a2=4 (OEM) |
| Hospital ICU triage score | extra observation hour | missed deterioration | a1=25, a2=2 (clinical) |
| Inventory days-to-stockout | early reorder (carrying cost) | stockout (lost sale + brand) | a1=10, a2=4 (retail) |
| Power-grid load forecast | spot-buy more capacity | blackout | a1=12, a2=3 (TSO) |
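Choosing a1 and a2. The ratio a1/a2 sets the asymmetry for small errors: as |d| → 0 the late-to-early cost ratio tends to a1/a2, and a late error of d always costs the same as an early error of (a1/a2)·d. The absolute values set the error scale: multiplying both constants by k is the same as measuring errors in units of k cycles (or days, or hours), which changes how quickly the exponential takes over rather than just multiplying the loss. Pick a1/a2 to match your operational cost ratio for small errors (|d| = 1 cycle / day / hour), then pick the absolute scale so the exponential takes over at the error size your operation cannot tolerate.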
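Whatever the ratio, the absolute size of a1 and a2 is not just a loss multiplier: scaling both by k is the same as dividing the error by k. A quick pure-Python check of that claim, with `nasa_cost` as an illustrative helper and k = 2 as an arbitrary example factor:

```python
import math


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d = predicted - true (illustrative helper)."""
    return math.exp(-d / a1) - 1.0 if d < 0 else math.exp(d / a2) - 1.0


k = 2.0  # arbitrary example factor applied to BOTH decay constants
for d in (-30.0, -10.0, 0.0, 10.0, 30.0):
    scaled_constants = nasa_cost(d, a1=13.0 * k, a2=10.0 * k)
    scaled_error = nasa_cost(d / k)               # original constants, error in units of k cycles
    assert math.isclose(scaled_constants, scaled_error, abs_tol=1e-12)

# Same ratio a1/a2 = 1.3 in both rows, different late/early cost ratio at |d| = 30.
for a1, a2 in [(13.0, 10.0), (26.0, 20.0)]:
    ratio = nasa_cost(30.0, a1, a2) / nasa_cost(-30.0, a1, a2)
    print(f"a1={a1:g}, a2={a2:g}: late/early cost ratio at |d|=30 is {ratio:.2f}")
```

The first row prints 2.11 and the second 1.60: doubling both constants keeps a1/a2 = 1.3 but softens the asymmetry at a fixed error size.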

Three NASA-Score Pitfalls

Pitfall 1: Forgetting to clip during training. At init the model can predict +125 against a true RUL of 0. The unbounded NASA loss returns exp(12.5) ≈ 268,337 for that sample, with a gradient of about 26,800. Adam absorbs this by inflating its $\sqrt{v}$ denominator, but plain SGD can diverge in a single step. Always clamp(-clip_error, +clip_error) the residual, and keep in mind that the clamp has zero derivative outside that range, so samples beyond ±clip_error contribute no gradient at all. Evaluation computes no gradients, so it carries no such risk.
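A pure-Python sketch of the per-sample slope, with and without the clamp, shows both halves of this pitfall: the huge unclipped gradient at d = +125, and the zero gradient a clamped sample gets beyond the clip. `grad_wrt_pred` is an illustrative helper that hand-codes the derivative of s(clamp(d)); it is not PyTorch's autograd.

```python
import math

A1, A2 = 13.0, 10.0  # canonical NASA decay constants


def grad_wrt_pred(d, clip=None):
    """Slope of the per-sample NASA cost w.r.t. the prediction, for d = pred - target.

    Illustrative, hand-coded derivative. With `clip` set, it mirrors
    s(clamp(d, -clip, clip)): the slope passes through inside the range
    (boundary included here) and is blocked outside it.
    """
    if clip is not None and abs(d) > clip:
        return 0.0                        # clamp's derivative is zero outside the range
    if d < 0:
        return -math.exp(-d / A1) / A1    # early branch
    return math.exp(d / A2) / A2          # late branch (d >= 0)


for d in (20.0, 50.0, 125.0):
    print(f"d={d:+.0f}: unclipped {grad_wrt_pred(d):,.2f}   clipped {grad_wrt_pred(d, clip=50.0):,.2f}")
```

At d = +125 the unclipped slope is about 26,834, while the clipped version returns 0.00: the clamp removes the explosion but also silences that sample.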
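Pitfall 2: Using NASA score as the only metric. NASA score rewards systematic early bias. Because an early error costs less than a late error of the same size, shifting every prediction a few cycles earlier can lower the NASA score even though RMSE gets worse and more remaining life is thrown away. (It does not reward extreme early bias: an all-zero predictor pays exp(y/13) - 1 per engine, roughly 15,000 for an engine with 125 cycles left.) Always report RMSE alongside NASA. §13.2 plots them as a Pareto frontier.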
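A small pure-Python experiment makes this trade-off visible. It assumes a hypothetical unbiased model whose errors are Gaussian with sigma = 10 cycles, shifts every prediction by a constant, and searches a grid of shifts for the lowest mean NASA cost. `metrics` is an illustrative helper.

```python
import math
import random


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d = predicted RUL - true RUL."""
    return math.exp(-d / a1) - 1.0 if d < 0 else math.exp(d / a2) - 1.0


# Hypothetical unbiased model (assumption): errors are Gaussian noise with sigma = 10 cycles.
rng = random.Random(0)
noise = [rng.gauss(0.0, 10.0) for _ in range(20_000)]


def metrics(shift):
    """(mean NASA cost, RMSE) after moving every prediction by `shift` cycles."""
    errors = [e + shift for e in noise]
    nasa = sum(nasa_cost(d) for d in errors) / len(errors)
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
    return nasa, rmse


shifts = [k / 2 for k in range(-20, 11)]          # -10.0 ... +5.0 cycles, step 0.5
best = min(shifts, key=lambda s: metrics(s)[0])   # shift that minimises the NASA cost
print("NASA-optimal shift:", best)
print("no shift   (NASA, RMSE):", [round(v, 3) for v in metrics(0.0)])
print("best shift (NASA, RMSE):", [round(v, 3) for v in metrics(best)])
```

The NASA-optimal shift comes out negative (a couple of cycles early), and at that shift the mean NASA cost is lower but the RMSE is higher than with no shift.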
Pitfall 3: Dropping the −1 offset. Some implementations write exp(d/a) - 1 as just exp(d/a). That makes a perfect prediction cost 1 instead of 0 and adds a constant +N to every reported total, so the numbers are no longer comparable with published scores. It does not change training at all: the derivative of a constant is zero, so the gradients are identical. That is exactly why the bug is subtle; it only shows up when you compare scores.
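The offset bug is easy to confirm numerically. This pure-Python sketch compares central-difference slopes of the late branch with and without the −1 and measures how far the total moves; the function names are illustrative.

```python
import math

A2 = 10.0  # late decay constant


def late_cost(d):
    return math.exp(d / A2) - 1.0     # correct late branch


def late_cost_no_offset(d):
    return math.exp(d / A2)           # buggy variant (illustrative): the "- 1" was dropped


def slope(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


errors = [0.0, 5.0, 20.0]
for d in errors:
    # Same slope in both columns: the constant offset has zero derivative.
    print(d, round(slope(late_cost, d), 6), round(slope(late_cost_no_offset, d), 6))

offset = sum(late_cost_no_offset(d) for d in errors) - sum(late_cost(d) for d in errors)
print("total shifted by:", round(offset, 9))   # 3.0, i.e. +1 per sample
```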
The point. RMSE is a stand-in convenience metric; NASA score is the one operators care about. §13.2 shows the Pareto frontier between them; §13.3 reports published baselines on both metrics; §13.4 closes the chapter with the operator-cost framing.

Takeaway

  • Lateness is the feature, not magnitude. NASA s(d) penalises late predictions about 1.4-3.2× more than early predictions of the same magnitude over |d| = 5-50 cycles, and the gap keeps widening as |d| grows.
  • Two decay constants. $a_1 = 13$ (early), $a_2 = 10$ (late). Standard since PHM 2008.
  • Differentiable surrogate. torch.where(d >= 0, exp(d/a2)-1, exp(-d/a1)-1) + clamp on |d| = a working PyTorch loss.
  • Always clip when training. Unbounded exp() is a gradient bomb at init.
  • Always report RMSE alongside. NASA alone rewards systematic early bias that RMSE would flag.