Two Doctors, Same RMSE, Different Outcomes
Imagine two diagnosticians estimating how many days a patient has left before a cardiac event. Both are wrong by the same average margin — say, six days. Doctor A's mistakes lean early: she usually says “next week” when the real event is two weeks out. Doctor B's mistakes lean late: he typically says “two weeks” when the patient codes in seven days. On any symmetric error metric the two doctors look identical. In the ICU they are not remotely the same person.
That story is the entire reason this section exists. The dominant accuracy metric in regression is RMSE, and RMSE cannot tell A from B. The maintenance community noticed that almost twenty years ago. The result is the NASA scoring function, an exponential, asymmetric cost that builds the real-world asymmetry into the metric itself.
RMSE: The Comfortable but Symmetric Metric
RMSE is what every regression paper reports because it has every property a metric should have: same units as the target, differentiable everywhere, cheap to compute, and well-understood. Given predictions $\hat{y}_i$ and ground-truth labels $y_i$ over a test set of size $N$,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)^2}$$
The single property that ruins RMSE for safety-critical RUL is the square. Squaring erases the sign of the error: an error of $+d$ cycles and an error of $-d$ cycles contribute identically, because $(+d)^2 = (-d)^2$.
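To make the sign-blindness concrete, here is a minimal pure-Python sketch. The four RUL values and the error magnitudes are hypothetical, chosen only for illustration; the helper name `rmse` is likewise an assumption, not a listing from this book.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error over paired predictions and labels."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical true RUL values (in cycles) and error magnitudes for four engines.
actual = [50.0, 80.0, 120.0, 30.0]
offsets = [10.0, 5.0, 20.0, 8.0]

# "Doctor A": every prediction is early (too small).
early = [a - o for a, o in zip(actual, offsets)]
# "Doctor B": same magnitudes, but every prediction is late (too large).
late = [a + o for a, o in zip(actual, offsets)]

rmse_early = rmse(early, actual)
rmse_late = rmse(late, actual)
print(f"RMSE, all errors early: {rmse_early:.3f}")
print(f"RMSE, all errors late:  {rmse_late:.3f}")
```

Both lines print the same number, because flipping the sign of every error leaves each squared term unchanged.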
The NASA Score: Asymmetry Made Quantitative
Saxena, Goebel and colleagues introduced the asymmetric scoring function alongside the original C-MAPSS dataset in 2008. For a single test sample, with $d_i = \hat{y}_i - y_i$,

$$s_i = \begin{cases} e^{-d_i/13} - 1, & d_i < 0 \\ e^{d_i/10} - 1, & d_i \ge 0 \end{cases}$$

The total score over the test set is the sum $S = \sum_{i=1}^{N} s_i$. Lower is safer, zero is perfect, and there is no upper bound: one really late prediction can dominate the entire sum.
Two design decisions encode the safety priorities. First, the denominators differ: 13 on the early side, 10 on the late side. Every 13 cycles of earliness multiplies the exponential term $s + 1$ by $e$; every 10 cycles of lateness does the same. The late exponent therefore grows 30% faster than the early one. Second, the function is exponential, not linear or quadratic: doubling the lateness more than doubles the penalty. A single 30-cycle late prediction ($s \approx 19.09$) costs as much as roughly 180 separate 1-cycle late predictions ($s \approx 0.105$ each).
| error d | NASA score s(d) | Interpretation |
|---|---|---|
| -30 | 9.05 | Very early - 30 cycles of wasted life, modest penalty |
| -10 | 1.16 | Early - small penalty |
| -5 | 0.47 | Slightly early - almost free |
| 0 | 0.00 | Perfect |
| +5 | 0.65 | Slightly late - already comparable to 5 cycles early |
| +10 | 1.72 | Late - 50% worse than the same-magnitude early error |
| +30 | 19.09 | Very late - over 2x the cost of the same-magnitude early |
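The table values can be checked with a few lines of pure Python. This is a minimal sketch under the sign convention used in this section (error = predicted minus actual); the function name `nasa_score` and its parameter names are assumptions made for illustration.

```python
import math

def nasa_score(d, a_early=13.0, a_late=10.0):
    """Per-sample score for an error d = predicted - actual, in cycles."""
    if d < 0:
        return math.exp(-d / a_early) - 1.0   # early side: softer growth
    return math.exp(d / a_late) - 1.0          # late side: sharper growth

# Recompute the rows of the table above, rounded to two decimals.
table = {d: round(nasa_score(d), 2) for d in (-30, -10, -5, 0, 5, 10, 30)}
for d, s in table.items():
    print(f"d = {d:+3d}  s(d) = {s:.2f}")

# Convexity: doubling a late error more than doubles its penalty.
ratio_doubling = nasa_score(20) / nasa_score(10)
# Number of 1-cycle-late predictions that cost as much as one 30-cycle-late one.
equiv_one_cycle = nasa_score(30) / nasa_score(1)
print(f"s(20) / s(10) = {ratio_doubling:.2f}")
print(f"s(30) / s(1)  = {equiv_one_cycle:.1f}")
```

The last two ratios quantify the exponential shape: going from 10 to 20 cycles late multiplies the penalty by well over 2, and a single 30-cycle miss outweighs well over a hundred 1-cycle misses.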
The math, decoded
The piecewise formula in the original paper is given in shorthand, with no sign condition written next to it. Make it explicit with $d = \hat{y} - y$ (predicted minus actual):
- $d < 0$: early prediction. The model says fewer cycles remain than the truth. Conservative. Maintenance is scheduled too soon and life is wasted.
- $d > 0$: late prediction. The model says more cycles remain than the truth. Dangerous. The engine may be near failure while the model claims plenty of life left.
Worked example: same |d|, two penalties
Take a single engine with a true RUL of $y = 20$ cycles and consider two predictions that are wrong by the same 20 cycles in opposite directions.
| Prediction | d = predicted − actual | Branch | s(d) = | Value |
|---|---|---|---|---|
| predicted = 0 | 0 − 20 = −20 | early | $e^{20/13} - 1$ | ≈ 3.66 |
| predicted = 40 | 40 − 20 = +20 | late | $e^{20/10} - 1$ | ≈ 6.39 |
Same absolute error of 20 cycles, but the late penalty is ~75% larger than the early one. RMSE would call these equally bad: both contribute $20^2 = 400$ to the squared error. The NASA score refuses to.
Engineering picture. A model saying “this engine still has 40 cycles left” when it actually has only 20 is dangerous — maintenance arrives too late. A model saying “0 cycles left” when the truth is 20 is conservative and costly, but safer. The asymmetric score encodes that asymmetry of consequences.
Why 10 and 13 — the role of the denominators
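Rewrite both branches in terms of $|d|$:

$$s_{\text{early}} = e^{|d|/13} - 1, \qquad s_{\text{late}} = e^{|d|/10} - 1$$

For any non-zero error, the late branch divides by the smaller number, which produces a larger exponent, and exponentials magnify any advantage in the exponent very quickly:

$$\frac{|d|}{10} > \frac{|d|}{13} \;\Longrightarrow\; e^{|d|/10} > e^{|d|/13}$$

With $|d| = 20$: $20/13 \approx 1.54$ vs $20/10 = 2$, so $e^{1.54} \approx 4.66$ vs $e^{2} \approx 7.39$. The denominators are tuning knobs: 13 = softer growth on the safe side, 10 = sharper growth on the dangerous side. The paper does not derive these constants from first principles (they are the convention the C-MAPSS leaderboard has used since 2008), but their role is precise: the late exponent grows 30% faster per cycle than the early one ($13/10 = 1.3$), so a late error is always penalised more than an early error of the same size, and the gap widens as the error grows.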
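How quickly the gap between the two branches opens up can be seen by tabulating the late-to-early penalty ratio for a few error magnitudes. This is a minimal sketch; the helper names and the chosen magnitudes are assumptions made for illustration.

```python
import math

A_EARLY, A_LATE = 13.0, 10.0   # denominators of the scoring function

def early_penalty(abs_d):
    """Penalty for predicting abs_d cycles too early."""
    return math.exp(abs_d / A_EARLY) - 1.0

def late_penalty(abs_d):
    """Penalty for predicting abs_d cycles too late."""
    return math.exp(abs_d / A_LATE) - 1.0

# Ratio of late to early penalty for the same absolute error.
ratios = {}
for abs_d in (1, 5, 10, 20, 30):
    ratios[abs_d] = late_penalty(abs_d) / early_penalty(abs_d)
    print(f"|d| = {abs_d:2d}  late/early = {ratios[abs_d]:.3f}")
```

For small errors the ratio sits close to 13/10 = 1.3, the ratio of the two exponent rates. It then climbs steadily, reaching about 1.75 at 20 cycles and passing 2 by 30 cycles, which matches the worked example and the score table.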
Interactive: Watch RMSE Lie
Below is the asymmetric scoring curve, plus a tiny synthetic model controlled by two sliders — bias and noise std. Slide the bias to +10 and the bottom-right scatter goes red. Slide it to -10 and the same magnitude of error suddenly looks safe. Watch RMSE move slightly while NASA score moves a lot.
The asymmetry is not subtle. At a fixed noise level (a standard deviation of about 8 cycles is consistent with the RMSE values reported in this section), a +6 mean bias nearly doubles the NASA score relative to a 0 mean bias, while RMSE worsens by only 28%. Build a loss function out of RMSE and your gradient never sees that doubling.
Python: RMSE and NASA, Side by Side
Two implementations of NASA score and RMSE; three synthetic models with the same precision but different bias. The NumPy run below produces the exact values quoted in tables throughout this book.
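As a stand-in for that comparison, here is a minimal NumPy sketch of the same experiment: three synthetic models that share one noise vector and differ only in bias. The seed, the test-set size `n`, the RUL range, and the noise standard deviation of 8 cycles are all assumptions, so the printed values will not reproduce the figures quoted in this section. Only the qualitative pattern should carry over.

```python
import numpy as np

def rmse(pred, true):
    """Symmetric accuracy metric: root-mean-square error."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def nasa_score(pred, true):
    """Sum of asymmetric exponential penalties, with d = predicted - actual."""
    d = pred - true
    # np.where computes both exponentials for every sample, then selects per element.
    per_sample = np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0
    return float(per_sample.sum())

rng = np.random.default_rng(0)             # assumed seed
n = 1000                                   # assumed test-set size
true = rng.uniform(10.0, 125.0, size=n)    # assumed range of true RUL (cycles)
noise = rng.normal(0.0, 8.0, size=n)       # assumed noise std, shared by all models

biases = {"A (unbiased)": 0.0, "B (late, +6)": 6.0, "C (early, -6)": -6.0}
results = {}
for name, bias in biases.items():
    pred = true + bias + noise
    results[name] = (rmse(pred, true), nasa_score(pred, true))
    print(f"{name:14s} RMSE = {results[name][0]:6.2f}   NASA = {results[name][1]:9.1f}")
```

Because the score is a sum, its absolute size scales with `n`; compare models on the same test set rather than across test sets of different sizes.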
What the numbers say
The unbiased model A has the lowest RMSE (8.08) and the lowest NASA (99.3). Add a +6-cycle bias and you get model B, with worse RMSE (10.34) and much-worse NASA (180.1). Now flip the sign: the early-biased model C ends up with similar RMSE to B (9.77) but its NASA score is 38% lower (111.4) — because all of C's errors fell on the “safe” side of the curve.
That gap between B and C is the entire research thesis of this book in one number. Two models with comparable RMSE; one is dangerous, the other is safe. The difference is invisible to the metric most papers report.
PyTorch: A Differentiable NASA-Style Loss
For training we need the asymmetric cost as a differentiable loss $\mathcal{L}(\hat{y}, y)$. The shape stays the same; the implementation must use `torch.exp` and `torch.where` so autograd can backpropagate through both branches.
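A minimal sketch of such a module is shown below. The class name `NASAScoreLoss`, its constructor arguments, and the choice to average over the batch (the evaluation score is a sum) are assumptions made for illustration, not the book's reference implementation.

```python
import torch
import torch.nn as nn

class NASAScoreLoss(nn.Module):
    """Asymmetric exponential loss shaped like the NASA score (hypothetical name)."""

    def __init__(self, a_early: float = 13.0, a_late: float = 10.0):
        super().__init__()
        self.a_early = a_early
        self.a_late = a_late

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        d = pred - target                             # d < 0 early, d > 0 late
        early = torch.exp(-d / self.a_early) - 1.0
        late = torch.exp(d / self.a_late) - 1.0
        # Both branches are computed for the whole batch; where() picks per element.
        # Assumption: average over the batch so the scale does not depend on batch size.
        return torch.where(d < 0, early, late).mean()

# Usage: the worked example from this section, 20 cycles early and 20 cycles late.
loss_fn = NASAScoreLoss()
target = torch.tensor([20.0, 20.0])
pred = torch.tensor([0.0, 40.0], requires_grad=True)
loss = loss_fn(pred, target)
loss.backward()
print(loss.item(), pred.grad)
```

The gradient on the early sample is negative (raise the prediction) and the gradient on the late sample is positive (lower it), with the late one larger in magnitude. Since the exponential is unbounded, very large errors early in training can make the loss explode; clamping `d` before the exponentials is one common guard, though it is not part of the score's definition.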
Why `torch.where` and not a Python `if`? An `if d < 0` in `forward()` cannot branch on a whole batch: PyTorch raises an error when a multi-element tensor is used as a boolean, and looping over samples to branch one at a time would give up vectorisation. `torch.where` evaluates both branches and selects element-wise, so autograd sees one batched, GPU-parallel computation and sends each sample's gradient through the branch that was selected.

Asymmetric Cost in Other Domains
The pattern “errors in one direction are categorically worse than the other” is everywhere once you start looking.
| Domain | Cheap direction | Expensive direction | Closest analogue to NASA score |
|---|---|---|---|
| Aviation RUL (this book) | Predict too early - wasted life | Predict too late - engine fails | NASA exp scoring (Saxena et al. 2008) |
| Insurance reserves | Over-reserve - opportunity cost | Under-reserve - solvency event | Asymmetric quantile / pinball loss |
| Hospital triage | Admit a stable patient - billing | Discharge a deteriorating one - death | Cost-sensitive learning matrices |
| Autonomous-vehicle braking | Brake too early - passenger discomfort | Brake too late - collision | Time-to-collision threshold |
| Cancer screening | False positive - biopsy | False negative - undetected disease | Weighted F-beta with beta > 1 |
| Battery state-of-charge | Range-anxiety - under-report | Stranded vehicle - over-report | Quantile + range guard-band |
| Server-capacity prediction | Over-provision - cloud bill | Under-provision - outage | Asymmetric MSE in SRE forecasting |
The mathematical machinery in this book — an asymmetric loss, a gradient-balancing controller, a Pareto-frontier picture — transfers line-for-line to any of these settings.
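As one illustration of the same pattern outside RUL, the pinball (quantile) loss from the insurance row can be sketched in a few lines. The quantile level `tau = 0.9` and the reserve figures are hypothetical, chosen so that under-reserving costs nine times as much as over-reserving by the same amount.

```python
def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for one sample at quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) is weighted by tau, over-prediction by 1 - tau.
    return max(tau * diff, (tau - 1.0) * diff)

tau = 0.9                                    # assumed quantile level
under = pinball_loss(100.0, 90.0, tau)       # reserved 10 units too little
over = pinball_loss(100.0, 110.0, tau)       # reserved 10 units too much
print(f"under-reserve cost: {under:.2f}")
print(f"over-reserve cost:  {over:.2f}")
```

Unlike the NASA score, this asymmetry is linear rather than exponential: the cost ratio between the two directions stays fixed at `tau / (1 - tau)` no matter how large the error is.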
The Trap: Optimising the Wrong Metric
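A model trained on plain MSE and then judged on the NASA score is optimised for one metric and graded on another: the symmetric loss gives the gradient no reason to prefer early errors over late ones. It is exactly this trap that motivates the failure-biased weighted MSE in Chapter 14. By up-weighting near-failure samples we change the gradient landscape so the model pays disproportionate attention to the regime where late predictions are most expensive, without abandoning the smoothness of MSE that makes deep nets train.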
The deep observation. Once you accept that the cost is asymmetric, all three of the proposed objectives in this book (AMNL, GABA, GRACE) follow as different ways to embed that asymmetry into a multi-task gradient.
Takeaway
- RMSE is symmetric; reality is not. A late RUL prediction costs categorically more than an equally-wrong early one. RMSE cannot tell you which side of zero your errors live on.
- The NASA score makes that asymmetry quantitative. $s = e^{-d/13} - 1$ for early errors ($d < 0$) and $s = e^{d/10} - 1$ for late ones ($d \ge 0$). Small denominator, exponential growth: lateness compounds fast.
- Two models with comparable RMSE can have very different safety profiles. Our late-biased model B is within about 6% of the early-biased model C on RMSE (10.34 vs 9.77) but loses badly on NASA score (180 vs 111).
- The PyTorch loss is a four-line module. `torch.where` turns the piecewise definition into a fully differentiable, batched, GPU-friendly forward pass.
- This is what the rest of the book is about. AMNL, GABA, and GRACE are three different engineering answers to the same question: how do we get a model whose gradient knows that late predictions are dangerous?