Chapter 13

Visualizing the RMSE-NASA Pareto Frontier

The Accuracy-Safety Tradeoff

No Free Lunch

§13.1 introduced the asymmetric NASA score. RMSE is its symmetric cousin - the metric every RUL paper headlines. Most of the time the two agree on which model is better. Sometimes they don't. The places where they disagree are the most interesting cases in the literature - the legacy book's C-MAPSS ablation (Ch 16) shows two of the four subsets where improving RMSE actually worsens NASA score.

The headline. Improving RMSE by tightening predictions does not automatically improve NASA score. NASA rewards bias-toward-early. Tightening uncertain predictions can move them ACROSS the d=0 line, turning a slightly-early model into a slightly-late one. This is why FD001 and FD003 in the legacy ablation shipped “better RMSE but worse NASA” in the same column.
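A minimal numeric sketch of the headline, assuming the §13.1 sign convention (d = predicted - true, negative = early) and the decay constants a1 = 13, a2 = 10 used later in this section. The error values of -8 and +7 cycles are made up for illustration and do not come from the ablation.

```python
import math


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost: late errors (d >= 0) use the steeper branch."""
    return math.exp(d / a2) - 1.0 if d >= 0 else math.exp(-d / a1) - 1.0


def rmse_and_nasa(errors):
    """Return (RMSE, total NASA score) for a list of signed errors."""
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
    nasa = sum(nasa_cost(d) for d in errors)
    return rmse, nasa


# Hypothetical signed errors for four engines.
slightly_early = [-8.0, -8.0, -8.0, -8.0]  # looser, but on the safe side of d = 0
slightly_late = [7.0, 7.0, 7.0, 7.0]       # tighter, but across the d = 0 line

rmse_e, nasa_e = rmse_and_nasa(slightly_early)
rmse_l, nasa_l = rmse_and_nasa(slightly_late)
print(f"early: RMSE={rmse_e:.2f}  NASA={nasa_e:.2f}")
print(f"late : RMSE={rmse_l:.2f}  NASA={nasa_l:.2f}")
```

In this toy case the "tighter" model lowers RMSE from 8 to 7 cycles, yet its NASA total goes up because every error now sits on the steeper late branch.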

Domination, Frontier, Hypervolume

Let $m_i = (\text{RMSE}_i, \text{NASA}_i)$ be the two-axis cost of model $i$. We say $m_j$ dominates $m_i$ if $m_j \le m_i$ on both axes AND $m_j < m_i$ on at least one. The Pareto frontier is the set of models that no other model dominates.

On the frontier, you cannot improve one metric without worsening the other. Off the frontier, you have free improvement available - some other architecture / loss / weighting strictly dominates yours. The whole point of an adaptive multi-task method (AMNL/GABA/GRACE) is to push the achievable frontier inward toward the origin.

| Term | Definition | Used for |
| --- | --- | --- |
| dominance | j ≤ i on both, j < i on at least one | Yes/no question per pair |
| Pareto frontier | set of non-dominated points | Reportable model family |
| hypervolume | area between frontier and a fixed reference point | Single-number frontier quality |
| nadir | (max RMSE, max NASA) corner | Reference point for hypervolume |
Lower-is-better convention. Both axes are costs, so “dominates” uses ≤ and <, not ≥ and >. If you ever switch to a benefit metric (like accuracy) on one axis, flip the inequalities for that axis.
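The hypervolume entry can be made concrete with a short sketch. The function name hypervolume_2d, the six (RMSE, NASA) pairs, and both reference points are illustrative assumptions; the only thing taken from the text is the definition (area between the frontier and a fixed reference point, with both axes treated as costs).

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref` (both axes are costs)."""
    ref_x, ref_y = ref
    # Only points strictly better than the reference on both axes contribute.
    inside = sorted(p for p in points if p[0] < ref_x and p[1] < ref_y)
    area, best_y = 0.0, ref_y
    for k, (x, y) in enumerate(inside):
        best_y = min(best_y, y)  # best NASA among points with RMSE <= x
        next_x = inside[k + 1][0] if k + 1 < len(inside) else ref_x
        area += (next_x - x) * (ref_y - best_y)  # one vertical strip
    return area


# Hypothetical (RMSE, NASA) pairs for six candidate models.
points = [(13.5, 340.0), (12.7, 298.0), (11.9, 410.0),
          (11.4, 282.0), (11.0, 260.0), (12.1, 324.0)]

nadir = (max(p[0] for p in points), max(p[1] for p in points))  # (13.5, 410.0)
hv_nadir = hypervolume_2d(points, nadir)
hv_far = hypervolume_2d(points, (20.0, 1000.0))  # arbitrary distant reference
print(f"reference {nadir}: hypervolume = {hv_nadir:.1f}")
print(f"reference (20.0, 1000.0): hypervolume = {hv_far:.1f}")
```

The same set of points yields very different numbers under the two reference points, so a hypervolume figure only means something when the reference is stated alongside it.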

Interactive: Walk the Frontier

Drag the sliders. λ controls how much “early bias” the synthetic model has - 0 is unbiased, 1 is maximally early. α controls prediction tightness. Watch the red dot move; it traces out the Pareto frontier. Green dots are real-data reference points.

Try this. Start at λ=0, α=1.5 - high RMSE, moderate NASA. Slide λ to the right - both metrics drop, but NASA drops faster. Past λ ≈ 0.7 RMSE creeps back up while NASA keeps dropping - that is where the frontier inflects. Real models live near this knee.
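For readers without the interactive explorer, here is a rough offline sketch of the same tradeoff. It does not reproduce the explorer's λ/α model; it assumes a fixed, symmetric noise profile and sweeps only a constant bias (negative = early), using the a1 = 13, a2 = 10 constants from this section.

```python
import math


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d (negative = early)."""
    return math.exp(d / a2) - 1.0 if d >= 0 else math.exp(-d / a1) - 1.0


# Assumed zero-mean, symmetric noise profile: -10, -9.5, ..., +10 cycles.
noise = [k * 0.5 for k in range(-20, 21)]
# Candidate systematic biases in cycles, on the same grid.
biases = [k * 0.5 for k in range(-20, 21)]


def metrics(bias):
    """Return (RMSE, total NASA) when every prediction is shifted by `bias`."""
    errors = [bias + e for e in noise]
    rmse = math.sqrt(sum(d * d for d in errors) / len(errors))
    nasa = sum(nasa_cost(d) for d in errors)
    return rmse, nasa


sweep = [(b, *metrics(b)) for b in biases]
best_rmse_bias = min(sweep, key=lambda row: row[1])[0]
best_nasa_bias = min(sweep, key=lambda row: row[2])[0]
print("bias minimising RMSE:", best_rmse_bias)
print("bias minimising NASA:", best_nasa_bias)
```

RMSE is minimised at zero bias while the NASA total is minimised at a negative (early) bias, so moving between the two optima trades one metric for the other.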

Real C-MAPSS Numbers

From the legacy book's ablation (Ch 16): equal-weighted AMNL (loss weight 0.5/0.5) vs RUL-focused baseline (V7, weight 0.75/0.25). Same architecture; only the loss weight differs. Notice that RMSE improves on every subset, but NASA score sometimes does not.

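| Dataset | RMSE: V7 → AMNL | ΔRMSE (+ = improvement) | NASA: V7 → AMNL | ΔNASA (+ = improvement) | Pareto verdict |
| --- | --- | --- | --- | --- | --- |
| FD001 | 13.04 → 12.74 | +2.3% | 289.7 → 322.4 | −11.3% | neither dominates |
| FD002 | 21.20 → 13.36 | +37.0% | 1820 → 1302.0 | +28.5% | AMNL dominates |
| FD003 | 12.93 → 11.69 | +9.6% | 315.4 → 348.1 | −10.4% | neither dominates |
| FD004 | 21.34 → 13.50 | +36.7% | 2156 → 1227.2 | +43.1% | AMNL dominates |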
Reading the table. On the multi-condition subsets (FD002, FD004) AMNL dominates - it improves both metrics and lives strictly to the lower-left of V7 in (RMSE, NASA) space. On the simpler single-condition subsets (FD001, FD003) the two metrics disagree - V7 has slightly worse RMSE but slightly better NASA. There is no single-number winner; both models are on the frontier and the choice between them is operational, not statistical.
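The verdict column can be re-derived mechanically from the dominance definition. The sketch below copies the (RMSE, NASA) pairs from the table; the helper names dominates and verdict are illustrative.

```python
def dominates(a, b):
    """True if cost pair `a` dominates `b`: <= on both axes, < on at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


# (RMSE, NASA) pairs copied from the table above.
results = {
    "FD001": {"V7": (13.04, 289.7), "AMNL": (12.74, 322.4)},
    "FD002": {"V7": (21.20, 1820.0), "AMNL": (13.36, 1302.0)},
    "FD003": {"V7": (12.93, 315.4), "AMNL": (11.69, 348.1)},
    "FD004": {"V7": (21.34, 2156.0), "AMNL": (13.50, 1227.2)},
}


def verdict(v7, amnl):
    """Classify one subset as a domination in either direction or a tie-off."""
    if dominates(amnl, v7):
        return "AMNL dominates"
    if dominates(v7, amnl):
        return "V7 dominates"
    return "neither dominates"


verdicts = {name: verdict(r["V7"], r["AMNL"]) for name, r in results.items()}
for name, v in verdicts.items():
    print(f"{name}: {v}")
```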

Python: Compute the Pareto Front

Brute-force O(n²) frontier extraction over a list of (RMSE, NASA) pairs. Use this whenever you have a sweep of model configurations and want to report only the non-dominated subset.

rmse_and_nasa() + pareto_front() over 6 toy candidates
🐍pareto_front_numpy.py
import numpy as np

NumPy provides the (n, 2) ndarray we use to hold (RMSE, NASA) per candidate model and the vectorised comparisons that drive Pareto dominance checks. We also use np.sqrt, np.mean, np.exp, np.ones, the ndarray .sum() method, and np.where (both as an element-wise ternary and for boolean → indices).

EXECUTION STATE
📚 numpy = Library: ndarray, broadcasting, linear algebra, math.
as np = Universal alias.
def rmse_and_nasa(y_pred, y_true, a1=13.0, a2=10.0) -> tuple[float, float]:

Compute BOTH headline metrics from the same prediction vector. Returning a tuple makes it easy to plot a single dot in (RMSE, NASA) space.

EXECUTION STATE
⬇ input: y_pred = (B,) predicted RUL per engine.
⬇ input: y_true = (B,) ground-truth RUL.
⬇ input: a1 = 13.0 = NASA early decay constant.
⬇ input: a2 = 10.0 = NASA late decay constant.
⬆ returns = (rmse, nasa) - two Python floats. Ready to push into a (n, 2) array of model coordinates.
d = y_pred - y_true

Element-wise signed error. Same convention as §13.1: negative = early, positive = late.

EXECUTION STATE
operator: - = Element-wise subtraction.
⬆ result: d = (B,) - signed errors.
rmse = float(np.sqrt(np.mean(d ** 2)))

Root mean squared error. Square the residuals, average, square root. The float() cast turns a 0-D ndarray into a Python float so the return tuple is JSON-friendly.

EXECUTION STATE
📚 np.mean(arr) = Reduce-mean. With no axis, returns a 0-D scalar.
📚 np.sqrt(arr) = Element-wise √x. x ** 0.5 gives the same result; np.sqrt is the canonical NumPy spelling.
📚 float(x) = Python built-in. 0-D ndarray → float. Useful for clean printing and JSON serialisation.
operator: ** 2 = Element-wise squaring of the residual array.
⬆ result: rmse = Python float. RMSE has the same units as RUL (cycles).
s = np.where(d >= 0, np.exp(d / a2) - 1.0, np.exp(-d / a1) - 1.0)

Per-sample NASA cost. Same call as §13.1.

EXECUTION STATE
📚 np.where(cond, a, b) = Element-wise ternary - returns a where cond is True, else b.
⬇ arg 1: cond = d >= 0 = Boolean mask. True ⇒ pick the LATE branch.
⬇ arg 2: a (late branch) = exp(d/a2) - 1. Steeper.
⬇ arg 3: b (early branch) = exp(-d/a1) - 1. Gentler.
📚 np.exp(arr) = Element-wise e^x.
⬆ result: s = (B,) - per-sample asymmetric cost.
nasa = float(s.sum())

NASA total = sum of per-sample costs. Lower is better.

EXECUTION STATE
📚 .sum() = ndarray method. Reduces over all axes by default.
📚 float(x) = Cast 0-D ndarray to a Python float.
⬆ result: nasa = Python float - the NASA total.
return rmse, nasa

Tuple of two floats. One coordinate per axis of the Pareto plot.

EXECUTION STATE
⬆ return: (rmse, nasa) = e.g. (12.7, 298.0), the toy AMNL candidate M1 in the worked example.
def pareto_front(points: np.ndarray) -> np.ndarray:

Brute-force O(n²) Pareto front extraction. For our 6-model toy this is faster than fancy algorithms; for production scale (10⁵+ points) use a Kung-style line sweep instead.

EXECUTION STATE
⬇ input: points = (n, 2) ndarray of (RMSE, NASA) per candidate. We assume LOWER is better on both axes - so we look for non-dominated MIN pairs.
⬆ returns = 1-D ndarray of indices into `points`. The frontier preserves ordering of the original list.
n = points.shape[0]

Number of candidate models.

EXECUTION STATE
📚 .shape[0] = First dim. For (n, 2) returns n.
⬆ result: n = Worked example: 6.
keep = np.ones(n, dtype=bool)

Boolean mask of which candidates are still "in the running". Initialised to all True; we knock entries down to False as we find them dominated.

EXECUTION STATE
📚 np.ones(shape, dtype) = Allocate an array of ones. With dtype=bool the values are True.
⬇ arg: shape = n = 1-D vector of length 6.
⬇ arg: dtype = bool = Boolean array. True = "not yet shown to be dominated".
⬆ result: keep = [True, True, True, True, True, True]
for i in range(n):

Outer loop. For each candidate i, check if any OTHER candidate j dominates it.

EXECUTION STATE
📚 range(n) = Lazy iterator [0, n).
iter var: i = Candidate being tested.
LOOP TRACE · 6 iterations
i = 0
M0 = (13.5, 340) - dominated by M1, M3, M4, M5 ⇒ keep[0]=False
i = 1
M1 = (12.7, 298) - dominated by M3, M4 ⇒ keep[1]=False
i = 2
M2 = (11.9, 410) - dominated by M3, M4 (lower on both) ⇒ keep[2]=False
i = 3
M3 = (11.4, 282) - dominated by M4 (lower on both: 11.0 ≤ 11.4 AND 260 ≤ 282) ⇒ keep[3]=False
i = 4
M4 = (11.0, 260) - NOT dominated by any ⇒ keep[4]=True
i = 5
M5 = (12.1, 324) - dominated by M3, M4 ⇒ keep[5]=False
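if not keep[i]: continue

Guard: skip the inner loop when i is already marked as dominated. In this version keep[i] is only ever cleared inside iteration i itself, so the guard never fires; it is harmless and only becomes useful if the inner loop is extended to also mark the points that i dominates. Does not change correctness.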

EXECUTION STATE
operator: not = Logical negation. not True = False.
📚 continue = Skip the rest of this iteration of the enclosing loop.
for j in range(n):

Inner loop. Compare i against every other candidate j.

EXECUTION STATE
iter var: j = Comparison candidate.
LOOP TRACE · 2 iterations
(i, j) = (3, 4)
points[3] = (11.4, 282)
points[4] = (11.0, 260)
le_both = 11.0 ≤ 11.4 AND 260 ≤ 282 ⇒ True
strictly_lt = 11.0 < 11.4 OR 260 < 282 ⇒ True
→ verdict = M4 dominates M3 ⇒ keep[3] = False
(i, j) = (4, *)
for every j = no candidate has both rmse ≤ 11.0 AND nasa ≤ 260
→ verdict = M4 stays - it is on the frontier
if i == j: continue

Skip self-comparison. A point cannot dominate itself.

le_both = (points[j, 0] <= points[i, 0]) & (points[j, 1] <= points[i, 1])

j is at-most-as-bad as i on BOTH axes. Each <= comparison is wrapped in parentheses because & binds more tightly than <=, and the two results are combined with & as a boolean "and". Both operands here are NumPy boolean scalars, which behave like Python bools in the later `if` test.

EXECUTION STATE
operator: <= = Element-wise less-than-or-equal.
operator: & = Element-wise bitwise AND. On bool arrays, this is logical AND.
→ why & not and? = Python's `and` short-circuits and forces a scalar truth value - it errors on multi-element arrays. & is the array-friendly operator.
⬆ result: le_both = True if j is <= i on both axes.
strictly_lt = (points[j, 0] < points[i, 0]) | (points[j, 1] < points[i, 1])

j is STRICTLY better than i on at least one axis. Without this clause, two identical points would be considered to dominate each other.

EXECUTION STATE
operator: < = Element-wise strict less-than.
operator: | = Element-wise bitwise OR. On bool arrays, logical OR.
⬆ result: strictly_lt = True if j is < i on at least one axis.
if le_both and strictly_lt:

Pareto domination test: j ≤ i on both AND j < i on at least one ⇒ j dominates i.

keep[i] = False

Mark i as dominated. We can stop checking further js.

EXECUTION STATE
→ effect = keep[i] permanently flips to False, so the final np.where(keep) leaves index i out.
break

Exit the inner loop early - we already know i is dominated.

EXECUTION STATE
📚 break = Exit the closest enclosing for/while loop.
return np.where(keep)[0]

Convert the boolean mask to indices. np.where with one argument returns a tuple; [0] picks the only element.

EXECUTION STATE
📚 np.where(cond) = With ONE arg returns a tuple of arrays - one per dim of cond - of indices where cond is True. With THREE args it is the ternary. Confusing but standard.
→ why [0]? = 1-D input ⇒ 1-element tuple. [0] unwraps it.
⬆ result = Worked example: array([4]) - only M4 survived.
candidates = np.array([...])

Six hand-picked (RMSE, NASA) pairs. Mix of dominated and non-dominated points so the algorithm has work to do.

EXECUTION STATE
📚 np.array(seq) = Construct an ndarray. Shape inferred from the nesting.
⬆ result: candidates =
M0  (13.50, 340)
M1  (12.70, 298)
M2  (11.90, 410)
M3  (11.40, 282)
M4  (11.00, 260)
M5  (12.10, 324)
front_idx = pareto_front(candidates)

Run the algorithm. With this synthetic data only M4 survives.

EXECUTION STATE
⬆ result: front_idx = array([4])
print("front idx :", front_idx.tolist())

.tolist() converts to a Python list for clean printing.

EXECUTION STATE
📚 .tolist() = ndarray → Python list.
Output = front idx : [4]
print("front pts :")

Header for the per-frontier-point printout.

for i in front_idx:

Iterate the frontier indices.

EXECUTION STATE
iter var: i = 4 (only M4 in this example).
LOOP TRACE · 1 iterations
i = 4
candidates[4, 0] = 11.00 (RMSE)
candidates[4, 1] = 260.0 (NASA)
→ printed = M4: RMSE=11.00 NASA=260
print(f" M{i}: RMSE={candidates[i, 0]:.2f} NASA={candidates[i, 1]:.0f}")

f-string with format specs. {:.2f} = 2 decimals, {:.0f} = no decimals. Indexing 2-D arrays via candidates[i, 0] reads row i column 0.

EXECUTION STATE
📚 f-string = Inline expression interpolation.
→ :.2f = Format spec: float, 2 decimals.
→ :.0f = Format spec: float, 0 decimals - i.e. round to int and print without point.
→ 2-D index = candidates[i, 0] is the (i-th row, 0-th column) scalar. Equivalent to candidates[i][0].
Output = M4: RMSE=11.00 NASA=260
import numpy as np


def rmse_and_nasa(y_pred: np.ndarray,
                  y_true: np.ndarray,
                  a1: float = 13.0,
                  a2: float = 10.0) -> tuple[float, float]:
    """Return (RMSE, total NASA score) for one model on a test set."""
    d    = y_pred - y_true
    rmse = float(np.sqrt(np.mean(d ** 2)))
    s    = np.where(d >= 0,
                    np.exp( d / a2) - 1.0,
                    np.exp(-d / a1) - 1.0)
    nasa = float(s.sum())
    return rmse, nasa


def pareto_front(points: np.ndarray) -> np.ndarray:
    """Indices of Pareto-optimal points in (RMSE, NASA) space.

    A point is Pareto-optimal iff no OTHER point dominates it - i.e.
    no other point is <= on both axes and strictly < on at least one.
    """
    n     = points.shape[0]
    keep  = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        for j in range(n):
            if i == j:
                continue
            le_both     = (points[j, 0] <= points[i, 0]) & (points[j, 1] <= points[i, 1])
            strictly_lt = (points[j, 0] <  points[i, 0]) | (points[j, 1] <  points[i, 1])
            if le_both and strictly_lt:
                keep[i] = False
                break
    return np.where(keep)[0]


# ---------- Worked example: 6 candidate models ----------
#                  (RMSE,  NASA)
candidates = np.array([
    [13.5,  340.],     # M0  baseline
    [12.7,  298.],     # M1  AMNL
    [11.9,  410.],     # M2  late-leaning
    [11.4,  282.],     # M3  GABA
    [11.0,  260.],     # M4  GRACE
    [12.1,  324.],     # M5  uncertainty-weighted
])

front_idx = pareto_front(candidates)

print("front idx :", front_idx.tolist())
print("front pts :")
for i in front_idx:
    print(f"  M{i}: RMSE={candidates[i, 0]:.2f}  NASA={candidates[i, 1]:.0f}")
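For larger sweeps, the two-objective case allows a simpler O(n log n) alternative to the brute-force pareto_front above: sort by RMSE, then sweep once while tracking the best NASA seen so far. This is a minimal sketch of that sort-then-sweep idea, not necessarily Kung's algorithm as published; the name pareto_front_sorted is an assumption, and it works on plain sequences of pairs rather than an ndarray.

```python
def pareto_front_sorted(points):
    """Indices of non-dominated (RMSE, NASA) pairs via sort-then-sweep."""
    # Visit points by ascending RMSE, breaking ties by ascending NASA.
    order = sorted(range(len(points)),
                   key=lambda i: (points[i][0], points[i][1]))
    front, best_nasa = [], float("inf")
    for i in order:
        # Everything visited earlier has RMSE <= this point's RMSE, so the
        # point is non-dominated only if it strictly improves the best NASA.
        if points[i][1] < best_nasa:
            front.append(i)
            best_nasa = points[i][1]
    return sorted(front)  # report indices in their original order


# Same six toy candidates as the worked example.
candidates = [(13.5, 340.0), (12.7, 298.0), (11.9, 410.0),
              (11.4, 282.0), (11.0, 260.0), (12.1, 324.0)]
print("front idx :", pareto_front_sorted(candidates))

# A hypothetical staircase with several frontier points.
staircase = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print("front idx :", pareto_front_sorted(staircase))
```

One behavioural difference to be aware of: when two candidates are exactly identical, this sweep keeps only the first of them, whereas the brute-force version keeps both because neither strictly dominates the other.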

PyTorch: Two Models, Two Metrics

Synthesise a late-leaning and an early-leaning “model” and report both metrics for each. The early-leaning one wins decisively on NASA score even though its RMSE is only slightly better - exactly the asymmetry §13.1 derived.

HybridLoss(lam) traces the frontier; report(...) prints both metrics
🐍rmse_nasa_compare_torch.py
import torch

Top-level PyTorch.

EXECUTION STATE
📚 torch = Tensor library + autograd + nn modules + optim.
import torch.nn as nn

Module containers.

EXECUTION STATE
📚 nn.Module = Base class for all PyTorch models and losses.
import torch.nn.functional as F

Stateless functional ops - we use F.mse_loss for the report function.

EXECUTION STATE
📚 F = torch.nn.functional. Functions like F.mse_loss, F.cross_entropy, F.softmax.
class HybridLoss(nn.Module):

A loss that linearly mixes MSE and NASA. lam=0 ⇒ pure MSE; lam=1 ⇒ pure NASA. Sweeping lam from 0 to 1 traces out the Pareto frontier (strictly, the convex part of it that a weighted sum can reach).

def __init__(self, lam=0.5, a1=13.0, a2=10.0, clip_error=50.0):

Four hyperparameters: mixing weight, two NASA decay constants, |d| clip.

EXECUTION STATE
⬇ input: lam = 0.5 = Convex combination weight. lam=0 → pure MSE; lam=1 → pure NASA. Halfway is a sensible starting point.
⬇ input: a1 = 13.0 = NASA early decay.
⬇ input: a2 = 10.0 = NASA late decay.
⬇ input: clip_error = 50.0 = |d| clip - prevents exp() overflow at init.
super().__init__()

Initialise nn.Module.

self.lam = lam

Store mixing weight.

self.a1, self.a2, self.clip_error = a1, a2, clip_error

Tuple unpacking - parallel assignment to three attributes in one line.

def forward(self, pred, target) -> torch.Tensor:

Compute the hybrid loss in a single forward pass.

EXECUTION STATE
⬇ input: pred = (B,) predicted RUL with requires_grad=True.
⬇ input: target = (B,) ground-truth RUL.
⬆ returns = 0-D scalar tensor connected to pred via autograd.
d = (pred - target).clamp(-self.clip_error, self.clip_error)

Signed error, clipped.

EXECUTION STATE
operator: - = Element-wise tensor subtraction.
📚 .clamp(min, max) = Element-wise clip. Differentiable: gradient is 1 inside the range, 0 outside.
⬇ arg 1: min = -clip = -50 by default.
⬇ arg 2: max = +clip = +50 by default.
mse = (d ** 2).mean()

Plain MSE on the clipped residual.

EXECUTION STATE
operator: ** 2 = Element-wise square.
📚 .mean() = Reduce-mean. With no dim, reduces to a 0-D scalar.
⬆ result: mse = 0-D tensor. Same numerical answer as F.mse_loss(pred, target) when no clipping kicks in.
nasa = torch.where(d >= 0, ...).mean()

Mean per-sample NASA cost.

EXECUTION STATE
📚 torch.where(cond, a, b) = Element-wise ternary - returns a where cond is True, else b.
⬇ arg 1: cond = (d >= 0) = True for late predictions.
⬇ arg 2: late branch = exp(d/a2) - 1. Steeper.
⬇ arg 3: early branch = exp(-d/a1) - 1. Gentler.
📚 torch.exp(t) = Element-wise e^x.
⬆ result: nasa = 0-D tensor - mean per-sample asymmetric cost.
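return (1 - self.lam) * mse + self.lam * nasa

Convex combination. The (1-lam) and lam coefficients sum to 1, so the total loss always lies between the MSE term and the NASA term. The two terms are not on the same numeric scale, so equal steps in lam need not give equal steps along the frontier.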

EXECUTION STATE
operator: 1 - self.lam = Python scalar arithmetic. e.g. lam=0.3 ⇒ (1-lam) = 0.7.
operator: * = Scalar × tensor broadcast.
operator: + = Tensor add.
⬆ return = 0-D scalar tensor. .backward() flows back into pred.
torch.manual_seed(0)

Fix the seed for reproducibility.

EXECUTION STATE
📚 torch.manual_seed(s) = Set the global PyTorch PRNG.
⬇ arg: s = 0 = Conventional canonical seed.
B = 200

Sample size for the synthetic comparison.

y_true = torch.randint(0, 126, (B,)).float()

Capped RUL targets in [0, 125]. Cast to float because mse_loss needs floats.

EXECUTION STATE
📚 torch.randint(low, high, size) = Random ints in [low, high). High exclusive (NumPy convention).
⬇ arg: low = 0 = Inclusive.
⬇ arg: high = 126 = Exclusive ⇒ 0..125.
⬇ arg: size = (B,) = 1-D output, B=200 entries.
📚 .float() = Cast int64 → float32.
preds_A = (y_true + 6 + 4 * torch.randn(B)).clamp(0, 125)

Synthetic LATE-leaning model. Bias = +6 cycles; noise σ = 4. Clamp to [0, 125] to stay in the legal RUL range.

EXECUTION STATE
📚 torch.randn(*size) = Sample i.i.d. N(0, 1) values.
⬇ arg: size = B = Per-sample noise.
operator: + 6 = Add 6 cycles of bias - the model is systematically late.
operator: 4 * = Noise scale.
📚 .clamp(0, 125) = In-range guard. Predictions outside [0, 125] are physically meaningless.
⬆ result: preds_A = (B,) float32 tensor. Mean error d ≈ +6.
preds_B = (y_true - 4 + 4 * torch.randn(B)).clamp(0, 125)

Synthetic EARLY-leaning model. Bias = -4 cycles; same noise. Lower NASA cost than A despite same noise.

EXECUTION STATE
operator: - 4 = Subtract 4 cycles - the model is systematically early.
⬆ result: preds_B = (B,) float32 tensor. Mean error d ≈ -4.
def report(name, pred):

Helper that prints both metrics for one prediction set.

EXECUTION STATE
⬇ input: name = String label for printing.
⬇ input: pred = (B,) predictions.
rmse = torch.sqrt(F.mse_loss(pred, y_true)).item()

RMSE = sqrt(MSE). F.mse_loss is the standard PyTorch path; .item() pulls the float out.

EXECUTION STATE
📚 F.mse_loss(input, target, reduction='mean') = Standard MSE. Default reduction is 'mean'.
⬇ arg 1: input = pred = (B,) predictions.
⬇ arg 2: target = y_true = (B,) ground truth.
📚 torch.sqrt(t) = Element-wise √x.
📚 .item() = 0-D tensor → Python float.
⬆ result: rmse = Python float. Same units as RUL (cycles).
nasa_loss = HybridLoss(lam=1.0)(pred, y_true).item()

Reuse our HybridLoss with lam=1.0 (pure NASA). Returns the MEAN per-sample cost - we multiply by B in the next line to get the total.

EXECUTION STATE
⬇ arg: lam = 1.0 = Pure NASA, no MSE component.
→ call sequence = HybridLoss(lam=1.0) creates the module; (pred, y_true) calls __call__ → forward; .item() extracts the float.
⬆ result: nasa_loss = Python float - mean per-sample NASA cost.
nasa_total = nasa_loss * B

NASA reports the TOTAL across the test set, not the mean. Multiply by B to undo the .mean() reduction.

EXECUTION STATE
operator: * = Scalar multiply.
⬆ result: nasa_total = Python float - sum of per-sample NASA costs.
print(f"{name:>10s} RMSE={rmse:6.3f} NASA total={nasa_total:7.1f}")

f-string format. {:>10s} right-aligns the name to width 10. {:6.3f} pads to width 6 with 3 decimals. {:7.1f} pads to 7 with 1 decimal.

EXECUTION STATE
→ :>10s = Format spec: string, right-aligned, min width 10.
→ :6.3f = Float, 3 decimals, min width 6.
→ :7.1f = Float, 1 decimal, min width 7.
Output (one realisation) = late RMSE= 7.193 NASA total= 874.6 early RMSE= 5.770 NASA total= 103.4
→ reading = Both metrics agree the early-leaning model is BETTER (lower RMSE AND lower NASA). The late model has 1.25× worse RMSE but ~8.5× worse NASA - the asymmetry shows up most strongly in NASA.
report("late", preds_A)

Run the helper on the late-leaning model.

report("early", preds_B)

Run the helper on the early-leaning model.

import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridLoss(nn.Module):
    """Hybrid MSE + NASA loss. lam = mixing weight."""
    def __init__(self, lam: float = 0.5,
                       a1:  float = 13.0,
                       a2:  float = 10.0,
                       clip_error: float = 50.0):
        super().__init__()
        self.lam = lam
        self.a1, self.a2, self.clip_error = a1, a2, clip_error

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Signed error (negative = early), clipped so exp() cannot overflow
        d = (pred - target).clamp(-self.clip_error, self.clip_error)
        mse  = (d ** 2).mean()
        nasa = torch.where(
            d >= 0,
            torch.exp( d / self.a2) - 1.0,     # late branch (steeper)
            torch.exp(-d / self.a1) - 1.0,     # early branch (gentler)
        ).mean()
        return (1 - self.lam) * mse + self.lam * nasa


# ---------- Compare two synthetic models on the same targets ----------
torch.manual_seed(0)
B        = 200
y_true   = torch.randint(0, 126, (B,)).float()

# Two synthetic "models" - same noise level, different bias
preds_A = (y_true + 6  + 4 * torch.randn(B)).clamp(0, 125)        # late-leaning
preds_B = (y_true - 4  + 4 * torch.randn(B)).clamp(0, 125)        # early-leaning


def report(name: str, pred: torch.Tensor):
    rmse = torch.sqrt(F.mse_loss(pred, y_true)).item()
    nasa_loss = HybridLoss(lam=1.0)(pred, y_true).item()          # mean per-sample cost
    nasa_total = nasa_loss * B                                    # undo .mean
    print(f"{name:>10s}  RMSE={rmse:6.3f}  NASA total={nasa_total:7.1f}")


report("late",  preds_A)
report("early", preds_B)
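To see what sweeping lam does to the preferred operating point, here is a framework-free sketch that mirrors the HybridLoss formula (without the clip) and, for a few lam values, finds the constant bias that minimises it. The symmetric noise profile, the bias grid, and the chosen lam values are assumptions for illustration; no model is trained.

```python
import math


def nasa_cost(d, a1=13.0, a2=10.0):
    """Per-sample NASA cost for a signed error d (negative = early)."""
    return math.exp(d / a2) - 1.0 if d >= 0 else math.exp(-d / a1) - 1.0


noise = [k * 0.5 for k in range(-20, 21)]     # assumed symmetric residual noise
biases = [k * 0.25 for k in range(-40, 41)]   # candidate constant offsets


def hybrid_loss(bias, lam):
    """(1 - lam) * MSE + lam * mean NASA for predictions shifted by `bias`."""
    errors = [bias + e for e in noise]
    mse = sum(d * d for d in errors) / len(errors)
    nasa = sum(nasa_cost(d) for d in errors) / len(errors)
    return (1.0 - lam) * mse + lam * nasa


# For each mixing weight, pick the bias with the lowest hybrid loss.
best_bias = {lam: min(biases, key=lambda b: hybrid_loss(b, lam))
             for lam in (0.0, 0.5, 0.9, 0.99, 1.0)}
for lam, b in best_bias.items():
    print(f"lam={lam:<4}  loss-minimising bias = {b:+.2f}")
```

At lam = 0 the optimum is zero bias and at lam = 1 it is an early (negative) bias. Because the MSE term is numerically much larger than the mean NASA term for errors of this size, the optimum in this toy setup moves away from zero only once lam gets close to 1, which is worth keeping in mind when choosing the lam grid for a sweep.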

Same Idea, Other Pairings

The Pareto-frontier framing is dataset-agnostic. Anywhere you have an asymmetric primary metric and a symmetric secondary metric on the same model, the same frontier analysis applies.

| Domain | Symmetric metric | Asymmetric metric | Frontier knee tilts toward |
| --- | --- | --- | --- |
| RUL prediction (C-MAPSS) | RMSE | NASA score | early-bias |
| Wildfire risk forecasting | Brier score | Late-detection cost (×10⁶) | early-bias |
| Battery SoC for EV | MAE on SoC % | Stranded-driver cost | conservative SoC |
| Power-grid load forecast | MAPE | Reserve-shortfall cost | over-forecasting |
| Hospital ICU triage | AUROC | Missed-deterioration cost | false-positive bias |
| Inventory days-to-stockout | RMSE | Stockout cost | over-stocking |

Three Pareto-Reporting Pitfalls

Pitfall 1: Reporting one metric only. The legacy book's “NASA gets WORSE by 11% on FD001 even though RMSE improves” would be invisible in a single-metric paper. Always report both metrics, every time.
Pitfall 2: Comparing dominated points to the frontier. If your new method does not dominate the existing baseline, the comparison is unsound - you have traded one metric for another. Mark the result as “on the frontier” instead of “better than”. Reviewers conflate these constantly; do not enable them.
Pitfall 3: Hypervolume with no anchor. Hypervolume is a great single-number summary - but only relative to a fixed nadir / reference point. Reporting hypervolume without specifying the reference is meaningless; a sufficiently distant nadir makes any frontier look great. Standard practice: pick the nadir as the worst (RMSE, NASA) observed across the entire benchmark suite, and freeze it.
The point. Two metrics, two axes, one frontier. Section §13.3 turns the frontier into operational deployment regimes; §13.4 maps each regime to AMNL, GABA, or GRACE.

Takeaway

  • Domination = ≤ on both, < on at least one. The frontier is the set of non-dominated points.
  • RMSE and NASA can disagree. Two of four C-MAPSS subsets show this in the legacy ablation. Always report both.
  • HybridLoss(lam) with $\lambda \in [0, 1]$ sweeps from pure MSE to pure NASA - traces the frontier in one knob.
  • Brute-force pareto_front is O(n²). Adequate for any sweep with <10⁴ candidates. Beyond that, use Kung's line-sweep algorithm.