§13.3 picked the regime; this section says what to TRAIN. The DualTaskModel of §11.4 stays the same in all three cases; only the loss function changes. The book's three methods - AMNL, GABA, and GRACE - sit at three different points along the (RMSE, NASA) Pareto frontier.
The mapping in one line. AMNL emphasises sample-level conservatism (truck regime). GABA balances per-task gradients on the shared trunk (airline regime). GRACE does both (cruise regime). All three preserve the §11.4 architecture and weight count; only the loss differs.
| Regime | Method | Core mechanism | Best on (FD002+FD004) |
|---|---|---|---|
| delivery-truck | AMNL | failure-biased sample weighting | RMSE 7.45 |
| airline-787 | GABA | inverse-gradient task weighting | balanced (RMSE 7.89, NASA 235.7) |
| cruise-ship | GRACE | AMNL sample × GABA task weighting | NASA 232.7 |
AMNL — Failure-Biased Weighted MSE
AMNL weights each SAMPLE by how close to failure it is. Healthy engines (RUL near the cap $R_{\max}=125$) get the floor weight $w_{\min}=1$; near-failure engines (RUL near 0) get the ceiling weight $w_{\max}=2$. Linear schedule:

$$w_i = w_{\max} - (w_{\max} - w_{\min})\cdot\frac{\min(y_i, R_{\max})}{R_{\max}}.$$

The total loss is $\mathcal{L}_{\text{AMNL}} = (1-\lambda_{hs})\cdot\text{wMSE} + \lambda_{hs}\cdot\text{CE}$ with fixed task weight $\lambda_{hs}=0.5$. Chapter 14 derives the schedule; Chapter 15 trains it.
What AMNL fixes. Plain MSE treats a 5-cycle residual at y=120 (low stakes: the engine is healthy) the same as a 5-cycle residual at y=2 (life-or-death). AMNL roughly doubles the second one's gradient: the schedule gives w ≈ 1.98 at y=2 versus w ≈ 1.04 at y=120. The shared backbone is therefore pushed to spend MORE capacity on accurate predictions near failure, exactly when accuracy matters.
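A minimal NumPy sketch of the AMNL loss follows, assuming $R_{\max}=125$, a weighted MSE normalised by the sum of the weights (so constant residuals reduce to plain MSE), and a mean softmax cross-entropy over K health-state classes. The function names are hypothetical and are not taken from the book's three_losses_numpy.py.

```python
import numpy as np


def amnl_weights(y, r_max=125.0, w_min=1.0, w_max=2.0):
    """Linear failure-biased schedule: w_max at RUL 0, w_min at RUL >= r_max."""
    y = np.asarray(y, dtype=float)
    frac = np.minimum(y, r_max) / r_max  # 0 at failure, 1 once RUL reaches the cap
    return w_max - (w_max - w_min) * frac


def weighted_mse(pred, y, **schedule):
    """Sample-weighted MSE, normalised by the total weight (assumed convention)."""
    y = np.asarray(y, dtype=float)
    w = amnl_weights(y, **schedule)
    r = np.asarray(pred, dtype=float) - y
    return float(np.sum(w * r**2) / np.sum(w))


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits has shape (B, K), labels (B,) ints."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-np.mean(log_p[np.arange(len(labels)), labels]))


def amnl_loss(pred, y, logits, labels, lam_hs=0.5):
    """(1 - lam_hs) * wMSE + lam_hs * CE with a fixed task weight."""
    return (1 - lam_hs) * weighted_mse(pred, y) + lam_hs * cross_entropy(logits, labels)


# The 5-cycle residual example from the text: weights at y = 120 and y = 2.
print(amnl_weights([120.0, 2.0]))
```

The printed weights, about 1.04 and 1.984, are the near-2× ratio quoted above. Normalising by the weight sum keeps the loss on the same scale as plain MSE; dividing by the batch size instead would inflate it by the mean weight.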
GABA — Inverse-Gradient Adaptive Weighting
GABA computes per-task gradient norms on the shared backbone each step (using the §12.1 helper) and sets the task weights inversely proportional to those norms:
$$\lambda_t = \frac{1/\lVert g_t\rVert}{\sum_{t'} 1/\lVert g_{t'}\rVert}$$
Then the shared-backbone gradient is
$$g_{\text{shared}} = \sum_t \lambda_t\, g_t$$
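which has the property $\lVert\lambda_{rul}\, g_{rul}\rVert = \lVert\lambda_{hs}\, g_{hs}\rVert$, because each side equals $1/\sum_{t'} 1/\lVert g_{t'}\rVert$. Both tasks pull the rope with EQUAL force regardless of their raw gradient magnitudes, so the ≈500× imbalance from §12 is cancelled at the source. Chapters 17-19 derive and train it.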
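To see the equal-pull property numerically, the NumPy sketch below stands in random vectors for the two task gradients on the shared backbone and rescales them to the §12.1 norms (4.81 and 0.0096). The vector dimension, the seed, and the helper name inverse_gradient_weights are illustrative assumptions; in training, each g_t would come from backpropagating one task loss on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 1000  # stand-in for the shared-trunk parameter count
g_rul = rng.standard_normal(dim)
g_hs = rng.standard_normal(dim)
g_rul *= 4.81 / np.linalg.norm(g_rul)   # match the measured RUL gradient norm
g_hs *= 0.0096 / np.linalg.norm(g_hs)   # match the measured HS gradient norm


def inverse_gradient_weights(norms, eps=1e-8):
    """lambda_t = (1/||g_t||) / sum_t' (1/||g_t'||); eps sits inside the inverse."""
    inv = 1.0 / (np.asarray(norms, dtype=float) + eps)
    return inv / inv.sum()


norms = [np.linalg.norm(g_rul), np.linalg.norm(g_hs)]
lam = inverse_gradient_weights(norms)
g_shared = lam[0] * g_rul + lam[1] * g_hs

# Each task's weighted contribution has the same norm (up to the eps floor).
pull = [np.linalg.norm(lam[0] * g_rul), np.linalg.norm(lam[1] * g_hs)]
print("lambda:", lam.round(5), "pull:", np.round(pull, 6))
```

The weights differ by a factor of about 501, while the two pulls come out identical to printed precision: the weighting undoes exactly the imbalance it was computed from.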
GRACE — Combine Both
GRACE applies AMNL's sample weighting on the regression branch AND GABA's task weighting at the loss combiner:
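$$\mathcal{L}_{\text{GRACE}} = \lambda_{rul}\cdot\text{wMSE}_{\text{AMNL}} + \lambda_{hs}\cdot\text{CE},\qquad \lambda_t \propto 1/\lVert g_t\rVert$$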
The two mechanisms are complementary. AMNL's sample weighting addresses the WITHIN-task asymmetry (near-failure samples matter more); GABA's task weighting addresses the BETWEEN-task imbalance (RUL gradient dominates HS). Chapters 20-23 derive and train GRACE.
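The composition can be sketched in a few lines of NumPy. This is a self-contained illustration, not the book's loss_grace: it assumes the same AMNL schedule ($R_{\max}=125$, weights 1 to 2), a weighted MSE normalised by the weight sum, mean softmax cross-entropy, and task weights normalised to sum to 1 as in the $\lambda_t$ formula. The grad norms are passed in as plain floats, i.e. already detached.

```python
import numpy as np


def grace_loss(pred, y, logits, labels, g_rul, g_hs,
               r_max=125.0, w_min=1.0, w_max=2.0, eps=1e-8):
    """Sketch: AMNL sample weights inside the RUL term, inverse-gradient task weights."""
    y = np.asarray(y, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Within-task: failure-biased sample weights (AMNL schedule).
    w = w_max - (w_max - w_min) * np.minimum(y, r_max) / r_max
    l_rul = np.sum(w * (pred - y) ** 2) / np.sum(w)
    # Classification branch: mean softmax cross-entropy.
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_hs = -np.mean(log_p[np.arange(len(labels)), labels])
    # Between-task: inverse-gradient weights from constant (detached) norms.
    inv = np.array([1.0 / (g_rul + eps), 1.0 / (g_hs + eps)])
    lam = inv / inv.sum()
    return float(lam[0] * l_rul + lam[1] * l_hs)


labels = np.array([0, 1])
logits = np.zeros((2, 3))
# Same 10-cycle error, placed near failure (y = 2) vs on a healthy sample (y = 120).
near = grace_loss([120.0, 12.0], [120.0, 2.0], logits, labels, 1.0, 1.0)
healthy = grace_loss([130.0, 2.0], [120.0, 2.0], logits, labels, 1.0, 1.0)
print(round(near, 3), round(healthy, 3))
```

With equal grad norms the task weights are 0.5 each, so the gap between the two printed values comes entirely from the within-task sample weighting; with constant residuals that weighting cancels and only the task weights remain.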
Interactive: Pick a Regime, See the Method
The same chooser as §13.3 - now the highlighted “winner” is also the method we recommend for that regime.
Reading the chart. Slide w into the truck-regime band (0.00-0.15) and AMNL is highlighted: it wins on pure RMSE because failure-biased sample weighting concentrates the fit on the near-failure cycles that matter most. Slide to the airline band (0.15-0.40) and GABA is highlighted: inverse-gradient weighting gives the best balance. Slide to the cruise band (0.40-1.00) and GRACE wins, because it has both mechanisms.
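The band-to-method mapping read off the chart fits in a small lookup. In this sketch, recommend_method is a hypothetical helper, w is the §13.3 chooser's slider value, and assigning each band edge (0.15, 0.40) to the upper band is an assumption; the chart itself does not say which side the edges belong to.

```python
def recommend_method(w: float) -> tuple[str, str]:
    """Map the chooser weight w in [0, 1] to (regime, recommended method).

    Bands follow the chart: [0.00, 0.15) truck, [0.15, 0.40) airline,
    [0.40, 1.00] cruise. Edge handling is an assumption.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if w < 0.15:
        return "delivery-truck", "AMNL"
    if w < 0.40:
        return "airline-787", "GABA"
    return "cruise-ship", "GRACE"


for w in (0.05, 0.25, 0.70):
    print(w, recommend_method(w))
```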
Python: Three Losses Side by Side
All three loss functions in pure NumPy, sharing utilities. The worked example uses the §12.1 measured gradient norms (4.81 and 0.0096) so you can read the numerical effect of GABA directly off the printed output.
three_losses_numpy.py: loss_amnl, loss_gaba, loss_grace in one file
`import numpy as np`: NumPy provides the (B,) and (B, K) ndarrays used to express each loss in vector form. The code relies on np.minimum, np.maximum, np.exp, np.log, np.sum, np.mean, np.array, and broadcasting.
AMNL (loss_amnl): failure-biased weighted MSE on the RUL branch, plain CE on the classification branch, fixed equal task weights. Best for the delivery-truck regime, where RMSE matters more than late-prediction bias.
Inverse-gradient weights. The eps goes inside the denominator, 1/(‖g‖ + eps), rather than being added outside it: that is the safe pattern, because it keeps the weight finite even at an exactly-zero norm. With the §12.1 norms, 1/(4.81 + 1e-8) ≈ 0.2079 (RUL) and 1/(0.0096 + 1e-8) ≈ 104.17 (HS), so inv = [0.2079, 104.17]. HS's inverse is about 501× larger, the same ratio as the §12.1 imbalance (4.81 / 0.0096), now used to AMPLIFY the suppressed task.
`w = 2.0 * inv / inv.sum()`: re-normalise so the two weights sum to 2, i.e. equal gradient norms would give each task weight 1. The λ_t formula above normalises to 1 instead (as do AMNL's fixed weights, 0.5 + 0.5 = 1); the two conventions differ only by a global factor of 2, so pick whichever you prefer. Here inv.sum() = 0.2079 + 104.17 ≈ 104.37, giving w = [0.00398, 1.99602]: HS gets ≈501× the weight of RUL, and RUL gets under 1% of the budget. Despite the dominant RUL gradient, the EFFECTIVE pull on the shared backbone is now equal: 4.81 × 0.00398 ≈ 0.0096 × 1.99602 ≈ 0.019.
`return float(w[0] * L_rul + w[1] * L_hs)`: linear combination of the two task losses with the inverse-gradient weights.
Running GABA with toy scalar losses (25.0, 1.10) and the measured grad norms prints `GABA : 2.2954` (≈ 0.00398 · 25 + 1.99602 · 1.10): the dominant RUL term is now nearly invisible.
`print("GRACE :", round(loss_grace(...), 4))` runs GRACE on the same data and prints `GRACE : 2.2954`. In this synthetic example AMNL's sample weighting is a no-op because the residuals are constant; the GABA combiner does the heavy lifting. On real data, GRACE's sample weighting matters most when residuals VARY across the batch (which they do in C-MAPSS). The two mechanisms are complementary: GABA balances the TASKS, AMNL balances the SAMPLES within each task.
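For readers who want to rerun the GABA arithmetic above, here is a minimal sketch consistent with the quoted fragments (eps inside the inverse, weights normalised to sum to 2). The name loss_gaba_sketch and its signature are hypothetical; unlike the book's function it also returns the weights so they can be inspected.

```python
import numpy as np


def loss_gaba_sketch(L_rul, L_hs, g_rul, g_hs, eps=1e-8):
    """Inverse-gradient weighted sum of two scalar task losses.

    g_rul and g_hs are plain floats (already-measured grad norms), so the
    weights are constants. Returns the combined loss and the weights.
    """
    inv = np.array([1.0 / (g_rul + eps), 1.0 / (g_hs + eps)])  # eps inside the inverse
    w = 2.0 * inv / inv.sum()  # weights sum to 2
    return float(w[0] * L_rul + w[1] * L_hs), w


total, w = loss_gaba_sketch(25.0, 1.10, g_rul=4.81, g_hs=0.0096)
print("weights:", w.round(5))
print("effective pull:", round(4.81 * w[0], 4), round(0.0096 * w[1], 4))
print("GABA :", round(total, 4))
```

The weights print as [0.00398, 1.99602] and both effective pulls as 0.0192, which is the equal-force property in numbers.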
Three nn.Modules with identical call signatures (modulo the two extra grad-norm arguments for GABA / GRACE). Wire them into the §11.4 training loop unchanged.
At lam_hs = 0.5 the AMNL combination is 0.5 · L_rul + 0.5 · L_hs, the AMNL default.
`class GABA(nn.Module):` GABA as an nn.Module that takes the SCALAR per-task losses plus the SCALAR per-task gradient norms and returns the inverse-gradient weighted sum. The caller measures the grad norms via §12.1's helper.
`def __init__(self, eps: float = 1e-8):` one hyperparameter, eps, a numerical floor inside the inverse that prevents divide-by-zero; 1e-8 is conventional.
Next comes a two-element tensor of inverse-gradient weights, built with torch.tensor(seq), which constructs a new tensor (float32 by default) from a Python sequence. The grad norms arrive as Python floats, so this construction does NOT participate in autograd. That is intentional: the gradient-norm measurement is treated as a constant for the optimiser. If inv depended on the loss tensors' gradients, the graph would become circular; the standard trick is to compute inv from .detach()-ed gradient norms (or pre-extracted floats, as here).
`w = 2.0 * inv / inv.sum()`: normalise so the weights sum to 2.
`return w[0] * L_rul + w[1] * L_hs`: linear combination. w[0] and w[1] are 0-D tensors; multiplying them with L_rul and L_hs (which carry autograd) keeps the graph alive.
`class GRACE(nn.Module):` composes AMNL's sample weighting with GABA's task weighting.
AMNL sample-weight schedule. GRACE re-derives it inside its own forward rather than calling self.amnl_part.forward(), because it needs the sample-weighted L_rul on its own.
IEEE/CAA JAS 2025 paper, Section V (combined results); Chapters 20-23 of this book
- AMNL: battery aging (capacity fade); Severson et al. SoH studies; adaptable (swap c_in).
- GABA: object detection (bbox + class); GradNorm and follow-ups (Chen et al.); GradNorm is a near-relative.
- GRACE: wind-turbine SCADA (RUL + fault type); industrial pilots (NREL, Vestas); reuse this book's implementation.
Three Method-Selection Pitfalls
Pitfall 1: Picking GRACE by default. GRACE is the most expressive but also the most expensive (two extra backward passes per step for the grad-norm measurement). On the truck regime AMNL alone is faster AND wins - GRACE's machinery is wasted. Match the method to the regime, not to the headline result.
Pitfall 2: Forgetting the .detach() on grad norms. GABA's lambda values must be computed from grad norms WITHOUT autograd connecting them to the loss tensor. Forget the detach and you get a self-referential graph that either crashes (gradient with respect to a gradient) or silently trains the wrong thing. The book's Chapter 18 spells out the exact safe pattern.
Pitfall 3: Mixing AMNL's lam_hs with GABA's lambda. AMNL fixes lam_hs=0.5; GABA computes lam_hs per step from grad norms. Stacking both is redundant at best: at lam_hs = 0.5 it merely halves the loss, and at any other value it tilts GABA's equal-pull balance by a factor lam_hs / (1 − lam_hs), defeating its purpose. GRACE deliberately disables AMNL's task-mixing branch (lam_hs=0.0 in the GRACE constructor) so only GABA decides the task weights.
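The tilt can be checked numerically. The sketch below models a hypothetical misconfiguration in which a fixed AMNL mix lam_fixed is multiplied onto GABA's inverse-gradient weights, then reports the HS-to-RUL pull on the trunk using the §12.1 norms. The function name is illustrative.

```python
import numpy as np


def effective_pull_ratio(lam_fixed, g_rul=4.81, g_hs=0.0096, eps=1e-8):
    """HS/RUL pull when a fixed AMNL mix is stacked on GABA weights.

    Hypothetical misconfiguration:
        loss = (1 - lam_fixed) * lam_rul * L_rul + lam_fixed * lam_hs * L_hs
    """
    inv = np.array([1.0 / (g_rul + eps), 1.0 / (g_hs + eps)])
    lam = inv / inv.sum()  # GABA's inverse-gradient weights
    pull_rul = (1 - lam_fixed) * lam[0] * g_rul
    pull_hs = lam_fixed * lam[1] * g_hs
    return pull_hs / pull_rul


for lam_fixed in (0.5, 0.3, 0.1):
    print(lam_fixed, round(effective_pull_ratio(lam_fixed), 3))
```

At lam_fixed = 0.5 the ratio stays at 1 (the stack only halves the loss); at 0.3 it drops to about 0.43 and at 0.1 to about 0.11, matching lam_fixed / (1 − lam_fixed).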
The point. Three methods, three regimes, one architecture. Part V (Chapters 14-16) covers AMNL in depth; Part VI (Chapters 17-19) covers GABA; Part VII (Chapters 20-23) covers GRACE. Skip ahead to whichever your operational regime demands - or read in order for the full story.
Takeaway — End of Part IV
AMNL. Failure-biased sample weights. Best for RMSE-dominated regimes.
GABA. Inverse-gradient task weights. Best for balanced regimes.
GRACE. AMNL's sample weighting + GABA's task weighting. Best for safety-dominated regimes.
Same architecture. §11.4 DualTaskModel unchanged. Only the loss module differs.
Match method to regime. GRACE is not always the answer - the truck regime really is best served by AMNL.
End of Part IV. Diagnostic chapters done. Chapters 14-23 derive and train each of the three methods in turn.