A working radiologist reading chest X-rays does not run every model on every image. The high-throughput screening model goes on bulk inbox triage; the deep diagnostic model goes on flagged cases; the fast bedside model goes when the patient is on a gurney. Each model has a regime where it wins. AMNL is the same: it is not a universal upgrade. It is the right tool for a specific shape of problem.
The headline. AMNL wins when (a) data is complex enough that failure samples matter (FD002, FD004), AND (b) the deployment cost is symmetric or RMSE-dominant. AMNL loses when data is healthy-dominated (FD001) AND deployment cost is the asymmetric NASA score - because AMNL's failure-bias becomes a late-prediction bias.
This section turns Chapters 14-16 into a deployment rule: given a problem spec, which of AMNL / GABA / GRACE / plain 0.5/0.5 should you use? The rule has three cases plus a fallback, all derived from the empirical evidence in §16.1 through §16.3.
The Three Decision Rules
Three rules cover the entire C-MAPSS evidence base. A fourth case (the fallback) catches everything outside the rules' domain.
| Rule | Trigger | Pick | Reason |
|---|---|---|---|
| 1 | cost_shape = NASA AND complexity ≤ 2 | GABA | Avoid AMNL late-shift on healthy-dominated data |
| 2 | cost_shape = RMSE AND complexity ≥ 6 AND failure_share ≥ 0.20 | AMNL | Best-in-literature on FD002 (RMSE 6.74) |
| 3 | cost_shape = balanced AND budget ≥ 5 seeds | GRACE | Compose AMNL sample-weights + GABA task-weights |
| fallback | anything else (mostly small-budget cases) | 0.5/0.5 | Honest baseline; spend savings on data quality |
Why these specific thresholds. Rule 1's ‘complexity ≤ 2’ covers FD001 (1×1 = 1) and FD003 (1×2 = 2); Rule 2's ‘complexity ≥ 6’ covers FD002 (6×1 = 6) and FD004 (6×2 = 12). Rule 2's failure_share ≥ 0.20 keeps AMNL from being prescribed for healthy-dominated datasets where its weighting can't bite. Rule 3's 5-seed minimum reflects the seed-to-seed variance of the AMNL/GABA training loop.
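The complexity score behind these thresholds is just the product n_conditions × n_fault_modes. Here is a minimal sketch that uses the subset shapes quoted in this section. The dict and the complexity_band helper are illustrative names and do not come from the book's code.

```python
from typing import Tuple

# Subset shapes quoted in this section: (n_conditions, n_fault_modes).
CMAPSS_SHAPES = {"FD001": (1, 1), "FD002": (6, 1), "FD003": (1, 2), "FD004": (6, 2)}


def complexity_band(n_conditions: int, n_fault_modes: int) -> Tuple[int, str]:
    """Return (complexity, band), where band names the rule threshold it meets."""
    complexity = n_conditions * n_fault_modes
    if complexity <= 2:
        return complexity, "low (Rule 1 range)"
    if complexity >= 6:
        return complexity, "high (Rule 2 range)"
    return complexity, "gap (no rule threshold)"


for name, (n_cond, n_fault) in CMAPSS_SHAPES.items():
    c, band = complexity_band(n_cond, n_fault)
    print(f"{name}: {n_cond} x {n_fault} = {c:2d} -> {band}")
```

Complexities 3 to 5 meet neither threshold. No C-MAPSS subset lands there, so the rules say nothing about that range.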
Interactive: AMNL / GABA / GRACE Chooser
Set the four problem characteristics; the chooser fires the matching rule and recommends a model with confidence and reasoning. Try the four C-MAPSS subsets in turn — you should get GABA / AMNL / GRACE / AMNL.
Try this. Set complexity = high, cost = NASA, hasFailures = yes. The chooser falls through to the fallback: Rule 1 requires complexity ≤ 2, Rule 2 requires an RMSE cost shape, and Rule 3 requires a balanced one. That's a deliberate gap. No C-MAPSS subset is simultaneously complex AND NASA-dominated, so we have no evidence to prescribe a winner. The honest answer is ‘run a small ablation’.
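The gap is easy to map exhaustively. Below is a minimal sketch. It assumes a compact restatement of the three rules (fired_rule is a hypothetical helper, not part of the chooser) and holds the budget and failure share at values large enough that neither one blocks a rule.

```python
from itertools import product


def fired_rule(cost_shape: str, complexity: int,
               failure_share: float, n_seeds_budget: int) -> int:
    """Compact restatement of the three rules; returns 0 for the fallback."""
    if cost_shape == "nasa" and complexity <= 2:
        return 1  # GABA
    if cost_shape == "rmse" and complexity >= 6 and failure_share >= 0.20:
        return 2  # AMNL
    if cost_shape == "balanced" and n_seeds_budget >= 5:
        return 3  # GRACE
    return 0      # 0.5/0.5 fallback


# Hold failure_share and the seed budget at permissive values so that only
# cost_shape and complexity (the four C-MAPSS values) decide the outcome.
gaps = [
    (cost, c)
    for cost, c in product(["rmse", "nasa", "balanced"], [1, 2, 6, 12])
    if fired_rule(cost, c, failure_share=0.30, n_seeds_budget=5) == 0
]
print(gaps)  # [('rmse', 1), ('rmse', 2), ('nasa', 6), ('nasa', 12)]
```

Besides the complex + NASA case discussed above, the sweep also surfaces low-complexity data under an RMSE-dominated cost. The rules leave that case to the fallback as well.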
Quantifying The Decision Boundary
Rule 1 vs Rule 2 boils down to one question: when does the NASA penalty for AMNL's late shift exceed the RMSE gain? §16.2 gives the FD001 numbers. AMNL gives RMSE = 10.08 and NASA = 434.4; the 0.5/0.5 baseline gives RMSE ≈ 10.4 and NASA ≈ 405. AMNL gains ΔRMSE = −0.32 cycles but loses ΔNASA = +29 score points.
If your business cost weights the NASA score heavily enough relative to RMSE's squared cycles, AMNL is a regression. AMNL loses when $w_{\text{NASA}} \cdot 29 > w_{\text{RMSE}} \cdot (10.4^2 - 10.08^2)$. Since $10.4^2 - 10.08^2 \approx 6.5$, this simplifies to $w_{\text{NASA}} / w_{\text{RMSE}} > 6.5/29 \approx 0.22$. Any deployment that weights one NASA point more than about 0.22 times as heavily as one squared cycle of RMSE should NOT pick AMNL on FD001-style data. In practice that includes most safety-critical aviation use cases.
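The arithmetic is small enough to keep next to the decision rule. This sketch uses only the FD001 numbers quoted above. amnl_is_regression is a hypothetical helper name, and w_nasa / w_rmse are whatever weights your own cost model assigns.

```python
# FD001 numbers quoted in this section: AMNL vs the 0.5/0.5 baseline.
amnl = {"rmse": 10.08, "nasa": 434.4}
base = {"rmse": 10.4, "nasa": 405.0}

rmse_sq_gain = base["rmse"] ** 2 - amnl["rmse"] ** 2  # AMNL's gain in squared cycles (~6.55)
nasa_penalty = amnl["nasa"] - base["nasa"]             # AMNL's NASA loss in score points (~29.4)

# AMNL is a regression when w_nasa * nasa_penalty > w_rmse * rmse_sq_gain,
# i.e. when w_nasa / w_rmse exceeds this break-even ratio.
break_even = rmse_sq_gain / nasa_penalty
print(f"break-even w_nasa / w_rmse = {break_even:.3f}")


def amnl_is_regression(w_nasa: float, w_rmse: float) -> bool:
    """True when the weighted NASA penalty outweighs the weighted RMSE^2 gain."""
    return w_nasa * nasa_penalty > w_rmse * rmse_sq_gain
```

With the unrounded penalty (29.4 rather than 29), the ratio comes out at about 0.223. That is consistent with the ≈ 0.22 above.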
Sweet-spot summary. AMNL is the right pick when complexity ≥ 6, failure_share ≥ 0.20, AND the cost shape is RMSE-dominant. On FD002 it improves RMSE by 0.79 cycles over GABA at no NASA penalty, which makes it a pure win. On FD004 the RMSE improvement is 0.94 cycles.
Python: recommend_model() Decision Function
Encode the three rules as a pure-Python function. The input is a typed ProblemSpec dataclass. The output is a dict with the recommendation, the rule that fired, expected metrics, and a one-line reason. Applying it to all four C-MAPSS subsets checks that the chooser returns the paper's actual Table I winners.
recommend_model(): pure-Python decision function (recommend_model.py)
from dataclasses import dataclass: this imports from the stdlib dataclasses module (Python 3.7+). The @dataclass decorator turns a class with type hints into a struct-like container and auto-generates __init__, __repr__ and __eq__ from the typed fields. Here it bundles all problem characteristics into ONE typed object.
from typing import Literal: Literal[...] restricts a value to a finite set of strings or ints. For example, x: Literal[1, 2, 3] allows only 1, 2 or 3. A static type checker flags any cost_shape other than 'rmse', 'nasa' or 'balanced'. Python itself does not enforce this at runtime.
@dataclass above class ProblemSpec: the decorator tells Python to generate ProblemSpec.__init__(self, n_conditions, n_fault_modes, ...) from the class-level field annotations. Without it, you would write __init__ by hand.
class ProblemSpec: the class holds every decision-relevant feature of a predictive-maintenance problem. Its name reflects the philosophy: turn vibes into specs. A dataclass forces you to name and type EVERY input, so a reviewer can read one definition and understand the whole problem statement. The docstring, "Describe the problem you face in numbers, not vibes.", makes the same point: vague prompts ('our data is complex') get vague recommendations. Numbers in, decision out.
n_conditions (int): the number of distinct operating conditions, i.e. the unique flight regimes the engine sees during training. C-MAPSS defines them as altitude × Mach × TRA combinations: FD001/FD003 = 1 (sea-level cruise), FD002/FD004 = 6. More conditions make the data harder to learn, and the multi-task auxiliary signal helps more.
n_fault_modes (int): the number of failure mechanisms, i.e. the distinct degradation physics the network must cover. FD001/FD002 have HPC degradation only; FD003/FD004 have HPC + Fan.
failure_share (float): the fraction of training tuples in the failure regime (RUL ≤ 30), a value in [0, 1]. It is about 0.18 on FD001 and about 0.30 on FD002/FD004. AMNL's sample weights only help if there are enough failure samples to weight up. Below about 0.20, the weighted MSE has too few high-weight samples to bias toward.
cost_shape (Literal["rmse", "nasa", "balanced"]): which deployment metric dominates business cost, following §13.4's deployment regimes. 'rmse' is a symmetric squared-error cost: late and early predictions hurt equally. 'nasa' is the asymmetric NASA score: late predictions cost exponentially more (exp(d/10) vs exp(−d/13)). 'balanced' means both matter, e.g. a wing-mounted commercial fleet.
n_seeds_budget (int): the number of full training runs you can finance. AMNL/GABA need at least 5 seeds × 200 epochs to converge reliably. Below 5, fall back to simple baselines.
def recommend_model(spec: ProblemSpec) -> dict: this is the decision function. It takes a ProblemSpec, with all knobs in one struct. It returns a dict with the keys pick, rule, expected and reason, which the caller logs and acts on. The docstring documents the four possible outputs and the three numbered rules, so the function explains itself.
complexity = spec.n_conditions * spec.n_fault_modes: both factors increase the diversity of the training distribution. Their product therefore approximates the cardinality of the (condition, fault) cross-product: 1 = low, 6 = high, 12 = max.
if spec.cost_shape == "nasa" and complexity <= 2: this is the Rule 1 guard. It triggers when the NASA score dominates AND the data is healthy-dominated (FD001-style). On such data, AMNL's failure bias shifts predictions late, and NASA's exp(d/10) penalty for late predictions explodes (FD001: 434.4 vs 405 for the baseline).
return {"pick": "GABA", ...}: Rule 1 fires and picks GABA. GABA's gradient balancing keeps predictions centered (no late shift), so the NASA penalty stays low.
```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class ProblemSpec:
    """Describe the problem you face in numbers, not vibes."""
    n_conditions: int      # 1 for FD001, 6 for FD002/FD004
    n_fault_modes: int     # 1 or 2
    failure_share: float   # fraction of training samples with RUL <= 30
    cost_shape: Literal["rmse", "nasa", "balanced"]
    n_seeds_budget: int    # how many seeds you can afford


def recommend_model(spec: ProblemSpec) -> dict:
    """Return one of: AMNL / GABA / GRACE / 0.5/0.5 with reasoning.

    Logic mirrors the paper Table I winners and the deployment-regime
    analysis from Chapter 13. Three rules:

      Rule 1 (NASA + healthy)  -> GABA  (avoid AMNL late-shift on FD001)
      Rule 2 (RMSE + complex)  -> AMNL  (best in literature on FD002)
      Rule 3 (balanced)        -> GRACE (compose AMNL weights + GABA balance)
    """
    complexity = spec.n_conditions * spec.n_fault_modes  # 1=low, 6=high, 12=max

    # Rule 1: AMNL is dangerous on healthy-dominated, NASA-sensitive data.
    if spec.cost_shape == "nasa" and complexity <= 2:
        return {
            "pick": "GABA",
            "rule": 1,
            "expected": {"rmse_cycles": 10.4, "nasa_score": 390},
            "reason": "Healthy-dominated + NASA cost dominates: AMNL late-shift inflates penalty.",
        }

    # Rule 2: AMNL wins on complex multi-condition data when RMSE matters.
    if spec.cost_shape == "rmse" and complexity >= 6 and spec.failure_share >= 0.20:
        return {
            "pick": "AMNL",
            "rule": 2,
            "expected": {"rmse_cycles": 6.74, "nasa_score": 280},
            "reason": "Complex multi-condition + RMSE cost: AMNL is best-in-literature (FD002 6.74).",
        }

    # Rule 3: balanced cost shape -> GRACE composes both mechanisms.
    if spec.cost_shape == "balanced" and spec.n_seeds_budget >= 5:
        return {
            "pick": "GRACE",
            "rule": 3,
            "expected": {"rmse_cycles": 7.0, "nasa_score": 400},
            "reason": "Balanced regime: AMNL sample-weights + GABA task-weights compose well.",
        }

    # Fallback: no rule matched (mostly small-budget cases) -> plain 0.5/0.5 baseline.
    return {
        "pick": "0.5/0.5",
        "rule": 0,
        "expected": {"rmse_cycles": 13.0, "nasa_score": 600},
        "reason": "No rule matched (usually too small a budget for AMNL/GABA tuning). "
                  "Spend savings on data quality.",
    }


# ---------- Apply to all four C-MAPSS subsets ----------
subsets = {
    "FD001": ProblemSpec(n_conditions=1, n_fault_modes=1, failure_share=0.18,
                         cost_shape="nasa", n_seeds_budget=5),
    "FD002": ProblemSpec(n_conditions=6, n_fault_modes=1, failure_share=0.32,
                         cost_shape="rmse", n_seeds_budget=5),
    "FD003": ProblemSpec(n_conditions=1, n_fault_modes=2, failure_share=0.27,
                         cost_shape="balanced", n_seeds_budget=5),
    "FD004": ProblemSpec(n_conditions=6, n_fault_modes=2, failure_share=0.30,
                         cost_shape="rmse", n_seeds_budget=5),
}

print(f"{'subset':<8s} | {'pick':<8s} | rule | RMSE | NASA | reason")
print("-" * 100)
for name, spec in subsets.items():
    rec = recommend_model(spec)
    print(f"{name:<8s} | {rec['pick']:<8s} | {rec['rule']} | "
          f"{rec['expected']['rmse_cycles']:5.2f} | {rec['expected']['nasa_score']:4d} | "
          f"{rec['reason']}")
```
PyTorch: Build A Regime-Aware Loss Factory
Once recommend_model() picks a recipe, you need a loss function matching that recipe. make_loss(spec_pick) is the factory: closure-based, type-stable, hot-swappable. Real GABA needs gradient-state tracking (covered in §17), so this version stubs the GABA / GRACE task-weighting step at 0.5/0.5 - same compute graph, missing only the EMA-tracked λ.
make_loss(): closure-based loss factory (loss_factory_torch.py)
import torch: this is the PyTorch core, which provides tensors, autograd and the nn module. The listing uses torch.clamp, torch.rand, torch.randn, torch.randint and torch.manual_seed.
import torch.nn as nn: this imports the PyTorch nn module, aliased to nn for brevity. The listing below does not actually use it.
import torch.nn.functional as F: this imports the functional API (F.mse_loss, F.cross_entropy, F.softmax, etc.), the stateless versions of the nn classes. F.cross_entropy combines log_softmax and nll_loss and takes logits, not probabilities. It is the standard loss for multi-class classification. F.mse_loss is the mean-squared error; with the default reduction it equals ((pred - target) ** 2).mean().
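rul_term = (w * (rul_pred - rul_true) ** 2).mean(): this is the per-sample weighted squared error. Failure samples contribute up to 2x their squared error. Dividing by the batch size with .mean() instead of by w.sum() follows the paper's convention. The plain mean keeps the loss magnitude predictable across batch sizes, but it differs from a true weighted average.
loss = loss_fn(rul_pred, rul_true, health_logits, health_true): this applies the closure to the batch and returns a scalar tensor.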
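print(f"{pick:<8s} loss = {loss.item():.4f}"): this left-aligns pick to width 8 and formats the loss as a 4-decimal float. .item() converts a one-element tensor to a Python scalar and raises an error if the tensor has more than one element.
Expected output pattern: AMNL and GRACE print the same loss, because their formulas are identical while GRACE's task weighting is stubbed. GABA and 0.5/0.5 also print the same loss up to float rounding, because F.mse_loss is the plain mean of squared residuals. The AMNL/GRACE value is the larger of the two, because every weight exceeds 1.0 when rul_true is below the cap. The exact values depend on the seed. Because the residuals are in raw cycles, the RUL term dominates. For two independent uniform draws on [0, 125], its expected value is 2 · 125²/12 ≈ 2604 squared cycles, so the printed losses are in the hundreds or thousands, not single digits.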
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_loss(spec_pick: str, w_max: float = 2.0, max_rul: float = 125.0):
    """Factory: returns a loss callable matching the recommend_model() pick.

    Args:
        spec_pick: one of "AMNL" / "GABA" / "GRACE" / "0.5/0.5"
        w_max: max sample weight for AMNL / GRACE (paper: 2.0)
        max_rul: RUL cap (paper: 125.0)
    """
    if spec_pick == "AMNL":
        # Failure-biased weighted MSE on RUL + plain CE on health (combined 0.5/0.5).
        def amnl_loss(rul_pred, rul_true, health_logits, health_true):
            w = 1.0 + torch.clamp(1.0 - rul_true / max_rul, min=0.0, max=1.0) * (w_max - 1.0)
            rul_term = (w * (rul_pred - rul_true) ** 2).mean()
            health_term = F.cross_entropy(health_logits, health_true)
            return 0.5 * rul_term + 0.5 * health_term
        return amnl_loss

    if spec_pick == "GABA":
        # Gradient-balanced equal-MSE (placeholder: real GABA stores EMA state).
        def gaba_loss(rul_pred, rul_true, health_logits, health_true):
            rul_term = ((rul_pred - rul_true) ** 2).mean()
            health_term = F.cross_entropy(health_logits, health_true)
            # Real GABA computes lambda from gradient norms; here we stub equal weights.
            return 0.5 * rul_term + 0.5 * health_term
        return gaba_loss

    if spec_pick == "GRACE":
        # AMNL sample weights; the GABA task-weighting step is stubbed at 0.5/0.5 here.
        def grace_loss(rul_pred, rul_true, health_logits, health_true):
            w = 1.0 + torch.clamp(1.0 - rul_true / max_rul, min=0.0, max=1.0) * (w_max - 1.0)
            rul_term = (w * (rul_pred - rul_true) ** 2).mean()
            health_term = F.cross_entropy(health_logits, health_true)
            return 0.5 * rul_term + 0.5 * health_term
        return grace_loss

    # Fallback: plain 0.5/0.5 baseline.
    def baseline_loss(rul_pred, rul_true, health_logits, health_true):
        rul_term = F.mse_loss(rul_pred, rul_true)
        health_term = F.cross_entropy(health_logits, health_true)
        return 0.5 * rul_term + 0.5 * health_term
    return baseline_loss


# ---------- Smoke test ----------
torch.manual_seed(0)
B = 8
rul_pred = torch.rand(B) * 125.0  # [0, 125)
rul_true = torch.rand(B) * 125.0
health_logits = torch.randn(B, 3)
health_true = torch.randint(0, 3, (B,))

for pick in ["AMNL", "GABA", "GRACE", "0.5/0.5"]:
    loss_fn = make_loss(pick)
    loss = loss_fn(rul_pred, rul_true, health_logits, health_true)
    print(f"{pick:<8s} loss = {loss.item():.4f}")
```
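You can check the weight formula shared by amnl_loss and grace_loss without PyTorch. Here is a pure-Python sketch; amnl_weight and the two weighted_mse_* helpers are hypothetical names. It also contrasts the paper's plain .mean() with a normalized weighted average on a two-sample toy batch.

```python
def amnl_weight(rul_true: float, w_max: float = 2.0, max_rul: float = 125.0) -> float:
    """Scalar version of the weight in amnl_loss / grace_loss.

    1.0 at (or above) the RUL cap, rising linearly to w_max at RUL 0.
    """
    frac = min(max(1.0 - rul_true / max_rul, 0.0), 1.0)  # same clamp to [0, 1]
    return 1.0 + frac * (w_max - 1.0)


def weighted_mse_paper(preds, targets):
    """(w * residual**2).mean(): divides by the batch size, as in amnl_loss."""
    terms = [amnl_weight(t) * (p - t) ** 2 for p, t in zip(preds, targets)]
    return sum(terms) / len(terms)


def weighted_mse_normalized(preds, targets):
    """True weighted average: divides by sum(w) instead of the batch size."""
    ws = [amnl_weight(t) for t in targets]
    terms = [w * (p - t) ** 2 for w, p, t in zip(ws, preds, targets)]
    return sum(terms) / sum(ws)


for rul in (0.0, 30.0, 62.5, 125.0, 150.0):
    print(f"RUL {rul:6.1f} -> weight {amnl_weight(rul):.2f}")

# Toy batch: one sample at failure (RUL 0), one at the cap (RUL 125).
preds, targets = [10.0, 100.0], [0.0, 125.0]
print(weighted_mse_paper(preds, targets))       # 412.5
print(weighted_mse_normalized(preds, targets))  # 275.0
```

At RUL 30, the edge of the failure regime, the weight is 1.76 rather than 2.0. Whenever weights exceed 1, the .mean() convention makes the weighted loss larger than a normalized weighted average.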
AMNL-Style Failure-Bias In Other Domains
Failure-biased sample weighting is a general technique. In any task where (a) labels concentrate near a boundary that matters more than the rest of the distribution, AND (b) the deployment cost is symmetric, the AMNL pattern transfers directly:
| Domain | Boundary that matters | AMNL-equivalent recipe |
|---|---|---|
| RUL prediction (this book) | RUL ≤ 30 cycles (failure regime) | linear sample weight up to w_max = 2.0 |
| Click-through-rate prediction | CTR > 5% (high-engagement clicks) | uplift-modeling weighting |
| Credit default risk | PD > 5% (default boundary) | class-imbalance reweighting + boundary boost |
| Medical diagnosis (radiology) | calcification < 1 mm (early-cancer markers) | label-smoothing + minority-boundary upweight |
| Object detection (Focal Loss) | small / rare objects | (1 − p)^γ focal weighting |
| Speech recognition | low-frequency phonemes | subword-level loss weighting |
The pitfall is universal too. If the cost shape is asymmetric (NASA-style), failure-biased sample weighting shifts predictions toward the failure boundary, which an asymmetric cost punishes. Always pair AMNL-style sample weighting with a symmetric deployment cost.
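The asymmetry is easy to see one prediction at a time. Below is a minimal sketch of the standard C-MAPSS scoring function in its per-sample form; the benchmark score sums it over test engines. Here d is predicted minus true RUL, and nasa_score is an illustrative name.

```python
import math


def nasa_score(d: float) -> float:
    """Per-prediction C-MAPSS score, with d = predicted RUL - true RUL.

    Late predictions (d >= 0) cost exp(d/10) - 1; early ones cost exp(-d/13) - 1.
    """
    if d >= 0:
        return math.exp(d / 10.0) - 1.0
    return math.exp(-d / 13.0) - 1.0


for err in (5, 10, 20):
    late, early = nasa_score(err), nasa_score(-err)
    print(f"|d|={err:2d}: late {late:6.2f}  early {early:5.2f}  ratio {late / early:.2f}")
```

The same absolute error costs more when it is late, and the late/early ratio grows with |d|. A systematic late shift therefore hurts the NASA score more than it hurts RMSE.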
Three Deployment Pitfalls
Pitfall 1: Picking AMNL because “the paper said it's SOTA”. AMNL is SOTA on FD002. On FD001 with a NASA-sensitive deployment, AMNL is a regression. Always condition the recommendation on YOUR cost shape, not the paper's showpiece subset.
Pitfall 2: Skipping the failure_share check. AMNL's sample weights only help when there are enough failure samples for the weighting to bite. If your training set is <10% failure samples, AMNL ≈ 0.5/0.5 baseline because the weights barely diverge from 1.0. Check spec.failure_share before prescribing AMNL.
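The check in Pitfall 2 takes only a few lines. The sketch assumes raw per-sample RUL labels already capped at 125 and the same linear weight used by make_loss(). The two label sets are made-up toys, not C-MAPSS statistics, and the 0.20 cut-off is Rule 2's threshold.

```python
def failure_share(rul_labels, threshold: float = 30.0) -> float:
    """Fraction of training samples in the failure regime (RUL <= threshold)."""
    if not rul_labels:
        raise ValueError("need at least one RUL label")
    return sum(1 for r in rul_labels if r <= threshold) / len(rul_labels)


def mean_amnl_weight(rul_labels, w_max: float = 2.0, max_rul: float = 125.0) -> float:
    """Average AMNL sample weight; close to 1.0 means the weighting barely bites."""
    ws = [1.0 + min(max(1.0 - r / max_rul, 0.0), 1.0) * (w_max - 1.0) for r in rul_labels]
    return sum(ws) / len(ws)


# Hypothetical toy label sets (healthy samples sit at the RUL cap of 125).
healthy_heavy = [125.0] * 92 + [20.0] * 8                # 8% failure samples
failure_rich = [125.0] * 60 + [60.0] * 15 + [15.0] * 25  # 25% failure samples

for name, labels in (("healthy_heavy", healthy_heavy), ("failure_rich", failure_rich)):
    share = failure_share(labels)
    verdict = "AMNL candidate" if share >= 0.20 else "expect ~0.5/0.5"
    print(f"{name}: failure_share={share:.2f}, "
          f"mean weight={mean_amnl_weight(labels):.3f} -> {verdict}")
```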
Pitfall 3: Trusting the chooser without re-running the seeds. The chooser encodes paper-derived rules. If you change sensors, change the RUL cap, or swap C-MAPSS for N-CMAPSS, re-run a 5-seed sweep before locking in the recommendation. The rules should be re-derived per problem family.
Takeaway: Closing Chapter 16
AMNL is regime-specific, not universal. Best on complex multi-condition data with RMSE-dominant cost (FD002 / FD004). Worst on healthy-dominated data with NASA-dominant cost (FD001).
The break-even on FD001-style data is a NASA-to-RMSE² weight ratio of about 0.22. Most safety-critical aviation deployments are well above that threshold, so they should not use AMNL on healthy-dominated subsets.
The decision logic is a pure-Python dataclass plus one function that returns a dict. It is small enough to embed in a CI gate that refuses to ship a model unless its training recipe matches the problem spec.
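Here is one way such a gate could look. This is a sketch under stated assumptions. ShipCandidate, expected_recipe and ci_gate are hypothetical names. The rules are restated compactly rather than imported, and the seeds actually trained stand in for the seed budget.

```python
from dataclasses import dataclass


@dataclass
class ShipCandidate:
    """Hypothetical metadata a CI job might read for a trained model."""
    n_conditions: int
    n_fault_modes: int
    failure_share: float
    cost_shape: str      # "rmse", "nasa" or "balanced"
    n_seeds_run: int     # seeds actually trained, used here as the seed budget
    loss_recipe: str     # "AMNL", "GABA", "GRACE" or "0.5/0.5"


def expected_recipe(c: ShipCandidate) -> str:
    """Compact restatement of recommend_model()'s three rules plus the fallback."""
    complexity = c.n_conditions * c.n_fault_modes
    if c.cost_shape == "nasa" and complexity <= 2:
        return "GABA"
    if c.cost_shape == "rmse" and complexity >= 6 and c.failure_share >= 0.20:
        return "AMNL"
    if c.cost_shape == "balanced" and c.n_seeds_run >= 5:
        return "GRACE"
    return "0.5/0.5"


def ci_gate(c: ShipCandidate) -> None:
    """Raise (failing the CI step) if the trained recipe contradicts the spec."""
    expected = expected_recipe(c)
    if c.loss_recipe != expected:
        raise RuntimeError(
            f"refusing to ship: trained with {c.loss_recipe}, spec calls for {expected}"
        )


# FD002-style spec trained with AMNL: passes silently.
ci_gate(ShipCandidate(6, 1, 0.32, "rmse", 5, "AMNL"))

# FD001-style NASA spec trained with AMNL: blocked.
try:
    ci_gate(ShipCandidate(1, 1, 0.18, "nasa", 5, "AMNL"))
except RuntimeError as err:
    print(err)  # refusing to ship: trained with AMNL, spec calls for GABA
```

In a real pipeline you would let the exception propagate so the job exits non-zero. The try/except here only makes the blocked case visible.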
Closing Chapter 16. AMNL's wins (FD002 6.74) survive the cross-pipeline correction (§16.3, ≈ 7.2). Its losses (FD001 NASA 434) are real and require GABA — which is exactly the next chapter.