Putting Eight Sections in One Script
Sections §14.1-§15.4 each isolated one piece of the AMNL pipeline. This section stitches them all together into a single end-to-end training script - the same code shape the paper ships in paper_ieee_tii/experiments/train_amnl_v7.py. Twelve stages, <200 lines.
train_amnl(dataset_name, epochs, lr) takes three knobs; everything else is a paper-canonical default. Run it once per C-MAPSS subset to reproduce the paper's Table II AMNL row.
Interactive: 12-Stage Pipeline
Click each stage to see its paper-file location and the book section that derives it. Setup runs once; the epoch + eval block runs 200 times.
Read the colour groups. Blue = data / model setup (one-time). Green = loss stack (one-time). Amber = optimiser stack (one-time). Pink = per-epoch train. Purple = per-epoch eval + scheduling. The whole training run is the seven setup stages executed once, followed by the five per-epoch stages cycling for 200 epochs.
Paper Files at a Glance
| Stage | Paper file | Book section |
|---|---|---|
| 1-2. Data + DataLoader | grace/data/cmapss_dataset.py | §7 |
| 3. DualTaskModel | grace/models/dual_task_model.py | §11.4 |
| 4. AMNL RUL loss | grace/core/weighted_mse.py | §14.1 |
| 5. FixedWeightLoss combiner | grace/core/baselines.py:34-49 | §15.1 |
| 6. AdamW + ReduceLROnPlateau | experiments/train_amnl_v7.py:480-496 | §15.2 |
| 7. EMA + clip_grad | grace/training/callbacks.py:54-87 + utils | §15.3 |
| 8-12. Training loop driver | grace/training/trainer.py:126-217 | §14.4 + §15.5 |
Running cd paper_ieee_tii && python experiments/train_amnl_v7.py --dataset FD002 --seed 0 invokes exactly the pipeline below. The paper's Table II numbers come from running this for 5 seeds (0-4) per dataset and averaging.
Python: Pseudo-Pipeline
Conceptual NumPy walkthrough. Each call wraps a stub or a paper-imported component. The pseudo-output at the bottom shows what the real run would print on FD002.
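As a minimal sketch of the stage order only, the block below uses plain Python rather than NumPy and replaces every component with a name in a log. The stage labels and the run_pipeline function are hypothetical stand-ins, not identifiers from the paper's code, and the sketch produces no RMSE or loss values. It assumes the per-epoch order listed in the chapter takeaway (warmup, train, eval with EMA, restore, scheduler, track best) and that the scheduler is skipped while warmup is active.

```python
from typing import List

# Illustrative labels for the seven one-time setup stages (not paper identifiers).
SETUP_STAGES: List[str] = [
    "data", "dataloader", "model", "rul_loss",
    "loss_combiner", "optimiser_and_scheduler", "ema_and_clip",
]

# Per-epoch order as given in the chapter takeaway.
EPOCH_STEPS: List[str] = [
    "warmup", "train", "eval_ema", "restore", "scheduler", "track_best",
]


def run_pipeline(epochs: int, warmup_epochs: int = 0) -> List[str]:
    """Return the sequence of stage names a run would execute (stubs only)."""
    log = list(SETUP_STAGES)  # setup runs exactly once
    for epoch in range(epochs):
        in_warmup = epoch < warmup_epochs
        for step in EPOCH_STEPS:
            if step == "warmup" and not in_warmup:
                continue  # warmup only touches the first warmup_epochs epochs
            if step == "scheduler" and in_warmup:
                continue  # never step the plateau scheduler during warmup
            log.append(f"epoch{epoch}:{step}")
    return log
```

Calling run_pipeline(epochs=200, warmup_epochs=k) for some warmup length k gives the full call order of a run; swapping each label for the real component is what the production driver does.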
PyTorch: Paper Trainer Driver
Production version. Imports every paper-canonical piece by name and runs them in the trainer's exact order. The per-epoch print emits the same shape the paper trainer logs.
Drop-In for Other PHM Domains
The 12-stage pipeline transfers wherever you have (a) a sliding-window time-series dataset, (b) a primary regression target, and (c) an auxiliary classification target. Swap the dataset and the c_in; everything else is unchanged.
| Domain | Dataset class | c_in | max_rul | Other changes |
|---|---|---|---|---|
| RUL prediction (this book) | CMAPSSFullDataset | 14 | 125 | none |
| N-CMAPSS DS02 | NCMAPSSDataset | 20 | 100 | model_configs.ncmapss_20feat |
| Battery SoH + fault type | BatteryDataset | 5 | 1.0 | max_rul=1.0 in moderate_weighted_mse_loss |
| Wind-turbine SCADA | SCADADataset | 12 | 720 | longer windows (T=144) |
| MRI tumour growth + benign/malignant | MRIFollowupDataset | vol | 20 | regression on volume, BCE for binary classification |
| Disk RUL + SMART anomaly type | BackblazeDataset | 16 | 180 | daily windows instead of cycles |
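One way to keep the per-domain changes in a single place is a small configuration registry, sketched below. The DomainConfig dataclass, the DOMAINS dictionary keys, and the cap_target helper are hypothetical names introduced for illustration; only the dataset class names, c_in, and max_rul values come from the table. The sketch also assumes that max_rul acts as an upper cap on the regression target, which is the usual C-MAPSS convention but may not hold for every domain listed. The MRI row is omitted because its c_in is not a fixed integer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DomainConfig:
    dataset_class: str  # dataset class name as listed in the table
    c_in: int           # number of input channels per time step
    max_rul: float      # cap applied to the regression target (assumed meaning)


# Values copied from the table; the dictionary keys are illustrative.
DOMAINS: Dict[str, DomainConfig] = {
    "cmapss": DomainConfig("CMAPSSFullDataset", 14, 125),
    "ncmapss_ds02": DomainConfig("NCMAPSSDataset", 20, 100),
    "battery": DomainConfig("BatteryDataset", 5, 1.0),
    "scada": DomainConfig("SCADADataset", 12, 720),
    "backblaze": DomainConfig("BackblazeDataset", 16, 180),
}


def cap_target(raw: float, cfg: DomainConfig) -> float:
    """Clip a raw regression target to the range [0, max_rul]."""
    return max(0.0, min(float(raw), float(cfg.max_rul)))
```

With a registry like this, switching domains is a one-key change, for example cap_target(300, DOMAINS["cmapss"]) yields the capped value 125.0 while the rest of the pipeline stays untouched.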
Three End-to-End Pitfalls
- If scheduler.step(val) is called during warmup (where lr is being set externally), the scheduler's internal 'best' gets corrupted and the first post-warmup cut fires too early. Always guard the call with if epoch >= warmup_epochs:.
- If you call apply_shadow for validation and forget restore, training continues with the shadow weights. Updates accumulate in the wrong place, and the next eval sees corrupted shadow values. The loss curves look plausible, but the runs are irreproducible.
- The evaluate_model helper internally calls model.eval(). If the trainer doesn't re-enter model.train() at the start of the next epoch, dropout stays off and AMNL's sample-weighted regularisation is silently broken. The paper trainer puts model.train() at the top of every _train_epoch call.

The point. Twelve stages, 100 lines, reproduces the paper's AMNL row of Table II. Every component lives in paper_ieee_tii/grace/; the driver lives in experiments/train_amnl_v7.py. Chapter 16 reports what those numbers actually look like at convergence.
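The three pitfalls above can be guarded against structurally, as the toy sketch below suggests. Everything in it is a hypothetical stand-in: ToyModel and ToyEMA operate on a dict of scalar weights rather than tensors, the optimiser step is a simple increment, and a list records the epochs at which a real scheduler.step(val) would be called. Only the method names apply_shadow and restore and the guard epoch >= warmup_epochs are taken from the text. The context manager is one possible design, not the paper trainer's mechanism.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ToyModel:
    """Stand-in model: one scalar weight plus a train/eval flag."""

    def __init__(self) -> None:
        self.params: Dict[str, float] = {"w": 0.0}
        self.training = True

    def train(self) -> None:
        self.training = True

    def eval(self) -> None:
        self.training = False


class ToyEMA:
    """Stand-in EMA tracker over the model's weight dict."""

    def __init__(self, params: Dict[str, float], decay: float) -> None:
        self.params = params          # shared reference to the live weights
        self.decay = decay
        self.shadow = dict(params)    # exponential moving average copy
        self.backup: Dict[str, float] = {}

    def update(self) -> None:
        for k, v in self.params.items():
            self.shadow[k] = self.decay * self.shadow[k] + (1.0 - self.decay) * v

    def apply_shadow(self) -> None:
        self.backup = dict(self.params)
        self.params.update(self.shadow)

    def restore(self) -> None:
        self.params.update(self.backup)
        self.backup = {}


@contextmanager
def ema_weights(ema: ToyEMA) -> Iterator[None]:
    """Swap in shadow weights for evaluation and always swap back."""
    ema.apply_shadow()
    try:
        yield
    finally:
        ema.restore()  # runs even if evaluation raises


def run_epochs(model: ToyModel, ema: ToyEMA, epochs: int, warmup_epochs: int) -> List[int]:
    scheduler_epochs: List[int] = []
    for epoch in range(epochs):
        model.train()                  # re-enter train mode every epoch
        model.params["w"] += 1.0       # stand-in for an optimiser step
        ema.update()
        model.eval()
        with ema_weights(ema):         # shadow weights only inside this block
            _val = abs(model.params["w"])  # stand-in validation metric
        if epoch >= warmup_epochs:     # skip the scheduler during warmup
            scheduler_epochs.append(epoch)
    return scheduler_epochs
```

Wrapping apply_shadow and restore in a context manager makes the restore impossible to forget, and keeping model.train() as the first statement of the loop body mirrors the placement the paper trainer uses in _train_epoch.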
Takeaway — End of Chapter 15
- Twelve stages. 7 setup, 5 per-epoch. Wired together by paper_ieee_tii/grace/training/trainer.py.
- Three knobs. train_amnl(dataset_name, epochs, lr); everything else is paper-canonical default.
- Order matters. warmup → train → eval(EMA) → restore → scheduler → track best.
- ~13.36 RMSE on FD002+FD004 avg. Paper Table II row. Reproducible with seed=0.
- End of Chapter 15. Chapter 16 reports empirical results across all four C-MAPSS subsets.