AI Book · Learn by Building

Book · Advanced · 70+ hours

Gradient-Aware Multi-Task Learning for Predictive Maintenance

AMNL, GABA, and GRACE for RUL Prediction Under Multi-Condition Degradation

A research-grade walkthrough of three multi-task learning strategies for Remaining Useful Life prediction: AMNL (accuracy-first), GABA (safety-first), and GRACE (balanced), all built on a shared CNN-BiLSTM-Attention backbone. The framework is validated across 335 experiments on NASA C-MAPSS and N-CMAPSS DS02, where it beats the published SOTA (DKAMFormer) on multi-condition data.

29 chapters · 121 sections · 28 h reading · 10 parts

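Part I · Foundations · 17 sections
Part II · Data Pipeline · 13 sections
Part III · Shared Backbone · 16 sections
Part IV · The Core Discovery · 8 sections
Part V · Model 1: AMNL · 13 sections
Part VI · Model 2: GABA · 16 sections
Part VII · Model 3: GRACE · 11 sections
Part VIII · Baselines & SOTA · 13 sections
Part IX · Ablations & Statistics · 9 sections
Part X · Production · 5 sections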

Part I · 4 chapters · 17 sections

Foundations · RUL, benchmarks, math preliminaries, and MTL theory.

Predictive Maintenance & RUL

Why Remaining Useful Life prediction matters, the cost of being late, and the three deployment regimes that motivate this book.

4 sections · 44 min read

Benchmarks: C-MAPSS & N-CMAPSS

The two benchmarks that define progress in turbofan RUL prediction, and why multi-condition data is the real challenge.

4 sections · 54 min read

Mathematical Preliminaries

The minimal math you need: time series tensors, 1D convolution, recurrent networks, attention, and softmax cross-entropy.

5 sections · 72 min read

Multi-Task Learning Theory

Shared backbones, task-specific heads, and the loss-combination problem that the rest of the book is dedicated to solving.

4 sections · 54 min read

Part II · 3 chapters · 13 sections

Data Pipeline · C-MAPSS / N-CMAPSS, per-condition normalization, sequences and labels.

NASA Datasets Deep Dive

Sensor catalog, operating conditions, fault modes, and the file formats you will actually load into PyTorch.

5 sections · 66 min read

Per-Condition Normalization

The silent hero of the framework: why global Z-score fails on multi-condition data and how per-condition normalization fixes it.

4 sections · 49 min read
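As an illustration of the per-condition idea, here is a minimal NumPy sketch of per-condition Z-scoring. It assumes every row already carries an integer operating-condition id (for example obtained by clustering the three operational settings); that step, and the function name `per_condition_zscore`, are assumptions rather than the book's code. Statistics are fitted per condition on training rows and can be passed back in for test rows.

```python
import numpy as np


def per_condition_zscore(x, cond, stats=None, eps=1e-8):
    """Z-score every sensor separately inside each operating condition.

    x:     (N, F) sensor readings
    cond:  (N,) integer operating-condition id per row (assumed to be given)
    stats: optional {condition: (mean, std)} fitted on training data; if None,
           the statistics are computed from x itself.
    Returns the normalized array and the stats dict (reuse it on the test split).
    """
    x = np.asarray(x, dtype=np.float64)
    cond = np.asarray(cond)
    if stats is None:
        stats = {}
        for c in np.unique(cond):
            rows = x[cond == c]
            stats[int(c)] = (rows.mean(axis=0), rows.std(axis=0))
    out = np.empty_like(x)
    for c in np.unique(cond):
        mask = cond == c
        mu, sd = stats[int(c)]  # KeyError if a condition never appeared in training
        out[mask] = (x[mask] - mu) / (sd + eps)
    return out, stats


# Toy data: the same three sensors sit at very different levels in two conditions.
rng = np.random.default_rng(0)
cond = np.repeat([0, 1], 500)
x = rng.normal(size=(1000, 3)) + np.where(cond[:, None] == 0, 0.0, 50.0)
z, stats = per_condition_zscore(x, cond)
```

A single global Z-score on this toy data would leave the two conditions about two standard deviations apart, so the network would see the operating-condition offset instead of degradation; per-condition statistics remove that offset.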

Sequences, RUL Cap & Health Labels

Building the (B, 30, 17) input tensor: sliding windows, the piecewise-linear RUL cap, three-class health labels, and a reusable PyTorch Dataset.

4 sections · 55 min read
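A minimal NumPy sketch of the windowing step for one engine's run-to-failure record. The window length of 30 and the 17 features come from the (B, 30, 17) shape above; the RUL cap of 125 cycles, the health-class boundaries, the class encoding, and the name `make_windows` are illustrative assumptions, not values taken from the book.

```python
import numpy as np

WINDOW = 30                 # window length, from the (B, 30, 17) input shape
RUL_CAP = 125               # assumed cap; a common choice in the C-MAPSS literature
HEALTH_BOUNDS = (30, 80)    # hypothetical class boundaries in cycles


def make_windows(features, window=WINDOW, rul_cap=RUL_CAP, bounds=HEALTH_BOUNDS):
    """Cut one engine's run-to-failure record into overlapping windows.

    features: (T, F) array, rows in cycle order, last row = failure cycle.
    Returns X (N, window, F), capped RUL targets y (N,) and health labels (N,),
    with N = T - window + 1.
    """
    features = np.asarray(features)
    T = features.shape[0]
    if T < window:
        raise ValueError("record is shorter than the window")
    # Piecewise-linear target: RUL counts down to 0 at failure and is flat at rul_cap early in life.
    rul = np.minimum(np.arange(T - 1, -1, -1), rul_cap).astype(np.float32)
    n = T - window + 1
    idx = np.arange(n)[:, None] + np.arange(window)[None, :]  # (n, window) row indices
    X = features[idx]                                         # (n, window, F)
    y = rul[window - 1:]                                      # label = RUL at the window's last cycle
    lo, hi = bounds
    # Hypothetical encoding: 0 = healthy, 1 = degrading, 2 = near failure.
    health = np.where(y <= lo, 2, np.where(y <= hi, 1, 0))
    return X, y, health


engine = np.random.default_rng(0).normal(size=(200, 17))  # toy record: 200 cycles, 17 features
X, y, health = make_windows(engine)
```

Windows from all training engines can then be concatenated along the first axis to form (B, 30, 17) batches.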

Part III · 4 chapters · 16 sections

Shared Backbone · CNN, BiLSTM, Multi-Head Attention, dual-task heads.

CNN Feature Extractor

Three 1D conv layers (17 → 64 → 128 → 64) extract local degradation patterns from sensor streams.

4 sections · 54 min read

Bidirectional LSTM Encoder

Two-layer BiLSTM (h=256) captures long-range temporal dependencies in degradation signatures.

4 sections · 60 min read

Multi-Head Self-Attention

Eight-head self-attention with residual connection lets the model focus on degradation-relevant timesteps.

4 sections · 57 min read

Dual-Task Heads & Model Assembly

Two task-specific heads (RUL regression + 3-class health classification) on top of a shared 32-d feature, totaling ~3.5 M parameters.

4 sections · 52 min read

Part IV · 2 chapters · 8 sections

The Core Discovery · The 500x gradient imbalance and the accuracy-safety tradeoff.

The 500x Gradient Imbalance

The empirical discovery that motivates the rest of the book: regression gradients exceed classification gradients by 500x on shared parameters.

4 sections · 60 min read

The Accuracy-Safety Tradeoff

Why low RMSE coincides with a high (worse) NASA score, and why the tradeoff cannot be hidden behind a single metric.

4 sections · 54 min read
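The asymmetry behind this tradeoff is easiest to see in the scoring function itself. The sketch below uses the standard C-MAPSS scoring function, where d is predicted RUL minus true RUL: late predictions (d > 0) are penalized with a time constant of 10 cycles, early ones with 13, and lower totals are better. The function names are illustrative.

```python
import numpy as np


def nasa_score(pred, true):
    """C-MAPSS scoring function: sum of asymmetric exponential penalties (lower is better)."""
    d = np.asarray(pred, dtype=np.float64) - np.asarray(true, dtype=np.float64)
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(penalties.sum())


def rmse(pred, true):
    d = np.asarray(pred, dtype=np.float64) - np.asarray(true, dtype=np.float64)
    return float(np.sqrt(np.mean(d ** 2)))


true = np.array([50.0, 80.0, 20.0])
early = true - 15.0   # every engine predicted 15 cycles too early
late = true + 15.0    # every engine predicted 15 cycles too late
```

Both prediction sets have an RMSE of 15 cycles, yet the late set receives the worse NASA score, so a model tuned only for RMSE can still land on a poor NASA score.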

Part V · 3 chapters · 13 sections

Model 1: AMNL · Failure-biased weighted MSE for accuracy-first deployment.

Failure-Biased Weighted MSE

Up-weighting near-failure samples so the regressor pays attention where errors hurt the most.

4 sections · 51 min read
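The exact AMNL weighting is defined in the chapter itself; as a stand-in, here is a minimal NumPy sketch with a hypothetical linear weight that rises from 1 at the RUL cap to 1 + alpha at failure. The name `failure_biased_mse`, the `alpha` parameter, and the cap of 125 cycles are assumptions for illustration only.

```python
import numpy as np


def failure_biased_mse(pred, target, rul_cap=125.0, alpha=1.0):
    """Weighted MSE that counts errors near failure more heavily (hypothetical weighting).

    w = 1 + alpha * (1 - min(target, rul_cap) / rul_cap): w = 1 + alpha at RUL 0, w = 1 at the cap.
    The result is a weighted mean, so alpha = 0 recovers plain MSE.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    w = 1.0 + alpha * (1.0 - np.clip(target, 0.0, rul_cap) / rul_cap)
    return float(np.sum(w * (pred - target) ** 2) / np.sum(w))


target = np.array([0.0, 125.0])              # one engine at failure, one far from it
miss_near_failure = np.array([10.0, 125.0])  # 10-cycle error on the near-failure sample
miss_far = np.array([0.0, 135.0])            # the same 10-cycle error on the healthy sample
```

With these inputs the near-failure miss costs twice as much as the identical miss far from failure, while plain MSE would treat them the same.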

AMNL Training Pipeline

Fixed 0.5/0.5 task weighting + failure-biased MSE, with the optimizer, scheduler, and EMA tricks that hold it together.

5 sections · 74 min read

AMNL Results & When to Use It

Best-in-literature RMSE on FD002/FD003, the FD001 NASA penalty, and the cross-pipeline caveat you must report.

4 sections · 49 min read

Part VI · 4 chapters · 16 sections

Model 2: GABA · Inverse-gradient adaptive balancing for safety-first deployment.

Inverse-Gradient Balancing: The Idea

Equalize each task's contribution to the shared backbone by giving lower weight to whichever task has bigger gradients.

4 sections · 54 min read
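Written out for the two tasks, and assuming a task's contribution is measured by the norm of its gradient on the shared parameters, the idea reduces to a closed form. The 500x ratio is the imbalance named in Part IV; the arithmetic is illustrative and shows why a floor on the weights is useful, since the raw regression weight would otherwise almost vanish.

```latex
% Equal weighted contributions on the shared parameters, weights summing to 1:
\lambda_{\text{reg}}\,\lVert g_{\text{reg}}\rVert = \lambda_{\text{cls}}\,\lVert g_{\text{cls}}\rVert,
\qquad \lambda_{\text{reg}} + \lambda_{\text{cls}} = 1
\;\Longrightarrow\;
\lambda_i = \frac{1/\lVert g_i\rVert}{\sum_j 1/\lVert g_j\rVert}

% With a 500x imbalance, \lVert g_{\text{reg}}\rVert = 500\,\lVert g_{\text{cls}}\rVert:
\lambda_{\text{reg}} = \frac{1}{501} \approx 0.002,
\qquad \lambda_{\text{cls}} = \frac{500}{501} \approx 0.998
```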

The GABA Algorithm

Per-step gradient norms, EMA smoothing (β = 0.99), minimum floor (λ_min = 0.05), and a 100-step warmup — the full pseudocode walked end to end.

5 sections · 74 min read
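A framework-free sketch of such an update loop, using the hyperparameters named above (β = 0.99, λ_min = 0.05, 100-step warmup). Several details are assumptions rather than the book's pseudocode: the EMA is applied to the raw gradient norms and seeded with the first observation, weights stay uniform during warmup, and the floor is enforced by mixing λ_i = λ_min + (1 − K·λ_min)·p_i over K tasks, which keeps every weight at or above λ_min while the weights still sum to 1. The class name `InverseGradientBalancer` is hypothetical; in a real training loop the norms would be measured per task loss on the shared backbone parameters.

```python
from typing import List, Optional


class InverseGradientBalancer:
    """Hypothetical inverse-gradient task weighting with EMA, floor, and warmup."""

    def __init__(self, num_tasks: int = 2, beta: float = 0.99,
                 lambda_min: float = 0.05, warmup_steps: int = 100,
                 eps: float = 1e-12) -> None:
        if num_tasks * lambda_min >= 1.0:
            raise ValueError("lambda_min is too large for this many tasks")
        self.num_tasks = num_tasks
        self.beta = beta
        self.lambda_min = lambda_min
        self.warmup_steps = warmup_steps
        self.eps = eps
        self.step_count = 0
        self.ema_norms: Optional[List[float]] = None

    def update(self, grad_norms: List[float]) -> List[float]:
        """Take this step's per-task gradient norms and return the task weights."""
        if len(grad_norms) != self.num_tasks:
            raise ValueError("expected one gradient norm per task")
        self.step_count += 1
        # EMA smoothing of the raw norms (seeded with the first observation).
        if self.ema_norms is None:
            self.ema_norms = [float(g) for g in grad_norms]
        else:
            self.ema_norms = [self.beta * m + (1.0 - self.beta) * g
                              for m, g in zip(self.ema_norms, grad_norms)]
        # Warmup: equal weights while the EMA settles (assumed behaviour).
        if self.step_count <= self.warmup_steps:
            return [1.0 / self.num_tasks] * self.num_tasks
        # Inverse-gradient proportions: the task with bigger gradients gets less weight.
        inv = [1.0 / (m + self.eps) for m in self.ema_norms]
        total = sum(inv)
        props = [v / total for v in inv]
        # Floor: every weight >= lambda_min, and the weights still sum to 1.
        scale = 1.0 - self.num_tasks * self.lambda_min
        return [self.lambda_min + scale * p for p in props]


balancer = InverseGradientBalancer()
for step in range(200):
    # Stand-in norms reproducing a 500x imbalance; a real loop would measure them per task.
    w_reg, w_cls = balancer.update([500.0, 1.0])
# w_reg ends just above the 0.05 floor; combined_loss = w_reg * rul_loss + w_cls * health_loss
```

Because the weights are a deterministic function of smoothed norms, this sketch has no learned parameters and adds no auxiliary loss term.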

Control-Theoretic Interpretation

GABA viewed as a proportional feedback controller with an IIR filter and anti-windup floor — the property that gives it stronger stability guarantees than GradNorm.

3 sections · 39 min read
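To make the filter language concrete, the EMA smoothing step can be written as a first-order IIR low-pass filter. This is standard signal-processing algebra applied to the β = 0.99 setting from the GABA algorithm, not a derivation quoted from the book; here ḡ_t is the smoothed and g_t the raw per-step gradient norm of one task.

```latex
\begin{aligned}
\bar g_t &= \beta\,\bar g_{t-1} + (1-\beta)\,g_t
  &&\Longleftrightarrow\quad H(z) = \frac{1-\beta}{1-\beta z^{-1}},\qquad H(1) = 1 \\
\bar g_n &= \left(1-\beta^{n}\right) g
  &&\text{(constant input } g,\ \bar g_0 = 0) \\
1 - 0.99^{100} &\approx 0.634,
  &&\tau \approx \frac{1}{1-\beta} = 100 \text{ steps}
\end{aligned}
```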

Training GABA & Results

GABA + standard MSE: best NASA score among adaptive methods, no auxiliary loss, no learned parameters, and a single λ that converges within 10 epochs.

4 sections · 52 min read

Part VII · 3 chapters · 11 sections

Model 3: GRACE · GABA + weighted MSE for balanced deployment.

Combining GABA + Weighted MSE

Adaptive weighting and loss-shape are orthogonal — GRACE composes them and resolves the accuracy-safety tradeoff.

3 sections · 39 min read

GRACE Training Pipeline

The full reproducible pipeline: 5 seeds, AdamW, ReduceLROnPlateau, EMA, gradient clipping, and exact hyperparameters.

4 sections · 64 min read

GRACE Results & the Pareto Frontier

Best NASA score on multi-condition C-MAPSS, the RMSE-NASA Pareto picture, and the only method to win on N-CMAPSS DS02.

4 sections · 55 min read

Part VIII · 3 chapters · 13 sections

Baselines & SOTA · Adaptive baselines, gradient surgery, and the unified comparison.

Adaptive MTL Baselines

Fixed weighting, Homoscedastic Uncertainty, GradNorm, and DWA — the published baselines we ran inside the same framework.

5 sections · 73 min read

Gradient-Surgery Baselines

PCGrad and CAGrad project away conflicting gradient directions. Useful, but magnitude correction beats direction correction in this domain.

3 sections · 48 min read

Unified Comparison vs. Published SOTA

AMNL, GABA, GRACE plus six baselines vs. DKAMFormer, DMHA-ATCN, STAR, DVGTformer, and the rest of the field.

5 sections · 81 min read

Part IX · 2 chapters · 9 sections

Ablations & Statistics · Architecture, normalization, robustness, and statistical tests.

Architecture Ablation

Five backbones (CNN-only, MLP, Transformer, LSTM-only, Full) and the result that the framework benefit is architecture-agnostic.

4 sections · 52 min read

Normalization, Robustness & Statistical Tests

Per-condition vs. global normalization, GABA hyperparameter sweeps, and the formal Friedman / Wilcoxon tests behind every claim.

5 sections · 75 min read

Part X · 1 chapter · 5 sections

Production · Edge deployment, model selection, and future directions.

Deployment, Model Selection & Future Directions

Edge deployment profile, ONNX export, the model-selection decision tree, and where this research goes next.

5 sections · 75 min read

121 sections. Begin with one.

Chapter 1 — Predictive Maintenance & RUL — is where every reader starts.
