Two People Speaking at Once
Try to follow a soft conversation while a brass band marches through the room. Most of the audio energy reaching your ears is the band; the words you actually want are buried under it. The instinctive fix is not to crank the volume — volume is regime-agnostic and lifts both. The fix is to cancel the band first, then listen.
That is exactly what multi-condition C-MAPSS does to a naive RUL model. The “band” is the operating regime: each cycle's sensor reading is dominated by whether the engine is at sea-level idle or at 35,000 ft cruise. The “words” are the slow, gradual degradation signal we actually want to predict. Throw both into the same Z-score normaliser and 98.9% of the variance the model is fed is the band, so most of its capacity goes into recognising the regime rather than the degradation.
The Math: Variance Partitioning
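Let one sensor's value be $x$, and let $c$ denote the operating condition of the cycle it was recorded in. Pool the entire training set and compute its variance. The classical decomposition (law of total variance) gives:

$$
\operatorname{Var}(x) \;=\; \underbrace{\operatorname{Var}_c\big(\mathbb{E}[x \mid c]\big)}_{\text{between-condition}} \;+\; \underbrace{\mathbb{E}_c\big[\operatorname{Var}(x \mid c)\big]}_{\text{within-condition}}
$$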
The first term — between-condition variance — is the variance of the cluster centroids around the global mean. It is regime energy: it tells you which of the six conditions the engine is in, nothing more. The second term — within-condition variance — is the average variance inside any one cluster. It contains the sensor noise and the slow degradation drift that we actually want to predict.
On real C-MAPSS FD002 data, the ratio of between-condition variance to total variance is approximately 0.989 for a typical sensor, meaning 98.9% of that sensor's total variance is regime and 1.1% is the signal of interest. Per-condition normalisation throws away the 98.9% and lets the model spend its capacity on the remaining 1.1%.
Interactive: 99% of the Signal Is Regime Noise
Pick a sensor and toggle the normalisation scheme. The histogram is coloured by condition: under raw values you see six clearly separated peaks; under global Z-score you see the same six peaks, just rescaled; under per-condition Z-score the six peaks collapse onto each other and the residual variance becomes the within-condition variance — which is what we actually want the model to model.
The “between-condition” box on the right is the regime information. Global normalisation rescales it but leaves its share of the total untouched. Per-condition normalisation drives it to zero by construction. Note also that the total variance is the same (equal to 1) under both Z-score modes; what changes is the partition.
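Why the two Z-score modes behave so differently follows directly from the law of total variance. The short derivation below is a sketch; the notation is assumed here for illustration: $x$ is the sensor value, $c$ the condition, $\mu, \sigma$ the pooled mean and standard deviation, and $\mu_c, \sigma_c$ the mean and standard deviation inside condition $c$.

For the global Z-score, both terms of the partition are divided by the same constant $\sigma^2$, so the between-condition share is exactly what it was on the raw values:

$$
z = \frac{x - \mu}{\sigma}: \qquad
\operatorname{Var}_c\big(\mathbb{E}[z \mid c]\big) = \frac{\operatorname{Var}_c\big(\mathbb{E}[x \mid c]\big)}{\sigma^2},
\qquad
\mathbb{E}_c\big[\operatorname{Var}(z \mid c)\big] = \frac{\mathbb{E}_c\big[\operatorname{Var}(x \mid c)\big]}{\sigma^2}
$$

For the per-condition Z-score, every condition ends up with mean 0 and variance 1, so the between-condition term vanishes and the within-condition term carries all of the variance:

$$
\tilde{z} = \frac{x - \mu_c}{\sigma_c}: \qquad
\mathbb{E}[\tilde{z} \mid c] = 0,\;\; \operatorname{Var}(\tilde{z} \mid c) = 1
\;\;\Longrightarrow\;\;
\operatorname{Var}_c\big(\mathbb{E}[\tilde{z} \mid c]\big) = 0,
\qquad
\mathbb{E}_c\big[\operatorname{Var}(\tilde{z} \mid c)\big] = 1
$$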
Python: Quantify the Damage
Twenty lines of NumPy and the punch-line is unmissable: 98.9% of the raw sensor variance is regime, and per-condition Z-score eliminates exactly that fraction. Without this preprocessing step, even the fanciest downstream model is fighting a losing battle.
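The measurement described above can be sketched as follows. This is a minimal illustration, not the book's listing: it runs on a synthetic stand-in for a single sensor, and the helper names (`variance_partition`, `global_zscore`, `per_condition_zscore`), the six centroids, the drift and the noise level are all assumptions made for the example. The printed shares therefore show the pattern only and will not reproduce the FD002 figure.

```python
import numpy as np

def variance_partition(x, cond):
    """Split Var(x) into (between, within) via the law of total variance."""
    labels, counts = np.unique(cond, return_counts=True)
    weights = counts / counts.sum()                         # empirical P(c)
    means = np.array([x[cond == k].mean() for k in labels])
    variances = np.array([x[cond == k].var() for k in labels])
    grand_mean = np.sum(weights * means)
    between = np.sum(weights * (means - grand_mean) ** 2)   # Var_c(E[x|c])
    within = np.sum(weights * variances)                    # E_c[Var(x|c)]
    return between, within

def global_zscore(x):
    """One mean and one std for the whole pooled training set."""
    return (x - x.mean()) / x.std()

def per_condition_zscore(x, cond):
    """Separate mean and std for every operating condition."""
    z = np.empty_like(x, dtype=float)
    for k in np.unique(cond):
        mask = cond == k
        z[mask] = (x[mask] - x[mask].mean()) / x[mask].std()
    return z

# Synthetic stand-in for one sensor (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 6000
cond = rng.integers(0, 6, size=n)                           # six operating conditions
centroids = np.array([520.0, 540.0, 560.0, 580.0, 600.0, 620.0])
drift = np.linspace(0.0, 3.0, n)                            # slow degradation trend
x = centroids[cond] + drift + rng.normal(0.0, 1.0, n)

shares = {}
for name, values in [("raw", x),
                     ("global z", global_zscore(x)),
                     ("per-condition z", per_condition_zscore(x, cond))]:
    between, within = variance_partition(values, cond)
    shares[name] = between / (between + within)
    print(f"{name:>16}: between-condition share = {shares[name]:.4f}")
```

The raw and global Z-score shares come out identical up to floating-point error, while the per-condition share is numerically zero, which is the behaviour the text describes.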
One number to remember
Roughly 99% of the raw variance is between-condition for typical C-MAPSS FD002 sensors. The exact ratio varies sensor by sensor (20% for some pressures, 99.9% for some temperatures), but the order-of-magnitude conclusion holds for the typical sensor: regime dominates raw variance.
PyTorch: A Per-Condition Normaliser Module
For training we want the normaliser to live inside the model graph — so it moves to the GPU automatically, gets saved in state_dict, and is differentiable on its non-statistic inputs. Below is the class Chapter 6 promotes to a first-class citizen.
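A minimal sketch of such a module is given here, under stated assumptions: the class name `PerConditionNormaliser`, the `fit` helper and the `eps` guard are hypothetical choices for illustration, not the class Chapter 6 defines. It follows the recipe of two buffers (per-condition means and stds), a gather by advanced indexing, then subtract and divide.

```python
import torch
from torch import nn

class PerConditionNormaliser(nn.Module):
    """Z-scores each sample with the statistics of its own operating condition.

    Hypothetical sketch: the name, fit() helper and eps are assumptions.
    """

    def __init__(self, num_conditions: int, num_features: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        # Buffers follow .to(device) and are saved in state_dict, but are not trained.
        self.register_buffer("mean", torch.zeros(num_conditions, num_features))
        self.register_buffer("std", torch.ones(num_conditions, num_features))

    @torch.no_grad()
    def fit(self, x: torch.Tensor, cond: torch.Tensor) -> "PerConditionNormaliser":
        """Estimate statistics from training data.

        x: (N, num_features) float tensor; cond: (N,) long tensor of condition ids.
        """
        for k in range(self.mean.shape[0]):
            rows = x[cond == k]
            if rows.shape[0] > 0:  # keep the defaults for conditions with no samples
                self.mean[k] = rows.mean(dim=0)
                self.std[k] = rows.std(dim=0, unbiased=False)
        return self

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # Advanced indexing gathers one (mean, std) row per sample, then
        # subtract and divide; eps guards against constant sensors.
        return (x - self.mean[cond]) / (self.std[cond] + self.eps)

# Usage on synthetic data (illustrative values only).
torch.manual_seed(0)
cond = torch.randint(0, 6, (1000,))
x = torch.randn(1000, 3) + 10.0 * cond.unsqueeze(1).float()
norm = PerConditionNormaliser(num_conditions=6, num_features=3).fit(x, cond)
z = norm(x, cond)
```

Because `self.mean[cond]` simply appends the feature axis to the shape of `cond`, the same `forward` also works on windowed inputs of shape (batch, time, features) with `cond` of shape (batch, time). Statistics should be fitted on the training split only and then reused unchanged at validation and test time.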
The Same Pattern in Other Domains
“Most of the variance is the regime, not the signal” is a central problem across statistics and machine learning. Each row below has its own canonical fix, and all the fixes structurally resemble per-condition normalisation.
| Domain | Regime | Fix | Effect |
|---|---|---|---|
| Multi-condition prognostics (this book) | Operating condition | Per-condition Z-score | Removes the ~99% of variance that is regime |
| Speaker recognition | Speaker identity | Mean / std normalisation per speaker | Speech content stays, vocal tract gone |
| Multi-site neuroimaging | Scanner vendor / site | ComBat harmonisation | Removes site bias before downstream analysis |
| Recommender systems | User | Per-user mean centring | Item bias separated from user bias |
| Genomics RNA-seq | Batch | RUVseq / SVA | Removes batch effects |
| Federated learning | Client | Local BatchNorm or FedBN | Per-client statistics, shared parameters |
| EEG seizure detection | Patient | Patient-specific baseline | Idiosyncratic signal preserved |
The idea is universal: when the variance you don't care about is structured (clusters in some discrete variable), centre and scale within each cluster.
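To show that the recipe is not specific to engines, here is a small sketch of the recommender-systems row: per-user mean centring. The function name `center_within_groups` and the toy ratings are hypothetical, and only the centring half of the recipe is shown (scaling would divide by a per-group standard deviation in the same way).

```python
import numpy as np

def center_within_groups(values, groups):
    """Subtract each group's mean from its members (group ids are 0..G-1)."""
    counts = np.bincount(groups)
    sums = np.bincount(groups, weights=values)
    means = sums / np.maximum(counts, 1)     # guard against unused group ids
    return values - means[groups], means

# Hypothetical ratings: user 0 rates generously, user 1 harshly.
ratings = np.array([5.0, 4.0, 5.0, 2.0, 1.0, 3.0])
users = np.array([0, 0, 0, 1, 1, 1])
centered, user_means = center_within_groups(ratings, users)
print(user_means)
print(centered)
```

After centring, a rating of 4 from the generous user and a rating of 1 from the harsh user both show up as below that user's own average, which is the "regime removed, signal kept" effect in miniature.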
The Three Pitfalls
The whole story of Chapter 2 in one sentence: C-MAPSS ships with regime structure baked in; ignoring it costs two orders of magnitude in attention; per-condition normalisation removes it for free.
Takeaway
- The law of total variance partitions sensor variance into between- and within-condition. The first is regime; the second is signal.
- On C-MAPSS FD002, regime is ~99% of the raw variance. The degradation signal we actually want to predict is the remaining ~1%.
- Global Z-score does NOT fix this. It rescales, but the partition is preserved. Only per-condition Z-score eliminates regime variance.
- Implementation is one nn.Module with two buffers. Means and stds per condition; gather with advanced indexing; subtract and divide.
- This is the bridge to the rest of the book. Chapter 6 formalises the per-condition normaliser into the data pipeline; every model in Parts V-VII assumes its inputs are already condition-normalised.