Of the 21 raw sensors, 7 are constants on FD001 (control commands like “demanded fan speed”) and contribute zero signal. Section 5.2 listed them; this section formalises the selection criteria and writes the code to drop them automatically. The result is the canonical F=14 input dimension that nearly every published C-MAPSS paper uses.
The principle. A feature is informative for RUL if (a) it varies, AND (b) its variation correlates with degradation. Either filter alone misses cases the other catches.
On FD001 the variance filter alone recovers the canonical 14 sensors; the correlation filter is a defensive safety net that catches cases like FD002 where some op-setting-coupled sensors accidentally vary across regimes without being meaningful for RUL.
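The two conditions can be seen side by side on a toy frame. This is a minimal sketch, not FD001 data: the three columns (`constant`, `regime_only`, `degrading`) are hypothetical stand-ins, built deterministically so the outcome does not depend on a random seed.

```python
import numpy as np
import pandas as pd

n = 300
cycle = np.arange(n, dtype=float)
rul = (n - 1) - cycle                      # counts down to 0 at failure

# Hypothetical columns, chosen only to illustrate the two conditions.
toy = pd.DataFrame({
    "constant": np.full(n, 518.67),                    # never moves
    "regime_only": 10.0 * (1 + (np.arange(n) % 3)),    # cycles 10, 20, 30: varies, but not with wear
    "degrading": 0.01 * cycle + 0.3 * np.sin(cycle),   # drifts upward as the engine ages
    "RUL": rul,
})
sensors = ["constant", "regime_only", "degrading"]

# Condition (a): the column varies. Only "constant" fails.
std = toy[sensors].std()
varies = std[std > 1e-6].index.tolist()

# Condition (b): the variation tracks RUL, checked on the columns that vary.
abs_corr = toy[varies].corrwith(toy["RUL"]).abs()
kept = abs_corr[abs_corr > 0.05].index.tolist()

print("passes variance:", varies)   # ['regime_only', 'degrading']
print("passes both:    ", kept)     # ['degrading']
```

The `regime_only` column is the case the variance check alone lets through: it moves a lot, but its movement says nothing about remaining life.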
Two-stage feature selector. Drops constants (zero variance) AND sensors that barely correlate with RUL.
EXECUTION STATE
input: df = DataFrame with the 21 sensor columns + a 'RUL' column
input: var_threshold = 1e-6 - any std below this is treated as constant
input: corr_threshold = 0.05 - drop sensors whose |corr(sensor, RUL)| is smaller
returns = List of surviving column names
keep = df[SENSOR_COLS].std() > var_threshold
Boolean Series: True where the std is above the threshold. A sensor that never moved is False.
EXECUTION STATE
Why threshold > 0? = Floating-point noise can leave a 'constant' sensor with a std of around 1e-10 instead of exactly 0. A threshold of 1e-6 sits comfortably above that noise.
survivors = keep[keep].index.tolist()
Boolean-indexing idiom: keep[keep] selects only the True entries; .index.tolist() yields the column names.
EXECUTION STATE
survivors after filter 1 = 14 sensors on FD001 (after dropping the 7 constants)
if "RUL" in df.columns and survivors:
Filter 2 only runs if a RUL column exists (i.e., we are on the train set, not test).
corrs = df[survivors].corrwith(df["RUL"]).abs()
.corrwith computes the Pearson correlation coefficient between each column of df[survivors] and the RUL column. We take absolute value because either positive or negative correlation is informative.
EXECUTION STATE
.corrwith(other) = Pandas method: pairwise correlation with another Series. Returns a Series indexed by column name.
survivors = corrs[corrs > corr_threshold].index.tolist()
Keep only sensors whose absolute correlation with RUL exceeds the threshold. On FD001 this filter removes 0 sensors - all 14 variance-survivors are also correlated.
EXECUTION STATE
survivors after filter 2 = 14 sensors (on FD001 - the variance filter already did the heavy lifting)
Sanity check: the data-driven selector recovers the published informative set.
EXECUTION STATE
Output = matches manual? True
import numpy as np
import pandas as pd

COLUMNS = (
    ["engine_id", "cycle"]
    + [f"op_set_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)
SENSOR_COLS = [f"sensor_{i}" for i in range(1, 22)]


def select_informative(df: pd.DataFrame,
                       var_threshold: float = 1e-6,
                       corr_threshold: float = 0.05) -> list[str]:
    """Drop constant sensors AND sensors weakly correlated with RUL.

    Returns the list of column names that survive both filters.
    """
    # Filter 1: variance
    keep = df[SENSOR_COLS].std() > var_threshold
    survivors = keep[keep].index.tolist()

    # Filter 2: |Pearson correlation with RUL| above threshold
    if "RUL" in df.columns and survivors:
        corrs = df[survivors].corrwith(df["RUL"]).abs()
        survivors = corrs[corrs > corr_threshold].index.tolist()

    return survivors


# ----- Run on FD001 train -----
df = pd.read_csv("data/raw/train_FD001.txt", sep=r"\s+", header=None, names=COLUMNS)
df["RUL"] = df.groupby("engine_id")["cycle"].transform("max") - df["cycle"]

selected = select_informative(df)
print(f"selected ({len(selected)}):", selected)

# Compare to manual labelling
expected = ["sensor_2", "sensor_3", "sensor_4", "sensor_7", "sensor_8",
            "sensor_9", "sensor_11", "sensor_12", "sensor_13", "sensor_14",
            "sensor_15", "sensor_17", "sensor_20", "sensor_21"]
print("matches manual?", set(selected) == set(expected))

# selected (14): ['sensor_2', 'sensor_3', 'sensor_4', 'sensor_7', ...]
# matches manual? True
The data-driven and the manual answers agree: the 14 sensors hand-labelled as informative match what an automatic variance + correlation filter recovers. That agreement is valuable. It suggests the filter can be carried to a new dataset (say, N-CMAPSS) as a first pass, with the selected columns checked before they are trusted.
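Before relying on the filter for a different dataset, it helps to look at how close each sensor sits to the two thresholds rather than only at the final list. The helper below is a hedged sketch: `selection_report` and the toy columns (`flat`, `flicker`, `wear`) are hypothetical names introduced for illustration, and the thresholds simply reuse the defaults from this section.

```python
import numpy as np
import pandas as pd


def selection_report(df, sensor_cols, var_threshold=1e-6, corr_threshold=0.05):
    """Per-sensor std, |corr with RUL|, and the verdict of each filter."""
    std = df[sensor_cols].std()
    varies = std > var_threshold
    moving = varies[varies].index
    # Correlate only the columns that vary; a constant has no defined correlation.
    abs_corr = df[moving].corrwith(df["RUL"]).abs()

    report = pd.DataFrame({"std": std, "abs_corr": abs_corr}).reindex(sensor_cols)
    report["passes_variance"] = varies
    report["passes_corr"] = report["abs_corr"] > corr_threshold  # NaN compares as False
    report["keep"] = report["passes_variance"] & report["passes_corr"]
    return report


# Toy frame with hypothetical sensors (not real C-MAPSS values).
cycle = np.arange(200, dtype=float)
toy = pd.DataFrame({
    "flat": np.full(200, 14.62),                              # constant
    "flicker": np.where(np.arange(200) % 2 == 0, 1.0, -1.0),  # varies, unrelated to wear
    "wear": np.sqrt(cycle),                                   # grows monotonically with age
    "RUL": 199.0 - cycle,
})

report = selection_report(toy, ["flat", "flicker", "wear"])
print(report.round(4))
```

A sensor whose `abs_corr` lands just above or just below the threshold is the one worth a second look on a new dataset.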
PyTorch: Applied at the Dataset Boundary
The cleanest place to apply feature selection is Dataset.__init__, so the model never sees the dropped columns. The class below is identical to Section 2.1's CMAPSSDataset except that it keeps only the 14-sensor subset.
Drop constants at Dataset construction
filtered_cmapss_dataset.py
from torch.utils.data import Dataset
Base class.
import numpy as np, pandas as pd, torch
Compact one-line import.
INFORMATIVE_IDX = [...]
Same 14-sensor index list from §5.2's PyTorch block. 0-based indices into the 21-sensor catalog.
class FilteredCMAPSSDataset(Dataset):
Identical to Section 2.1's CMAPSSDataset except that it consumes only the 14 informative sensor columns. Output X has 14 channels instead of 21.
def __init__(self, csv_path, window=30):
Same constructor signature as §2.1; the only change is the sensor subset used internally.
df = pd.read_csv(...)
Same loader.
df["RUL"] = ...
Per-engine RUL.
sensor_cols = [f"sensor_{i+1}" for i in INFORMATIVE_IDX]
Translate the 0-based INFORMATIVE_IDX into the 1-based sensor_<n> column names. The +1 bridges the two conventions.
X = torch.from_numpy(arr[end - self.window:end])
Slice the window of rows that ends at end.
EXECUTION STATE
X.shape = torch.Size([30, 14]) - reduced from (30, 21)
y = torch.tensor(ruls[end - 1])
RUL at end of window.
return X, y
(X, y) tuple.
ds = FilteredCMAPSSDataset(...)
Construct on FD001.
X, y = ds[0]
Pull sample 0.
print("X.shape:", tuple(X.shape))
Verify reduced shape.
EXECUTION STATE
Output = X.shape: (30, 14)
from torch.utils.data import Dataset
import numpy as np, pandas as pd, torch

INFORMATIVE_IDX = [1, 2, 3, 6, 7, 8, 10, 11, 12, 13, 14, 16, 19, 20]  # 0-based; 14 sensors


class FilteredCMAPSSDataset(Dataset):
    """Same windowing as Section 2.1's CMAPSSDataset, but selects 14 sensors."""

    def __init__(self, csv_path: str, window: int = 30):
        cols = (["engine_id", "cycle"]
                + [f"op_set_{i}" for i in range(1, 4)]
                + [f"sensor_{i}" for i in range(1, 22)])
        df = pd.read_csv(csv_path, sep=r"\s+", header=None, names=cols)
        df["RUL"] = df.groupby("engine_id")["cycle"].transform("max") - df["cycle"]

        sensor_cols = [f"sensor_{i+1}" for i in INFORMATIVE_IDX]
        self.window = window
        self.samples, self.engines = [], {}
        for eid, sub in df.groupby("engine_id"):
            arr = sub[sensor_cols].to_numpy(dtype=np.float32)  # (N_e, 14)
            ruls = sub["RUL"].to_numpy(dtype=np.float32)
            self.engines[eid] = (arr, ruls)
            for end in range(window, len(sub) + 1):
                self.samples.append((eid, end))

    def __len__(self): return len(self.samples)

    def __getitem__(self, idx):
        eid, end = self.samples[idx]
        arr, ruls = self.engines[eid]
        X = torch.from_numpy(arr[end - self.window:end])  # (W, 14)
        y = torch.tensor(ruls[end - 1])
        return X, y


ds = FilteredCMAPSSDataset("data/raw/train_FD001.txt", window=30)
X, y = ds[0]
print("X.shape:", tuple(X.shape))  # (30, 14)
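The selector returns 1-based column names while the dataset stores 0-based indices, so the two can drift apart if the list is retyped by hand. One way to derive one from the other is sketched below; `names_to_indices` and `indices_to_names` are hypothetical helper names, not part of the book's code.

```python
import re


def names_to_indices(sensor_names):
    """Map 1-based 'sensor_<n>' column names to sorted 0-based indices."""
    indices = []
    for name in sensor_names:
        match = re.fullmatch(r"sensor_(\d+)", name)
        if match is None:
            raise ValueError(f"not a sensor column: {name!r}")
        indices.append(int(match.group(1)) - 1)   # 1-based name -> 0-based index
    return sorted(indices)


def indices_to_names(indices):
    """Inverse mapping: 0-based indices back to 'sensor_<n>' names."""
    return [f"sensor_{i + 1}" for i in indices]


# The 14 names that the selector prints for FD001 in this section.
selected = ["sensor_2", "sensor_3", "sensor_4", "sensor_7", "sensor_8",
            "sensor_9", "sensor_11", "sensor_12", "sensor_13", "sensor_14",
            "sensor_15", "sensor_17", "sensor_20", "sensor_21"]

INFORMATIVE_IDX = names_to_indices(selected)
print(INFORMATIVE_IDX)
# [1, 2, 3, 6, 7, 8, 10, 11, 12, 13, 14, 16, 19, 20]
```

The round trip `indices_to_names(names_to_indices(selected))` returning the original list is a cheap guard against an off-by-one between the two conventions.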
Why this matters for the backbone. The CNN (Chapter 8) declares in_channels=14 instead of 21 once you adopt the filtered dataset. That saves a small number of parameters and a little compute, but more importantly, every sensor the model sees is one that actually moves.
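To put a rough number on "a tiny number of parameters", the weight count of a first 1-D convolution can be worked out by hand. The layer sizes here (64 output channels, kernel size 3) are assumptions chosen for illustration, not the Chapter 8 architecture; the comparison is between all 21 sensors and the 14 kept ones.

```python
def conv1d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Weights plus optional biases of a single 1-D convolution layer."""
    weights = out_channels * in_channels * kernel_size
    return weights + (out_channels if bias else 0)


# Hypothetical first-layer sizes; only in_channels changes with the filter.
OUT_CHANNELS, KERNEL_SIZE = 64, 3

full = conv1d_param_count(21, OUT_CHANNELS, KERNEL_SIZE)      # all 21 sensors
filtered = conv1d_param_count(14, OUT_CHANNELS, KERNEL_SIZE)  # 14 informative sensors

print(full, filtered, full - filtered)
# 4096 2752 1344
```

Only the first layer shrinks, which is why the saving is small next to the rest of the network.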
Feature Selection in Other Pipelines
Domain                 | Selection criterion             | Typical method
-----------------------|---------------------------------|--------------------------------
RUL (this book)        | Variance + RUL correlation      | Two-stage filter (this section)
Tabular classification | Mutual information w/ target    | scikit-learn SelectKBest
Genomics               | Differential expression         | DESeq2, edgeR
NLP                    | TF-IDF threshold                | scikit-learn TfidfVectorizer
Vision                 | Layer-wise principal components | PCA, ICA
Time-series anomaly    | Variance + autocorrelation      | STL decomposition
Three Selection Pitfalls
Pitfall 1: Selecting on test data. Compute std/correlation on the TRAIN file only, then apply the resulting column list to test. Leaking test statistics into selection biases evaluation.
Pitfall 2: Per-FD selection mismatch. Different FD subsets have different constant sets. If you want one model that runs on all four, keep the UNION of the per-subset informative sets, i.e. drop only the sensors that are constant on EVERY FD subset.
Pitfall 3: Aggressive correlation thresholds. A threshold of 0.5 might drop sensors that contribute non-linearly (the model can still learn from a sensor that has 0.1 linear correlation). 0.05 is conservative and matches the published 14-sensor convention.
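Pitfalls 1 and 2 can be sketched together: compute the selection on each train frame, take the union, and apply the resulting column list to test data without ever measuring the test frame. This is a minimal sketch on toy frames. `varying_sensors`, the two "subsets", and the three sensor columns are hypothetical, and only the variance stage is shown because the correlation stage needs a RUL column.

```python
import pandas as pd


def varying_sensors(train_df, sensor_cols, var_threshold=1e-6):
    """Names of sensors whose std on this TRAIN frame exceeds the threshold."""
    std = train_df[sensor_cols].std()
    return set(std[std > var_threshold].index)


sensor_cols = ["sensor_1", "sensor_2", "sensor_3"]

# Toy stand-ins for two training subsets (not real C-MAPSS values).
train_a = pd.DataFrame({"sensor_1": [1.0] * 4,
                        "sensor_2": [1.0, 2.0, 3.0, 4.0],
                        "sensor_3": [5.0] * 4})          # sensor_3 constant here
train_b = pd.DataFrame({"sensor_1": [1.0] * 4,
                        "sensor_2": [2.0, 2.5, 3.0, 3.5],
                        "sensor_3": [5.0, 6.0, 5.0, 6.0]})  # sensor_3 moves here

per_subset = {"A": varying_sensors(train_a, sensor_cols),
              "B": varying_sensors(train_b, sensor_cols)}

# Pitfall 2: keep a sensor if it moves on ANY subset (union of kept sets).
union = set().union(*per_subset.values())
shared_cols = [c for c in sensor_cols if c in union]   # preserve column order

# Pitfall 1: the test frame is only indexed with the train-derived list.
test_a = pd.DataFrame({"sensor_1": [1.0, 1.1, 0.9, 1.0],   # noisy in test, still dropped
                       "sensor_2": [1.5, 2.5, 3.5, 4.5],
                       "sensor_3": [5.0] * 4})
X_test = test_a[shared_cols]

print(shared_cols)   # ['sensor_2', 'sensor_3']
```

Note that `sensor_1` happens to vary in the toy test frame and is still excluded, because the decision was made on train data alone.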
The takeaway in one sentence. Drop the seven constants, keep the fourteen movers, ship.
Takeaway
14 of 21 sensors carry signal on FD001. Variance + RUL-correlation filters recover the canonical set automatically.
Selection happens once, at the Dataset boundary. The model just declares in_channels=14 and never knows the dropped sensors existed.
Always select on train data only. Test statistics must not influence which features the model sees.
Correlation threshold is conservative. 0.05 is the convention; higher thresholds risk dropping non-linear contributors the network would have used.
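The risk with a high threshold shows up on a sensor that is flat for most of an engine's life and rises sharply near failure. The sketch below uses an assumed synthetic signal, `exp(-RUL / 5)`, which is a deterministic function of RUL and yet has only a modest linear correlation with it.

```python
import numpy as np

rul = np.arange(200, -1, -1, dtype=float)   # 200, 199, ..., 0
sensor = np.exp(-rul / 5.0)                 # hypothetical: near 0 early, sharp rise at end of life

# Plain Pearson correlation, as used by the section's filter.
r_linear = abs(np.corrcoef(sensor, rul)[0, 1])

# After a log transform the relationship is exactly linear in RUL.
r_log = abs(np.corrcoef(np.log(sensor), rul)[0, 1])

print(f"|r| raw: {r_linear:.3f}   |r| after log: {r_log:.3f}")
print("kept at 0.05:", r_linear > 0.05, "  kept at 0.5:", r_linear > 0.5)
```

On this signal the raw correlation falls between the two thresholds, so a cut-off of 0.5 would discard a sensor that determines RUL completely, while 0.05 keeps it.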