Chapter 3

Feature Selection: 17 Informative Features

NASA C-MAPSS Dataset Deep Dive

Learning Objectives

By the end of this section, you will:

  1. Understand the actual feature selection used in our research implementation
  2. Identify the 14 selected sensors and why each was chosen
  3. Know which sensors are excluded and the reasoning behind removal
  4. Understand the 17-feature convention (3 settings + 14 sensors) used in literature
  5. Apply the enforce_feature_set parameter for consistent benchmarking

Why This Matters: The C-MAPSS dataset has 21 sensors, but not all carry useful information. Our research implementation uses exactly 17 features: 3 operating settings plus 14 carefully selected sensors. This section shows the actual code that selects these features and explains the selection criteria.

Why Feature Selection?

With deep learning, it's tempting to feed all available data into the model. However, the literature has converged on a specific 17-feature subset for C-MAPSS benchmarking, and curating the inputs this way brings several benefits:

Benefits of Curated Feature Selection

| Benefit | Mechanism | Impact |
|---|---|---|
| Reduced noise | Remove uninformative sensors | Cleaner gradient signal |
| Fair comparison | Standard feature set | Comparable to published work |
| Domain grounding | Physics-based selection | More robust features |
| Faster training | Smaller input dimension | 17 vs. 24 input features |
| Better generalization | Less noise to memorize | Lower test error |

The Selection Criteria

For each of the 21 sensors, we ask:

  1. Does it vary? Constant sensors (std < 0.01) provide no degradation information
  2. Does it correlate with RUL? Variation should reflect degradation, not just noise
  3. Is it physically meaningful? Domain knowledge validates the selection
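
The first criterion can be checked mechanically. The sketch below is a minimal, standard-library illustration on made-up readings, not the project's code: the helper name `find_constant_sensors`, the data layout, and the toy values are assumptions, and only the 0.01 threshold comes from the criterion above. Note that a fixed threshold like this is scale-dependent, so it is most meaningful when all channels are on comparable scales.

```python
from statistics import pstdev

def find_constant_sensors(readings, threshold=0.01):
    """Return sensor names whose population std is below `threshold`.

    `readings` maps a sensor name to a list of values pooled over
    all engines and cycles (hypothetical layout for this sketch).
    """
    return sorted(
        name for name, values in readings.items()
        if pstdev(values) < threshold
    )

# Toy data (invented): one flat channel, one drifting channel.
toy_readings = {
    "sensor_a": [14.62, 14.62, 14.62, 14.62],  # never changes
    "sensor_b": [641.8, 642.3, 642.9, 643.6],  # drifts upward
}

constant = find_constant_sensors(toy_readings)
print(constant)
```

Only the flat channel is flagged; the drifting channel passes on to the second and third criteria.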

Research Implementation

This is the actual feature selection code from our EnhancedNASACMAPSSDataset class. It defines exactly which sensors are used as model inputs.

Feature Selection Implementation
File: models/enhanced_sota_rul_predictor.py

Feature Column Definition

This section of the EnhancedNASACMAPSSDataset defines which features are used as model inputs. The selection is based on literature review and domain knowledge.

Operating Settings

The three operating settings (altitude, Mach number, throttle) define the engine's current regime. They're critical for per-condition normalization in FD002/FD004.

Example: setting_1 is altitude (0-42,000 ft)

Sensor Selection Rationale

These 14 sensors were selected based on extensive literature analysis. They show meaningful variance over engine lifetime and correlate with degradation patterns.

Temperature Sensors (sensor_7, 8)

HPC and LPT outlet temperatures increase as component efficiency degrades. These are primary indicators of compressor and turbine health.

Example: T30 (sensor_7) increases ~10°R over engine life

Pressure Sensors (sensor_9, 11, 6)

HPC outlet pressure and static pressure decrease as flow capacity degrades. Pressure ratios reflect thermodynamic efficiency.

Bypass Ratio (sensor_16)

BPR decreases with fan degradation. It's a key indicator of bypass vs core flow balance, especially relevant for FD003/FD004.

Bleed Enthalpy (sensor_17)

Bleed enthalpy reflects the energy extracted for cooling and cabin pressurization. It decreases as the engine's overall efficiency degrades.

Coolant Bleed (sensor_20)

HPT coolant bleed flow changes as turbine inlet temperature and cooling requirements shift with degradation.

Corrected Speeds (sensor_13, 14)

Corrected fan and core speeds account for ambient conditions. They increase as the engine works harder to maintain thrust with degraded efficiency.

Feature Set Enforcement

When enforce_feature_set=True (default), we use exactly 17 features for consistency with prior work. This enables fair comparison with published benchmarks.

Full Feature Option

Setting enforce_feature_set=False uses all 24 original features (3 settings + 21 sensors). This is useful for ablation studies comparing feature selection impact.

    # Define feature columns based on literature-recommended sensors
    self.setting_cols = ['setting_1', 'setting_2', 'setting_3']

    # 14 informative sensors selected based on literature analysis
    # These sensors show meaningful variance and degradation correlation
    sensor_keep = [
        'sensor_7',   # Total temperature at HPC outlet
        'sensor_8',   # Total temperature at LPT outlet
        'sensor_9',   # Pressure at HPC outlet
        'sensor_12',  # Pressure ratio (EPR)
        'sensor_16',  # Bypass ratio (BPR)
        'sensor_17',  # Bleed enthalpy
        'sensor_20',  # HPT coolant bleed
        'sensor_2',   # Total temperature at LPC outlet
        'sensor_3',   # Total temperature at HPC outlet
        'sensor_4',   # Total temperature at LPT outlet
        'sensor_14',  # Corrected core speed
        'sensor_11',  # Static pressure at HPC outlet
        'sensor_13',  # Corrected fan speed
        'sensor_6',   # Total pressure at HPC outlet
    ]

    if self.enforce_feature_set:
        # 17 features total: 3 settings + 14 sensors
        self.feature_columns = self.setting_cols + sensor_keep
    else:
        # Use all 24 features (3 settings + 21 sensors)
        self.feature_columns = self.setting_cols + [f'sensor_{i}' for i in range(1, 22)]

    logger.info(f"Dataset features: {len(self.feature_columns)} features configured")
    logger.info(f"Using {'enforced 17-feature set' if self.enforce_feature_set else 'all features'}")
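
To see the effect of the flag outside the class, the following standalone sketch mirrors the branch in the excerpt. The function `build_feature_columns` and the module-level constants are hypothetical names introduced for illustration; only the column names, their order, and the `enforce_feature_set` switch come from the excerpt.

```python
SETTING_COLS = ["setting_1", "setting_2", "setting_3"]

# Same order as sensor_keep in the excerpt.
SENSOR_KEEP = [
    "sensor_7", "sensor_8", "sensor_9", "sensor_12", "sensor_16",
    "sensor_17", "sensor_20", "sensor_2", "sensor_3", "sensor_4",
    "sensor_14", "sensor_11", "sensor_13", "sensor_6",
]

def build_feature_columns(enforce_feature_set=True):
    """Hypothetical helper: return the ordered list of input columns."""
    if enforce_feature_set:
        return SETTING_COLS + SENSOR_KEEP
    return SETTING_COLS + [f"sensor_{i}" for i in range(1, 22)]

enforced = build_feature_columns(enforce_feature_set=True)
full = build_feature_columns(enforce_feature_set=False)

print(len(enforced), len(full))    # 17 24
print(enforced.index("sensor_7"))  # 3
```

Because the enforced list preserves the order of `sensor_keep` rather than sorting by sensor number, column indices in the input tensor follow the list order, so `sensor_7` sits at index 3 directly after the three settings.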

Feature Column Mapping

| Feature | Symbol | Physical Meaning | Degradation Trend |
|---|---|---|---|
| setting_1 | - | Altitude (ft) | Operating condition |
| setting_2 | - | Mach number | Operating condition |
| setting_3 | - | Throttle resolver angle | Operating condition |
| sensor_2 | T24 | LPC outlet temperature | Increases |
| sensor_3 | T30 | HPC outlet temperature | Increases |
| sensor_4 | T50 | LPT outlet temperature | Increases |
| sensor_6 | P30 | Total pressure at HPC outlet | Decreases |
| sensor_7 | T30 | Total temp at HPC outlet | Increases |
| sensor_8 | T50 | Total temp at LPT outlet | Increases |
| sensor_9 | P30 | Pressure at HPC outlet | Decreases |
| sensor_11 | Ps30 | Static pressure at HPC outlet | Decreases |
| sensor_12 | EPR | Engine pressure ratio | Decreases |
| sensor_13 | NRf | Corrected fan speed | Increases |
| sensor_14 | NRc | Corrected core speed | Increases |
| sensor_16 | BPR | Bypass ratio | Decreases |
| sensor_17 | htBleed | Bleed enthalpy | Decreases |
| sensor_20 | W31 | HPT coolant bleed | Decreases |

Constant Sensors (Removed)

Seven sensors are excluded because they show near-zero variance across all engines and cycles:

Excluded Sensors

| Sensor | Name | Reason for Exclusion |
|---|---|---|
| sensor_1 | T2 (Fan inlet temp) | Equals ambient temp at sea level |
| sensor_5 | P2 (Fan inlet pressure) | Equals ambient pressure |
| sensor_10 | epr | Tightly controlled by engine computer |
| sensor_15 | P15 | Bypass duct pressure, follows ambient |
| sensor_18 | Nf_dmd | Demanded fan speed (control input) |
| sensor_19 | PCNfR_dmd | Demanded corrected speed (control) |
| sensor_21 | W32 | LPT coolant bleed (low variance) |

FD002/FD004 Difference

In FD002 and FD004 (6 operating conditions), T2 and P2 vary with altitude. However, this variation reflects operating regime, not degradation. Our per-condition normalization removes this effect, making them effectively constant within each condition.
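
A minimal sketch of per-condition standardization, assuming each row already carries an operating-condition label (for example, obtained by rounding or clustering the three settings). The function name, the row layout, and the toy numbers are illustrative assumptions; the project's actual normalization code is not shown in this section.

```python
from collections import defaultdict
from statistics import fmean, pstdev

def normalize_per_condition(rows, value_key, condition_key):
    """Z-score `value_key` separately within each operating condition.

    Returns a list of floats aligned with `rows`. A group with zero
    spread maps to 0.0 instead of dividing by zero.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[condition_key]].append(row[value_key])

    # Per-condition mean and population standard deviation.
    stats = {c: (fmean(v), pstdev(v)) for c, v in groups.items()}

    normalized = []
    for row in rows:
        mean, std = stats[row[condition_key]]
        value = row[value_key]
        normalized.append((value - mean) / std if std > 0 else 0.0)
    return normalized

# Toy rows (invented): an inlet temperature that depends only on the regime.
rows = [
    {"condition": 0, "T2": 518.67},
    {"condition": 0, "T2": 518.67},
    {"condition": 1, "T2": 449.44},
    {"condition": 1, "T2": 449.44},
]

t2_normalized = normalize_per_condition(rows, "T2", "condition")
print(t2_normalized)
```

In this toy case the raw column takes two distinct values, yet every normalized entry is 0.0: all of its variation came from the operating regime, which is the sense in which such a sensor is "effectively constant within each condition".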


Informative Sensors (Kept)

The 14 selected sensors show meaningful variation that correlates with engine degradation:

Temperature Sensors

| Sensor | Location | Degradation Effect |
|---|---|---|
| sensor_2 (T24) | LPC outlet | Mild increase (upstream effect) |
| sensor_3 (T30) | HPC outlet | Strong increase (HPC degradation) |
| sensor_4 (T50) | LPT outlet | Moderate increase (propagation) |
| sensor_7 | HPC outlet total | Strong increase |
| sensor_8 | LPT outlet total | Moderate increase |

The physics is straightforward:

$$\eta_{\text{HPC}} \downarrow \;\Rightarrow\; \text{more work for the same compression} \;\Rightarrow\; T_{30} \uparrow$$
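
The chain above can be made explicit with the textbook isentropic-efficiency relation for a compressor. This is a standard idealization (perfect gas, constant ratio of specific heats), not a formula taken from the C-MAPSS simulator. The symbols $T_{\text{in}}$ (HPC inlet temperature), $\pi$ (HPC pressure ratio), and $\gamma$ (ratio of specific heats) are introduced here for illustration.

```latex
T_{30} = T_{\text{in}} \left[ 1 + \frac{\pi^{(\gamma-1)/\gamma} - 1}{\eta_{\text{HPC}}} \right],
\qquad
\frac{\partial T_{30}}{\partial \eta_{\text{HPC}}}
  = -\,T_{\text{in}}\, \frac{\pi^{(\gamma-1)/\gamma} - 1}{\eta_{\text{HPC}}^{2}} < 0
\quad (\pi > 1).
```

For a fixed inlet temperature and pressure ratio, the derivative is negative, so any drop in $\eta_{\text{HPC}}$ raises $T_{30}$.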

Pressure and Speed Sensors

| Sensor | Meaning | Degradation Effect |
|---|---|---|
| sensor_6, 9 (P30) | HPC outlet pressure | Decreases (flow capacity loss) |
| sensor_11 (Ps30) | Static pressure | Decreases (efficiency loss) |
| sensor_12 (EPR) | Engine pressure ratio | Decreases |
| sensor_13 (NRf) | Corrected fan speed | Increases (compensation) |
| sensor_14 (NRc) | Corrected core speed | Increases (compensation) |
| sensor_16 (BPR) | Bypass ratio | Decreases (fan degradation) |

Correlation with RUL
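
One way to quantify the second selection criterion is a rank (Spearman) correlation between each sensor and RUL, which captures monotonic trends even when they are nonlinear. The sketch below is a standard-library illustration on invented values; the helper names and toy trajectories are assumptions, and no correlation figures from the actual dataset are implied.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined in theory; 0.0 by convention in this sketch
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy run-to-failure trajectory (invented values).
rul = [5, 4, 3, 2, 1, 0]
rising = [1.0, 1.1, 1.3, 1.8, 2.6, 4.0]  # grows as failure approaches
noisy = [0.3, 0.1, 0.2, 0.3, 0.1, 0.2]   # no clear trend

rho_rising = spearman(rising, rul)
rho_noisy = spearman(noisy, rul)
print(round(rho_rising, 3), round(rho_noisy, 3))
```

A coefficient near -1 means the sensor rises steadily as RUL falls, while a value near 0 suggests the variation is mostly noise with respect to degradation.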


Final 17 Features

After selection, our input data consists of 17 features in a specific order:

Feature Order (as implemented)

| Index | Feature | Category |
|---|---|---|
| 0-2 | setting_1, setting_2, setting_3 | Operating Settings |
| 3 | sensor_7 | HPC outlet temp |
| 4 | sensor_8 | LPT outlet temp |
| 5 | sensor_9 | HPC outlet pressure |
| 6 | sensor_12 | Engine pressure ratio |
| 7 | sensor_16 | Bypass ratio |
| 8 | sensor_17 | Bleed enthalpy |
| 9 | sensor_20 | HPT coolant bleed |
| 10 | sensor_2 | LPC outlet temp |
| 11 | sensor_3 | HPC outlet temp |
| 12 | sensor_4 | LPT outlet temp |
| 13 | sensor_14 | Corrected core speed |
| 14 | sensor_11 | Static pressure |
| 15 | sensor_13 | Corrected fan speed |
| 16 | sensor_6 | Total pressure |

Input Tensor Shape

After feature selection and windowing:

$$\mathbf{X} \in \mathbb{R}^{N \times 30 \times 17}$$

  • $N$: Number of sliding window samples
  • $30$: Window size (timesteps)
  • $17$: Selected features
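
To make the shape concrete, here is a small sketch that builds windows from nested lists. It assumes a stride of 1 and that windows never cross engine boundaries; both are assumptions of this illustration, as are the engine lengths (50 and 35 cycles) and the helper name `sliding_windows`. Only the window size of 30 and the 17 features come from the text.

```python
WINDOW_SIZE = 30   # timesteps per sample (from the text)
NUM_FEATURES = 17  # selected features (from the text)

def sliding_windows(sequence, window_size=WINDOW_SIZE):
    """Return every contiguous window of `window_size` rows (stride 1)."""
    return [
        sequence[start:start + window_size]
        for start in range(len(sequence) - window_size + 1)
    ]

# Two toy engines with 50 and 35 cycles; each row holds 17 dummy features.
engines = [
    [[0.0] * NUM_FEATURES for _ in range(50)],
    [[0.0] * NUM_FEATURES for _ in range(35)],
]

# Windows are built per engine, then concatenated along the sample axis.
X = [window for engine in engines for window in sliding_windows(engine)]

shape = (len(X), len(X[0]), len(X[0][0]))
print(shape)  # (27, 30, 17)
```

Under these assumptions an engine with T cycles contributes T - 30 + 1 windows, so here N = 21 + 6 = 27. An engine shorter than the window contributes none.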

Summary

In this section, we examined the actual feature selection from our research code:

  1. 17 features total: 3 operating settings + 14 selected sensors
  2. Selection criteria: Variance, RUL correlation, physical meaning
  3. Excluded sensors: 7 sensors with near-zero variance (T2, P2, epr, etc.)
  4. Key informative sensors: Temperature (T30, T50), pressure (P30), speeds (NRc, NRf)
  5. enforce_feature_set: Parameter to ensure consistent benchmarking

| Category | Features | Count |
|---|---|---|
| Operating Settings | setting_1, setting_2, setting_3 | 3 |
| Temperature | sensor_2, 3, 4, 7, 8 | 5 |
| Pressure | sensor_6, 9, 11, 12 | 4 |
| Speed/Flow | sensor_13, 14, 16, 17, 20 | 5 |

Looking Ahead: With features selected, we need to define our prediction targets. The raw RUL is problematic: early-life engines all have high RUL with imperceptible degradation. Next, we introduce the piecewise linear RUL model that clips RUL at 125 cycles.

With informative features identified, we are ready to formulate the target variable for our regression task.
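
As a preview, the clipping idea can be sketched in a few lines. This is only an illustration under stated assumptions: cycles are counted from 1, the final recorded cycle is assigned RUL 0, and the helper name `piecewise_rul` is hypothetical. Only the cap of 125 cycles comes from the text; the next section defines the actual target.

```python
MAX_RUL = 125  # clipping value stated in the text

def piecewise_rul(total_cycles, max_rul=MAX_RUL):
    """RUL label per cycle for one run-to-failure engine.

    Cycle t (1-based) has raw RUL = total_cycles - t, capped at `max_rul`.
    """
    return [min(total_cycles - t, max_rul) for t in range(1, total_cycles + 1)]

labels = piecewise_rul(200)
print(labels[0], labels[74], labels[75], labels[-1])  # 125 125 124 0
```

For this 200-cycle toy engine the label stays flat at 125 through cycle 75 and then decreases linearly to 0.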