AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand the actual feature selection used in our research implementation
Identify the 14 selected sensors and why each was chosen
Know which sensors are excluded and the reasoning behind removal
Understand the 17-feature convention (3 settings + 14 sensors) used in literature
Apply the enforce_feature_set parameter for consistent benchmarking

Why This Matters: The C-MAPSS dataset has 21 sensors, but not all carry useful information. Our research implementation uses exactly 17 features—3 operating settings plus 14 carefully selected sensors. This section shows the actual code that selects these features and explains the selection criteria.

Why Feature Selection?

With deep learning, it's tempting to feed all available data into the model. However, much of the C-MAPSS benchmarking literature uses a curated 17-feature subset rather than all 24 columns, for the following reasons:

Benefits of Curated Feature Selection

Benefit	Mechanism	Impact
Reduced noise	Remove uninformative sensors	Cleaner gradient signal
Fair comparison	Standard feature set	Comparable to published work
Domain grounding	Physics-based selection	More robust features
Faster training	Smaller input dimension	17 vs 24 input features
Better generalization	Less noise to memorize	Lower test error

The Selection Criteria

For each of the 21 sensors, we ask:

Does it vary? Constant sensors (std < 0.01) provide no degradation information
Does it correlate with RUL? Variation should reflect degradation, not just noise
Is it physically meaningful? Domain knowledge validates the selection
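
The first two checks can be approximated numerically. The following is a minimal sketch, not the project's code: the helper names `pearson` and `screen_sensors`, the `min_abs_corr` threshold, and the toy data are illustrative assumptions. The third check (physical meaning) still requires domain judgment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def screen_sensors(columns, rul, std_floor=0.01, min_abs_corr=0.3):
    """Return names of sensors that vary and correlate with RUL.

    columns: dict mapping sensor name -> list of readings
    rul:     list of remaining-useful-life values, same length
    """
    kept = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        if std < std_floor:  # check 1: does it vary?
            continue
        if abs(pearson(values, rul)) < min_abs_corr:  # check 2: RUL correlation
            continue
        kept.append(name)
    return kept


# Toy data (made up for illustration, not C-MAPSS readings)
rul = [5, 4, 3, 2, 1, 0]
toy = {
    "flat": [518.67] * 6,                            # constant, dropped
    "trend": [10.0, 10.5, 11.1, 11.4, 12.0, 12.6],   # rises as RUL falls, kept
    "noise": [1.0, 3.0, 3.0, 1.0, 1.0, 3.0],         # varies, weak correlation, dropped
}
print(screen_sensors(toy, rul))
```

On real data the thresholds would need tuning per subset, since a sensor can look constant in one operating condition and vary across others.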

Research Implementation

This is the actual feature selection code from our EnhancedNASACMAPSSDataset class. It defines exactly which sensors are used as model inputs.

Feature Selection Implementation

File: models/enhanced_sota_rul_predictor.py

Feature Column Definition

This section of the EnhancedNASACMAPSSDataset defines which features are used as model inputs. The selection is based on literature review and domain knowledge.

Operating Settings

The three operating settings (altitude, Mach number, throttle) define the engine's current regime. They're critical for per-condition normalization in FD002/FD004.

Example: setting_1 is altitude (0-42,000 ft).

Sensor Selection Rationale

These 14 sensors were selected based on extensive literature analysis. They show meaningful variance over engine lifetime and correlate with degradation patterns.

Temperature Sensors (sensor_7, 8)

HPC and LPT outlet temperatures increase as component efficiency degrades. These are primary indicators of compressor and turbine health.

Example: T30 (sensor_7) increases ~10°R over engine life.

Pressure Sensors (sensor_9, 11, 6)

HPC outlet pressure and static pressure decrease as flow capacity degrades. Pressure ratios reflect thermodynamic efficiency.

Bypass Ratio (sensor_16)

BPR decreases with fan degradation. It's a key indicator of bypass vs core flow balance, especially relevant for FD003/FD004.

Bleed Enthalpy (sensor_17)

Bleed enthalpy reflects the energy extracted for cooling and cabin pressurization. It decreases as the engine's overall efficiency degrades.

Coolant Bleed (sensor_20)

HPT coolant bleed flow changes as turbine inlet temperature and cooling requirements shift with degradation.

Corrected Speeds (sensor_13, 14)

Corrected fan and core speeds account for ambient conditions. They increase as the engine works harder to maintain thrust with degraded efficiency.

Feature Set Enforcement

When enforce_feature_set=True (default), we use exactly 17 features for consistency with prior work. This enables fair comparison with published benchmarks.

Full Feature Option

Setting enforce_feature_set=False uses all 24 original features (3 settings + 21 sensors). This is useful for ablation studies comparing feature selection impact.

# Define feature columns based on literature-recommended sensors
self.setting_cols = ['setting_1', 'setting_2', 'setting_3']

# 14 informative sensors selected based on literature analysis
# These sensors show meaningful variance and degradation correlation
sensor_keep = [
    'sensor_7',   # Total temperature at HPC outlet
    'sensor_8',   # Total temperature at LPT outlet
    'sensor_9',   # Pressure at HPC outlet
    'sensor_12',  # Pressure ratio (EPR)
    'sensor_16',  # Bypass ratio (BPR)
    'sensor_17',  # Bleed enthalpy
    'sensor_20',  # HPT coolant bleed
    'sensor_2',   # Total temperature at LPC outlet
    'sensor_3',   # Total temperature at HPC outlet
    'sensor_4',   # Total temperature at LPT outlet
    'sensor_14',  # Corrected core speed
    'sensor_11',  # Static pressure at HPC outlet
    'sensor_13',  # Corrected fan speed
    'sensor_6',   # Total pressure at HPC outlet
]

if self.enforce_feature_set:
    # 17 features total: 3 settings + 14 sensors
    self.feature_columns = self.setting_cols + sensor_keep
else:
    # Use all 24 features (3 settings + 21 sensors)
    self.feature_columns = self.setting_cols + [f'sensor_{i}' for i in range(1, 22)]

logger.info(f"Dataset features: {len(self.feature_columns)} features configured")
logger.info(f"Using {'enforced 17-feature set' if self.enforce_feature_set else 'all features'}")
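
To see the effect of the flag in isolation, the selection logic can be lifted into a free function. This is a sketch for experimentation: `build_feature_columns` and the module-level constants are hypothetical names, not part of EnhancedNASACMAPSSDataset, and the function only reproduces the list construction from the listing.

```python
SETTING_COLS = ["setting_1", "setting_2", "setting_3"]

# Same 14 sensors, in the same order, as sensor_keep in the listing
SENSOR_KEEP = [
    "sensor_7", "sensor_8", "sensor_9", "sensor_12", "sensor_16",
    "sensor_17", "sensor_20", "sensor_2", "sensor_3", "sensor_4",
    "sensor_14", "sensor_11", "sensor_13", "sensor_6",
]


def build_feature_columns(enforce_feature_set=True):
    """Hypothetical helper mirroring the branch on enforce_feature_set."""
    if enforce_feature_set:
        return SETTING_COLS + SENSOR_KEEP
    return SETTING_COLS + [f"sensor_{i}" for i in range(1, 22)]


enforced = build_feature_columns(True)
full = build_feature_columns(False)
print(len(enforced), len(full))  # 17 24

# Sensors present only in the full set, sorted by sensor number
excluded = sorted(set(full) - set(enforced), key=lambda s: int(s.split("_")[1]))
print(excluded)
```

The set difference gives the seven sensors the enforced configuration leaves out, which is a quick consistency check against the exclusion table.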

Feature Column Mapping

Feature	Sensor ID	Physical Meaning	Degradation Trend
setting_1	-	Altitude (ft)	Operating condition
setting_2	-	Mach number	Operating condition
setting_3	-	Throttle resolver angle	Operating condition
sensor_2	T24	LPC outlet temperature	Increases
sensor_3	T30	HPC outlet temperature	Increases
sensor_4	T50	LPT outlet temperature	Increases
sensor_6	P30	Total pressure at HPC outlet	Decreases
sensor_7	T30	Total temp at HPC outlet	Increases
sensor_8	T50	Total temp at LPT outlet	Increases
sensor_9	P30	Pressure at HPC outlet	Decreases
sensor_11	Ps30	Static pressure at HPC outlet	Decreases
sensor_12	EPR	Engine pressure ratio	Decreases
sensor_13	NRf	Corrected fan speed	Increases
sensor_14	NRc	Corrected core speed	Increases
sensor_16	BPR	Bypass ratio	Decreases
sensor_17	htBleed	Bleed enthalpy	Decreases
sensor_20	W31	HPT coolant bleed	Decreases

Constant Sensors (Removed)

Seven sensors are excluded because they show near-zero variance once the operating condition is fixed:

Excluded Sensors

Sensor	Name	Reason for Exclusion
sensor_1	T2 (Fan inlet temp)	Equals ambient temp at sea level
sensor_5	P2 (Fan inlet pressure)	Equals ambient pressure
sensor_10	epr	Tightly controlled by engine computer
sensor_15	P15	Bypass duct pressure, follows ambient
sensor_18	Nf_dmd	Demanded fan speed (control input)
sensor_19	PCNfR_dmd	Demanded corrected speed (control)
sensor_21	W32	LPT coolant bleed (low variance)

FD002/FD004 Difference

In FD002 and FD004 (6 operating conditions), T2 and P2 vary with altitude. However, this variation reflects the operating regime, not degradation. Within each condition they are effectively constant, so after per-condition normalization they carry no degradation signal.
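
A minimal sketch of per-condition normalization makes this concrete. It assumes each row already carries an operating-condition label (here a hypothetical `condition` key, for example obtained by rounding or clustering the three settings). The function name, data layout, and toy values are illustrative and not taken from the research code.

```python
import math
from collections import defaultdict


def normalize_per_condition(rows, feature, eps=1e-8):
    """Z-score `feature` separately within each operating condition.

    rows: list of dicts with a 'condition' key and a numeric `feature` key.
    Returns a list of normalized values, in row order.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["condition"]].append(row[feature])

    # Per-condition mean and (population) standard deviation
    stats = {}
    for cond, values in groups.items():
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        stats[cond] = (mean, std)

    return [
        (row[feature] - stats[row["condition"]][0])
        / (stats[row["condition"]][1] + eps)
        for row in rows
    ]


# Toy rows (values invented for illustration): the reading depends only on the condition
rows = [
    {"condition": 0, "T2": 518.67},
    {"condition": 1, "T2": 449.44},
    {"condition": 0, "T2": 518.67},
    {"condition": 1, "T2": 449.44},
]
normalized = normalize_per_condition(rows, "T2")
print(normalized)
```

A feature that is fully determined by the operating condition collapses to zero everywhere after this step, which is why it adds nothing for RUL prediction.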

Informative Sensors (Kept)

The 14 selected sensors show meaningful variation that correlates with engine degradation:

Temperature Sensors

Sensor	Location	Degradation Effect
sensor_2 (T24)	LPC outlet	Mild increase (upstream effect)
sensor_3 (T30)	HPC outlet	Strong increase (HPC degradation)
sensor_4 (T50)	LPT outlet	Moderate increase (propagation)
sensor_7	HPC outlet total	Strong increase
sensor_8	LPT outlet total	Moderate increase

The physics is straightforward:

\eta_{\text{HPC}} \downarrow \Rightarrow \text{More work for same compression} \Rightarrow T_{30} \uparrow
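
The chain of implications can be made explicit with the standard isentropic-efficiency relation for a compressor. This is a textbook idealization offered as a sketch, not the C-MAPSS simulator's actual model. It assumes a constant ratio of specific heats $\gamma$, takes the HPC inlet temperature to be $T_{24}$, and writes the HPC pressure ratio as $\pi_{\text{HPC}} > 1$:

```latex
T_{30} = T_{24}\left[1 + \frac{\pi_{\text{HPC}}^{(\gamma-1)/\gamma} - 1}{\eta_{\text{HPC}}}\right]
\qquad\Rightarrow\qquad
\frac{\partial T_{30}}{\partial \eta_{\text{HPC}}}
  = -\,T_{24}\,\frac{\pi_{\text{HPC}}^{(\gamma-1)/\gamma} - 1}{\eta_{\text{HPC}}^{2}} < 0
```

For a fixed inlet temperature and pressure ratio, the derivative is negative, so a lower efficiency gives a higher outlet temperature.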

Pressure and Speed Sensors

Sensor	Meaning	Degradation Effect
sensor_6, 9 (P30)	HPC outlet pressure	Decreases (flow capacity loss)
sensor_11 (Ps30)	Static pressure	Decreases (efficiency loss)
sensor_12 (EPR)	Engine pressure ratio	Decreases
sensor_13 (NRf)	Corrected fan speed	Increases (compensation)
sensor_14 (NRc)	Corrected core speed	Increases (compensation)
sensor_16 (BPR)	Bypass ratio	Decreases (fan degradation)

Correlation with RUL

Final 17 Features

After selection, our input data consists of 17 features in a specific order:

Feature Order (as implemented)

Index	Feature	Category
0-2	setting_1, setting_2, setting_3	Operating Settings
3	sensor_7	HPC outlet temp
4	sensor_8	LPT outlet temp
5	sensor_9	HPC outlet pressure
6	sensor_12	Engine pressure ratio
7	sensor_16	Bypass ratio
8	sensor_17	Bleed enthalpy
9	sensor_20	HPT coolant bleed
10	sensor_2	LPC outlet temp
11	sensor_3	HPC outlet temp
12	sensor_4	LPT outlet temp
13	sensor_14	Corrected core speed
14	sensor_11	Static pressure
15	sensor_13	Corrected fan speed
16	sensor_6	Total pressure
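
Because downstream code addresses features by position, a name-to-index lookup can help avoid off-by-one mistakes. The sketch below simply restates the table as a Python list. `FEATURE_ORDER` and `feature_index` are illustrative names, not identifiers from the research code.

```python
FEATURE_ORDER = [
    "setting_1", "setting_2", "setting_3",
    "sensor_7", "sensor_8", "sensor_9", "sensor_12", "sensor_16",
    "sensor_17", "sensor_20", "sensor_2", "sensor_3", "sensor_4",
    "sensor_14", "sensor_11", "sensor_13", "sensor_6",
]

# Map each feature name to its position along the last tensor axis
feature_index = {name: i for i, name in enumerate(FEATURE_ORDER)}

print(feature_index["sensor_7"])   # 3
print(feature_index["sensor_6"])   # 16

# Positions of the operating settings, e.g. for slicing them out of a batch
setting_idx = [feature_index[f"setting_{k}"] for k in (1, 2, 3)]
print(setting_idx)                 # [0, 1, 2]
```

Note that the order follows the implementation's list, not ascending sensor number, so any saved model expects exactly this layout.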

Input Tensor Shape

After feature selection and windowing:

\mathbf{X} \in \mathbb{R}^{N \times 30 \times 17}

$N$ : Number of sliding window samples
$30$ : Window size (timesteps)
$17$ : Selected features
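
How $N$ arises from a single engine's history can be sketched as follows. The example assumes a stride of 1 and that engines shorter than the window contribute no samples. Both are illustrative assumptions, as is the helper name `sliding_windows`; the research code may pad or stride differently.

```python
def sliding_windows(sequence, window_size=30, stride=1):
    """Split one engine's time series into overlapping windows.

    sequence: list of per-cycle feature vectors (each of length 17 here).
    Returns a list of windows, each a list of `window_size` feature vectors.
    """
    n_cycles = len(sequence)
    if n_cycles < window_size:
        return []  # assumption: too-short engines contribute no windows
    return [
        sequence[start:start + window_size]
        for start in range(0, n_cycles - window_size + 1, stride)
    ]


# Dummy engine with 100 cycles and 17 zero-valued features per cycle
engine = [[0.0] * 17 for _ in range(100)]
windows = sliding_windows(engine)

# For this engine the shape is (N, 30, 17) with N = 100 - 30 + 1 = 71
shape = (len(windows), len(windows[0]), len(windows[0][0]))
print(shape)
```

Across the whole dataset, $N$ is the sum of the per-engine window counts.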

Summary

In this section, we examined the actual feature selection from our research code:

17 features total: 3 operating settings + 14 selected sensors
Selection criteria: Variance, RUL correlation, physical meaning
Excluded sensors: 7 sensors with near-zero variance (T2, P2, epr, etc.)
Key informative sensors: Temperature (T30, T50), pressure (P30), speeds (NRc, NRf)
enforce_feature_set: Parameter to ensure consistent benchmarking

Category	Features	Count
Operating Settings	setting_1, setting_2, setting_3	3
Temperature	sensor_2, 3, 4, 7, 8	5
Pressure	sensor_6, 9, 11, 12	4
Speed/Flow	sensor_13, 14, 16, 17, 20	5

Looking Ahead: With the informative features identified, the next step is to define the prediction target for the regression task. Raw RUL is problematic because early-life engines all have high RUL with imperceptible degradation. The next section introduces the piecewise linear RUL model, which clips RUL at 125 cycles.