AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand the complexity hierarchy from FD001 (simplest) to FD004 (most challenging)
Interpret operating conditions (altitude, Mach number, throttle) and their physical meaning
Distinguish fault modes: HPC degradation vs Fan degradation
Explain why multiple conditions complicate prediction through sensor value distributions
Recognize the need for condition-aware normalization to handle multi-regime data

Why This Matters: Many RUL prediction methods achieve good results on FD001 but fail on FD004. Understanding why requires grasping the fundamental differences between these datasets. Our AMNL model achieves state-of-the-art results on all four sub-datasets precisely because its design addresses these differences.

Dataset Complexity Spectrum

The four C-MAPSS sub-datasets form a complexity spectrum, varying along two axes:

Two Dimensions of Complexity

Dataset	Operating Conditions	Fault Modes	Complexity
FD001	1 (Sea Level)	1 (HPC)	Lowest
FD002	6 (Various)	1 (HPC)	Medium
FD003	1 (Sea Level)	2 (HPC + Fan)	Medium
FD004	6 (Various)	2 (HPC + Fan)	Highest

FD001 is the controlled experiment: single condition, single fault. Each additional complexity dimension (multiple conditions OR multiple faults) creates a new challenge. FD004 combines both.
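
The table above can be captured as a small lookup structure, which is convenient for switching preprocessing steps on or off per sub-dataset. This is only an illustrative sketch: the names `SUBSETS`, `needs_condition_norm`, and `complexity_axes` are hypothetical and do not come from any library.

```python
# Operating conditions and fault modes per sub-dataset (from the table above).
SUBSETS = {
    "FD001": {"conditions": 1, "fault_modes": 1},
    "FD002": {"conditions": 6, "fault_modes": 1},
    "FD003": {"conditions": 1, "fault_modes": 2},
    "FD004": {"conditions": 6, "fault_modes": 2},
}


def needs_condition_norm(name: str) -> bool:
    """Multi-regime subsets are the ones that need per-condition normalization."""
    return SUBSETS[name]["conditions"] > 1


def complexity_axes(name: str) -> int:
    """Count how many of the two complexity axes are active (0, 1, or 2)."""
    cfg = SUBSETS[name]
    return int(cfg["conditions"] > 1) + int(cfg["fault_modes"] > 1)


for name in SUBSETS:
    print(name, complexity_axes(name), needs_condition_norm(name))
```

FD001 activates neither axis, FD002 and FD003 activate one each, and FD004 activates both.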

Published Performance Gap

The performance gap between datasets is substantial. For a typical deep learning model, RMSE (measured in cycles of RUL) falls roughly in these ranges:

Dataset	Typical RMSE (cycles)	Relative Difficulty
FD001	12-15	1× (baseline)
FD002	18-25	~1.5× harder
FD003	13-17	~1.2× harder
FD004	22-30	~2× harder
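
One way to read the "Relative Difficulty" column is as the ratio of each RMSE range to the FD001 range. The sketch below uses range midpoints, which is an assumption about how the figures were derived, so its output only roughly matches the rounded values in the table.

```python
# Typical RMSE ranges (cycles) from the table above.
RMSE_RANGES = {
    "FD001": (12, 15),
    "FD002": (18, 25),
    "FD003": (13, 17),
    "FD004": (22, 30),
}


def midpoint(lo: float, hi: float) -> float:
    """Midpoint of an RMSE range."""
    return (lo + hi) / 2


baseline = midpoint(*RMSE_RANGES["FD001"])
ratios = {name: midpoint(*rng) / baseline for name, rng in RMSE_RANGES.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ordering is the point rather than the exact ratios: adding operating conditions (FD002) costs more than adding a fault mode (FD003), and FD004 pays for both.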

Our AMNL model narrows this gap significantly by addressing the root causes of these performance differences.

Operating Conditions Explained

Each data row includes three operating condition settings that define the engine's current operating regime.

The Three Operating Settings

Setting	Physical Meaning	Range in Data
Altitude	Flight altitude above sea level	0 - 42,000 ft
Mach Number	Flight speed relative to sound	0 - 0.84
Throttle Resolver Angle (TRA)	Pilot throttle position	60 - 100

Why Operating Conditions Matter

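Operating conditions fundamentally change sensor readings even for a healthy engine. Ambient temperature and pressure fall with altitude, and the throttle setting changes shaft speeds and gas-path temperatures, so the "normal" value of every sensor shifts from one regime to the next.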

The Six Operating Regimes in FD002/FD004

FD002 and FD004 include six distinct operating condition combinations:

Regime	Altitude (ft)	Mach	TRA	Flight Phase
1	0	0	100	Sea-level static / takeoff
2	10,000	0.25	100	Low altitude climb
3	20,000	0.70	100	Mid altitude cruise
4	25,000	0.62	60	Reduced thrust cruise
5	35,000	0.84	100	High altitude cruise
6	42,000	0.84	100	Max altitude cruise

Each regime produces different "normal" sensor readings. A model must learn that the same sensor value means different things in different regimes.
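
To use the regime as a feature or as a grouping key, each row's three settings have to be mapped to one of the six regimes in the table above. The sketch below assumes that the raw files store altitude in thousands of feet and that the settings carry small numerical noise (for example 34.9983 rather than 35), so it rounds before the lookup. The names `REGIMES` and `regime_id` are hypothetical.

```python
# Six regimes from the table above, keyed by
# (altitude in kft, Mach number in hundredths, TRA).
# Integer keys avoid exact floating-point comparisons.
REGIMES = {
    (0, 0, 100): 1,
    (10, 25, 100): 2,
    (20, 70, 100): 3,
    (25, 62, 60): 4,
    (35, 84, 100): 5,
    (42, 84, 100): 6,
}


def regime_id(altitude_kft: float, mach: float, tra: float) -> int:
    """Map noisy operating settings to a regime number (1-6)."""
    key = (round(altitude_kft), round(mach * 100), round(tra))
    return REGIMES[key]


print(regime_id(34.9983, 0.8400, 100.0))
print(regime_id(25.0005, 0.6218, 60.0))
```

A setting combination outside the six regimes raises `KeyError`, which is a useful early warning that the rounding assumption does not hold for the data at hand.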

The Multi-Condition Challenge

In FD002/FD004, sensor distributions are multimodal, with one cluster per operating regime. A temperature of 600°R might be normal at sea level but indicate severe degradation at high altitude. Global normalization lets the large offsets between regimes swamp the much smaller degradation trends within each regime, so we need per-condition normalization (covered in Chapter 4).
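
A small numerical illustration makes the scale of the problem concrete. The readings below are made up for illustration and are not taken from C-MAPSS. The sketch compares the pooled standard deviation across two regimes with the spread inside each regime, then expresses a one-unit drift in both scales.

```python
from statistics import mean, pstdev

# Hypothetical readings of one sensor for a healthy engine in two regimes.
sea_level = [641.8, 642.1, 642.4, 642.0, 642.2]
high_altitude = [549.6, 549.9, 550.3, 550.0, 549.7]

# Global normalization pools both regimes into one distribution.
pooled = sea_level + high_altitude
global_sigma = pstdev(pooled)

# Average spread inside each regime.
within_sigma = mean([pstdev(sea_level), pstdev(high_altitude)])
print(f"pooled std: {global_sigma:.2f}, within-regime std: {within_sigma:.2f}")

# A 1-unit drift, standing in for an early degradation signal, in z-score units.
drift = 1.0
print(f"global z-shift:     {drift / global_sigma:.3f}")
print(f"per-regime z-shift: {drift / within_sigma:.3f}")
```

Under global normalization the drift is a tiny fraction of a standard deviation, because the pooled spread is dominated by the gap between regimes. Measured against the within-regime spread, the same drift is several standard deviations.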

Fault Modes and Degradation Patterns

The C-MAPSS simulation introduces degradation through specific failure modes affecting different engine components.

Fault Mode 1: HPC Degradation

High Pressure Compressor (HPC) degradation is present in all four datasets:

Physical cause: Blade tip erosion, fouling, increased clearances
Effect: Reduced compression efficiency and flow capacity
Sensor signatures: Increased HPC outlet temperature (T30), decreased efficiency ratios

HPC degradation affects the thermodynamic cycle:

$$\eta_{\text{HPC}} \downarrow \Rightarrow T_{30} \uparrow, P_{30} \downarrow, \text{SFC} \uparrow$$

Where SFC is Specific Fuel Consumption (more fuel needed for same thrust).
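
The direction of the $T_{30}$ effect follows from the textbook isentropic-efficiency relation for a compressor, $T_{\text{out}} = T_{\text{in}}\left[1 + (\pi^{(\gamma-1)/\gamma} - 1)/\eta\right]$, where $\pi$ is the pressure ratio. The sketch below evaluates it for a falling efficiency. It is a simplified ideal-gas model rather than the C-MAPSS simulation, and the inlet temperature and pressure ratio are hypothetical values chosen only for illustration.

```python
GAMMA = 1.4  # ratio of specific heats for air (ideal-gas assumption)


def compressor_outlet_temp(t_in: float, pressure_ratio: float, efficiency: float) -> float:
    """Outlet temperature of a compressor with the given isentropic efficiency."""
    ideal_rise = pressure_ratio ** ((GAMMA - 1) / GAMMA) - 1
    return t_in * (1 + ideal_rise / efficiency)


# Hypothetical inlet temperature (degrees Rankine) and pressure ratio.
T_IN, PR = 650.0, 10.0
for eta in (0.88, 0.85, 0.82):
    t_out = compressor_outlet_temp(T_IN, PR, eta)
    print(f"eta={eta:.2f} -> T_out={t_out:.1f} R")
```

For a fixed pressure ratio, every drop in efficiency raises the outlet temperature, which is why a rising T30 is a natural signature of HPC wear.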

Fault Mode 2: Fan Degradation

Fan degradation is present only in FD003 and FD004:

Physical cause: Foreign object damage, blade erosion, tip rubs
Effect: Reduced fan efficiency and bypass ratio
Sensor signatures: Changed bypass ratio (BPR), altered fan speed characteristics

Why Two Fault Modes Are Harder

With a single fault mode, degradation patterns are consistent:

$$\text{Degradation} = f(\text{HPC wear})$$

With two fault modes, degradation can follow different paths:

$$\text{Degradation} = f(\text{HPC wear}) \text{ OR } g(\text{Fan wear}) \text{ OR } h(\text{both})$$

FD001: The Baseline Case

FD001 is the simplest and most studied sub-dataset. Understanding it deeply provides the foundation for tackling more complex variants.

FD001 Characteristics

Property	Value
Training engines	100
Test engines	100
Operating conditions	1 (sea level)
Fault mode	1 (HPC degradation only)
Total training cycles	20,631
Min trajectory length	128 cycles
Max trajectory length	362 cycles
Mean trajectory length	206 cycles
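
The figures in the table are mutually consistent: 20,631 total cycles over 100 engines gives a mean of about 206 cycles. Statistics of this kind can be computed from the engine ID and cycle columns alone. The sketch below uses toy data rather than real FD001 rows, and it assumes that cycles start at 1 and increase by one per row, so an engine's last cycle equals its trajectory length. The function name `trajectory_lengths` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def trajectory_lengths(rows):
    """Return {engine_id: last cycle seen} from (engine_id, cycle) pairs."""
    last_cycle = defaultdict(int)
    for engine_id, cycle in rows:
        last_cycle[engine_id] = max(last_cycle[engine_id], cycle)
    return dict(last_cycle)


# Toy data: three engines running 4, 6, and 5 cycles.
rows = [(e, c) for e, n in [(1, 4), (2, 6), (3, 5)] for c in range(1, n + 1)]
lengths = trajectory_lengths(rows)

print("engines:", len(lengths))
print("total cycles:", sum(lengths.values()))
print("min / max / mean:", min(lengths.values()), max(lengths.values()), mean(lengths.values()))
```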

Why FD001 is "Easy"

Unimodal sensor distributions: All engines operate at sea level, so sensor values cluster around single means
Consistent degradation pattern: Only HPC fault, so degradation signatures are uniform
Simple normalization: Global mean/std normalization works well
Clean trends: Sensors show monotonic degradation without regime jumps

FD001 Baseline Performance

State-of-the-art methods on FD001 achieve RMSE around 11-12 cycles. Our AMNL model achieves RMSE ≈ 11.44, competitive with the best published results.

FD002, FD003, FD004: Increasing Complexity

FD002: Multiple Operating Conditions

Property	Value
Training engines	260
Test engines	259
Operating conditions	6
Fault mode	1 (HPC only)
Total training cycles	53,759

Challenge: Same fault mode, but sensors now have 6 different "normal" baselines depending on operating regime.

Solution approach: Per-condition normalization to remove regime effects before feeding data to the model.

FD003: Multiple Fault Modes

Property	Value
Training engines	100
Test engines	100
Operating conditions	1 (sea level)
Fault modes	2 (HPC + Fan)
Total training cycles	24,720

Challenge: Single operating condition, but degradation can follow different patterns depending on which component fails first.

Solution approach: The model must learn multiple degradation signatures and, where possible, identify the fault type implicitly.

FD004: Maximum Complexity

Property	Value
Training engines	249
Test engines	248
Operating conditions	6
Fault modes	2 (HPC + Fan)
Total training cycles	61,249

Challenge: Both operating conditions and fault modes vary. This is closest to real-world conditions, where engines operate across regimes and can fail in multiple ways.

Solution approach: Combine per-condition normalization with a powerful model that can learn multiple degradation patterns simultaneously.

FD004: The Real Test

FD004 is where many methods fail. A model that achieves an RMSE of 12 on FD001 might reach 28 or more on FD004. Our AMNL model achieves RMSE ≈ 19.34 on FD004, a 21% improvement over the previous state of the art.

Implications for Model Design

Understanding the dataset differences leads directly to design decisions in our AMNL model:

1. Per-Condition Normalization

For FD002 and FD004, we normalize each sensor within each operating condition separately:

$$x_{\text{normalized}}^{(c)} = \frac{x - \mu^{(c)}}{\sigma^{(c)}}$$

Where $\mu^{(c)}$ and $\sigma^{(c)}$ are computed only from data in operating condition $c$.
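
The formula can be sketched for a single sensor as follows. This is a minimal standard-library version with made-up numbers, not the implementation used for AMNL, and the function names are hypothetical. It makes two assumptions beyond the formula: statistics are fitted on training data only and then reused, and a zero standard deviation is replaced by 1 so that sensors which are constant within a regime do not cause a division by zero.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_condition_stats(values, conditions):
    """Compute (mean, std) of one sensor separately for each condition."""
    groups = defaultdict(list)
    for x, c in zip(values, conditions):
        groups[c].append(x)
    # Replace a zero std with 1.0 so constant sensors normalize to 0.
    return {c: (mean(xs), pstdev(xs) or 1.0) for c, xs in groups.items()}


def normalize(values, conditions, stats):
    """Apply x' = (x - mu_c) / sigma_c using previously fitted statistics."""
    return [(x - stats[c][0]) / stats[c][1] for x, c in zip(values, conditions)]


# Toy training data for one sensor across two regimes.
train_x = [642.0, 643.0, 644.0, 550.0, 551.0, 552.0]
train_c = [1, 1, 1, 5, 5, 5]
stats = fit_condition_stats(train_x, train_c)

# New readings are normalized with the training statistics of their own regime.
z = normalize([644.0, 552.0], [1, 5], stats)
print([round(v, 3) for v in z])
```

Both readings sit one unit above their regime's mean, so they map to the same normalized value even though the raw values differ by more than 90 units. With a tabular library the same idea is usually expressed as a group-by on the regime label.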

2. Multi-Task Learning

With multiple fault modes, predicting both RUL (continuous) and health state (categorical) helps the model learn more robust features:

Health state classification learns to distinguish degradation stages
RUL regression learns fine-grained cycle predictions
Shared features benefit from both signal types
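
A common way to combine the two tasks is a weighted sum of a regression loss and a classification loss. The sketch below shows that generic pattern in plain Python. The loss form, the weight of 0.5, and the three health states are assumptions made for illustration, since this section does not specify the AMNL loss.

```python
import math


def mse(pred, target):
    """Mean squared error for the RUL regression head."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy for one sample, computed with a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def multitask_loss(rul_pred, rul_true, state_logits, state_true, weight=0.5):
    """RUL regression loss plus a weighted health-state classification loss."""
    ce = sum(cross_entropy(l, y) for l, y in zip(state_logits, state_true))
    return mse(rul_pred, rul_true) + weight * ce / len(state_true)


# Two samples and three hypothetical states (0=healthy, 1=degrading, 2=critical).
loss = multitask_loss(
    rul_pred=[100.0, 20.0],
    rul_true=[110.0, 15.0],
    state_logits=[[2.0, 0.5, -1.0], [-1.0, 0.0, 3.0]],
    state_true=[0, 2],
)
print(f"{loss:.3f}")
```

Because both terms are computed from the same shared features in a real network, gradients from the classification head shape the representation used for regression, and vice versa.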

3. Attention for Variable Patterns

With different fault modes, important timesteps vary between engines. Attention allows the model to focus on relevant degradation signatures regardless of fault type.
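
The core of this idea is attention pooling: each timestep receives a relevance score, the scores are turned into weights with a softmax, and the sequence is summarized as a weighted average. The sketch below shows that generic mechanism only. It is not the AMNL architecture, and in a real model the scores would be produced by learned layers rather than supplied by hand.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, scores):
    """Weighted average of per-timestep feature vectors.

    features: list of T vectors, one per timestep
    scores:   list of T relevance scores
    """
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Three timesteps with 2-D features; the last step gets the highest score.
features = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]]
pooled, weights = attention_pool(features, scores=[0.0, 0.0, 2.0])
print([round(w, 3) for w in weights], [round(p, 3) for p in pooled])
```

The pooled vector is pulled toward the highest-scoring timestep, so the summary can emphasize whichever part of the window carries the degradation signature for that engine.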

4. Dataset-Specific Evaluation

We must evaluate on all four datasets separately. A method that only works on FD001 is not useful for real-world deployment where conditions vary.
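
In practice this means reporting one RMSE per sub-dataset rather than a single pooled number. The sketch below shows the bookkeeping with placeholder values that are not model outputs; the function names are hypothetical.

```python
import math


def rmse(pred, target):
    """Root mean squared error, in the same units as RUL (cycles)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def evaluate_all(predictions):
    """Map {dataset: (predicted_rul, true_rul)} to {dataset: rmse}."""
    return {name: rmse(pred, true) for name, (pred, true) in predictions.items()}


# Placeholder numbers for illustration only.
results = evaluate_all({
    "FD001": ([112.0, 98.0], [115.0, 94.0]),
    "FD004": ([80.0, 40.0], [95.0, 60.0]),
})
for name, value in results.items():
    print(f"{name}: RMSE = {value:.2f}")
```

Keeping the scores separate prevents a strong FD001 result from masking a weak FD004 result.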

Challenge	Source	Our Solution
Multimodal distributions	Multiple conditions	Per-condition normalization
Variable degradation	Multiple fault modes	Attention + multi-task learning
Different data sizes	Dataset variability	Same architecture, dataset-specific training
Performance gap	FD001 vs FD004	Careful preprocessing + powerful model

Summary

In this section, we explored the four C-MAPSS sub-datasets and their key differences:

Complexity spectrum: FD001 (simplest) → FD004 (most complex)
Operating conditions: 1 regime (FD001/FD003) vs 6 regimes (FD002/FD004) affecting all sensor readings
Fault modes: HPC only (FD001/FD002) vs HPC + Fan (FD003/FD004) creating variable degradation patterns
Multi-condition challenge: Same sensor value means different things in different regimes
Multi-fault challenge: Same symptom can indicate different fault types
Design implications: Per-condition normalization, attention, multi-task learning

Dataset	Conditions	Faults	Key Challenge
FD001	1	1	Baseline: learn HPC degradation
FD002	6	1	Handle regime shifts in sensors
FD003	1	2	Handle variable degradation patterns
FD004	6	2	Handle both simultaneously

Looking Ahead: Not all 21 sensors are equally informative. Some remain nearly constant; others are dominated by noise. In the next section, we will analyze each sensor and justify our selection of 17 informative features, which form the input to our model.

With the dataset complexity understood, we are ready to select the most predictive features.