Learning Objectives
By the end of this section, you will:
- Understand the complexity hierarchy from FD001 (simplest) to FD004 (most challenging)
- Interpret operating conditions (altitude, Mach number, throttle) and their physical meaning
- Distinguish fault modes: HPC degradation vs Fan degradation
- Explain why multiple conditions complicate prediction through sensor value distributions
- Recognize the need for condition-aware normalization to handle multi-regime data
Why This Matters: Many RUL prediction methods achieve good results on FD001 but fail on FD004. Understanding why requires grasping the fundamental differences between these datasets. Our AMNL model achieves state-of-the-art on all four precisely because it addresses these differences through careful design choices.
Dataset Complexity Spectrum
The four C-MAPSS sub-datasets form a complexity spectrum, varying along two axes:
Two Dimensions of Complexity
| Dataset | Operating Conditions | Fault Modes | Complexity |
|---|---|---|---|
| FD001 | 1 (Sea Level) | 1 (HPC) | Lowest |
| FD002 | 6 (Various) | 1 (HPC) | Medium |
| FD003 | 1 (Sea Level) | 2 (HPC + Fan) | Medium |
| FD004 | 6 (Various) | 2 (HPC + Fan) | Highest |
FD001 is the controlled experiment: single condition, single fault. Each additional complexity dimension (multiple conditions OR multiple faults) creates a new challenge. FD004 combines both.
Published Performance Gap
The performance gap between datasets is substantial. For a typical deep learning model:
| Dataset | Typical RMSE | Relative Difficulty |
|---|---|---|
| FD001 | 12-15 | 1× (baseline) |
| FD002 | 18-25 | ~1.5× harder |
| FD003 | 13-17 | ~1.2× harder |
| FD004 | 22-30 | ~2× harder |
Our AMNL model narrows this gap significantly by addressing the root causes of these performance differences.
Operating Conditions Explained
Each data row includes three operating condition settings that define the engine's current operating regime.
The Three Operating Settings
| Setting | Physical Meaning | Range in Data |
|---|---|---|
| Altitude | Flight altitude above sea level | 0 - 42,000 ft |
| Mach Number | Flight speed as a fraction of the speed of sound | 0 - 0.84 |
| Throttle Resolver Angle (TRA) | Pilot throttle position | 0 - 100 |
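The three settings appear as the third, fourth, and fifth columns of each row in the raw C-MAPSS text files. As a sketch, assuming the standard 26-column whitespace-separated layout (unit id, cycle, three operating settings, 21 sensors), one row can be parsed like this; the column names are our own labels, not part of the files:

```python
# Column labels for the standard 26-column C-MAPSS row layout
# (unit id, cycle, 3 operating settings, 21 sensor channels).
COLUMNS = ["unit", "cycle", "altitude", "mach", "tra"] + [
    f"s{i}" for i in range(1, 22)
]

def parse_row(line: str) -> dict:
    """Parse one whitespace-separated raw line into a labeled dict."""
    values = [float(v) for v in line.split()]
    return dict(zip(COLUMNS, values))
```

File reading is omitted; in practice the same layout would be loaded in one call with `pandas.read_csv(..., sep=r"\s+", header=None)`.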
Why Operating Conditions Matter
Operating conditions fundamentally change sensor readings even for a healthy engine: at 42,000 ft the inlet air is far colder and thinner than at sea level, so temperatures, pressures, and spool speeds throughout the engine shift substantially with zero degradation present.
The Six Operating Regimes in FD002/FD004
FD002 and FD004 include six distinct operating condition combinations:
| Regime | Altitude (ft) | Mach | TRA | Flight Phase |
|---|---|---|---|---|
| 1 | 0 | 0 | 100 | Ground idle / takeoff |
| 2 | 10,000 | 0.25 | 100 | Low altitude climb |
| 3 | 20,000 | 0.70 | 100 | Mid altitude cruise |
| 4 | 25,000 | 0.62 | 60 | Reduced thrust cruise |
| 5 | 35,000 | 0.84 | 100 | High altitude cruise |
| 6 | 42,000 | 0.84 | 100 | Max altitude cruise |
Each regime produces different "normal" sensor readings. A model must learn that the same sensor value means different things in different regimes.
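Before any regime-aware processing, each row must be assigned to one of the six regimes. Since the recorded settings sit in tight clusters around the nominal values in the table above, a nearest-centre lookup suffices; this is an illustrative sketch (the centre list and scaling are taken from the table, not from any official preprocessing code):

```python
import numpy as np

# Nominal (altitude in kft, Mach, TRA) centres of the six regimes above.
CENTRES = np.array([
    [0.0, 0.00, 100.0],   # 1: ground idle / takeoff
    [10.0, 0.25, 100.0],  # 2: low altitude climb
    [20.0, 0.70, 100.0],  # 3: mid altitude cruise
    [25.0, 0.62, 60.0],   # 4: reduced thrust cruise
    [35.0, 0.84, 100.0],  # 5: high altitude cruise
    [42.0, 0.84, 100.0],  # 6: max altitude cruise
])

def assign_regime(settings):
    """Return the 1-based id of the nearest regime centre.

    Each axis is scaled so altitude (tens of kft) does not
    dominate Mach (order 1) in the distance computation.
    """
    scale = np.array([10.0, 0.1, 10.0])
    d = np.linalg.norm((CENTRES - np.asarray(settings)) / scale, axis=1)
    return int(np.argmin(d)) + 1
```

Because the noise on the recorded settings is small relative to the gaps between centres, this simple lookup is effectively exact on FD002/FD004.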
The Multi-Condition Challenge
In FD002/FD004, sensor distributions are multimodal. A temperature of 600°R might be normal at sea level but indicate severe degradation at high altitude. Global normalization destroys this information—we need per-condition normalization (covered in Chapter 4).
Fault Modes and Degradation Patterns
The C-MAPSS simulation introduces degradation through specific failure modes affecting different engine components.
Fault Mode 1: HPC Degradation
High Pressure Compressor (HPC) degradation is present in all four datasets:
- Physical cause: Blade tip erosion, fouling, increased clearances
- Effect: Reduced compression efficiency and flow capacity
- Sensor signatures: Increased HPC outlet temperature (T30), decreased efficiency ratios
HPC degradation propagates through the thermodynamic cycle: compressor efficiency drops, HPC outlet temperature (T30) rises, and more fuel must be burned to hold the commanded thrust, so SFC increases, where SFC is Specific Fuel Consumption (more fuel needed for the same thrust).
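The SFC relationship can be made concrete with a toy calculation. This is not the C-MAPSS thermodynamic model; it simply assumes, for illustration, that fuel flow required for a fixed thrust scales inversely with HPC efficiency:

```python
def sfc(fuel_flow_pph, thrust_lbf):
    """Specific fuel consumption: fuel burned per unit thrust (lb/hr per lbf)."""
    return fuel_flow_pph / thrust_lbf

def required_fuel_flow(base_pph, hpc_efficiency, reference_efficiency=0.90):
    """Toy assumption: fuel needed for fixed thrust scales inversely with efficiency."""
    return base_pph * reference_efficiency / hpc_efficiency

# Same 10,000 lbf thrust demand, healthy vs degraded compressor:
healthy = sfc(required_fuel_flow(5000, 0.90), 10_000)   # eta = 0.90
degraded = sfc(required_fuel_flow(5000, 0.85), 10_000)  # eta = 0.85
```

The degraded engine burns more fuel for the same thrust, which is exactly the SFC rise the sensors indirectly capture.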
Fault Mode 2: Fan Degradation
Fan degradation is present only in FD003 and FD004:
- Physical cause: Foreign object damage, blade erosion, tip rubs
- Effect: Reduced fan efficiency and bypass ratio
- Sensor signatures: Changed bypass ratio (BPR), altered fan speed characteristics
Why Two Fault Modes Are Harder
With a single fault mode, degradation patterns are consistent: every engine drifts along essentially the same sensor trajectory, differing mainly in how fast it degrades.
With two fault modes, degradation can follow different paths: an HPC-dominated trajectory, a fan-dominated trajectory, or a mixture of both, so the same observed symptom may correspond to different underlying faults and different remaining lifetimes.
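The divergence can be illustrated with synthetic trajectories (illustrative numbers only, not real C-MAPSS data): an HPC-dominated path drives outlet temperature up while leaving bypass ratio nearly flat, whereas a fan-dominated path does roughly the reverse:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(200)  # cycles

# HPC-dominated path: strong T30 drift, near-flat bypass ratio.
hpc_path = {
    "T30": 1500 + 0.05 * t + rng.normal(0, 1, t.size),
    "BPR": 8.4 + 0.0005 * t + rng.normal(0, 0.01, t.size),
}
# Fan-dominated path: mild T30 drift, clear bypass-ratio drop.
fan_path = {
    "T30": 1500 + 0.01 * t + rng.normal(0, 1, t.size),
    "BPR": 8.4 - 0.002 * t + rng.normal(0, 0.01, t.size),
}
```

Both engines reach end-of-life at the same cycle, yet their sensor signatures diverge; a model fit to one pattern alone will systematically misread the other.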
FD001: The Baseline Case
FD001 is the simplest and most studied sub-dataset. Understanding it deeply provides the foundation for tackling more complex variants.
FD001 Characteristics
| Property | Value |
|---|---|
| Training engines | 100 |
| Test engines | 100 |
| Operating conditions | 1 (sea level) |
| Fault mode | 1 (HPC degradation only) |
| Total training cycles | 20,631 |
| Min trajectory length | 128 cycles |
| Max trajectory length | 362 cycles |
| Mean trajectory length | 206 cycles |
Why FD001 is "Easy"
- Unimodal sensor distributions: All engines operate at sea level, so sensor values cluster around single means
- Consistent degradation pattern: Only HPC fault, so degradation signatures are uniform
- Simple normalization: Global mean/std normalization works well
- Clean trends: Sensors show monotonic degradation without regime jumps
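Because FD001's distributions are unimodal, a single set of per-sensor statistics is enough. A minimal sketch of the global z-score normalization that works well here:

```python
import numpy as np

def global_normalize(X):
    """Z-score each sensor column using statistics over the whole training set.

    Adequate for FD001: one operating condition means each sensor's
    distribution is unimodal, so a single mean/std per column suffices.
    X has shape (n_rows, n_sensors).
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against near-constant sensors
    return (X - mu) / sigma
```

The zero-std guard matters in practice: several C-MAPSS sensors are nearly constant under a single condition, and dividing by zero would produce NaNs.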
FD001 Baseline Performance
State-of-the-art methods on FD001 achieve RMSE around 11-12 cycles. Our AMNL model achieves RMSE ≈ 11.44, competitive with the best published results.
FD002, FD003, FD004: Increasing Complexity
FD002: Multiple Operating Conditions
| Property | Value |
|---|---|
| Training engines | 260 |
| Test engines | 259 |
| Operating conditions | 6 |
| Fault mode | 1 (HPC only) |
| Total training cycles | 53,759 |
Challenge: Same fault mode, but sensors now have 6 different "normal" baselines depending on operating regime.
Solution approach: Per-condition normalization to remove regime effects before feeding data to the model.
FD003: Multiple Fault Modes
| Property | Value |
|---|---|
| Training engines | 100 |
| Test engines | 100 |
| Operating conditions | 1 (sea level) |
| Fault modes | 2 (HPC + Fan) |
| Total training cycles | 24,720 |
Challenge: Single operating condition, but degradation can follow different patterns depending on which component fails first.
Solution approach: Model must learn multiple degradation signatures and potentially identify fault type implicitly.
FD004: Maximum Complexity
| Property | Value |
|---|---|
| Training engines | 249 |
| Test engines | 248 |
| Operating conditions | 6 |
| Fault modes | 2 (HPC + Fan) |
| Total training cycles | 61,249 |
Challenge: Everything is variable—operating conditions AND fault modes. This is closest to real-world conditions where engines operate across regimes and can fail in multiple ways.
Solution approach: Combine per-condition normalization with a powerful model that can learn multiple degradation patterns simultaneously.
FD004: The Real Test
FD004 is where many methods fail. A model that achieves 12 RMSE on FD001 might achieve 28+ RMSE on FD004. Our AMNL model achieves RMSE ≈ 19.34 on FD004—a 21% improvement over previous state-of-the-art.
Implications for Model Design
Understanding the dataset differences leads directly to design decisions in our AMNL model:
1. Per-Condition Normalization
For FD002 and FD004, we normalize each sensor within each operating condition separately:
x̃ = (x − μ_c) / σ_c

Where μ_c and σ_c are the mean and standard deviation computed only from data in operating condition c.
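A minimal pandas sketch of this scheme, assuming a `regime` column (1-6) has already been assigned from the three operating settings (the function name is ours, not the actual AMNL code):

```python
import pandas as pd

def normalize_per_condition(df, sensor_cols, regime_col="regime"):
    """Z-score each sensor within each operating regime separately."""
    grouped = df.groupby(regime_col)[sensor_cols]
    mu = grouped.transform("mean")
    sigma = grouped.transform("std").replace(0, 1.0)  # guard constant groups
    out = df.copy()
    out[sensor_cols] = (df[sensor_cols] - mu) / sigma
    return out
```

Crucially, the per-regime statistics must be estimated on the training split only and then reused to transform the test split; recomputing them on test data would leak information.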
2. Multi-Task Learning
With multiple fault modes, predicting both RUL (continuous) and health state (categorical) helps the model learn more robust features:
- Health state classification learns to distinguish degradation stages
- RUL regression learns fine-grained cycle predictions
- Shared features benefit from both signal types
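The combined objective can be sketched in numpy; this shows the shape of the loss only (the real model would compute it on framework tensors so gradients flow to the shared encoder, and the weighting `alpha` is an illustrative hyperparameter):

```python
import numpy as np

def multitask_loss(rul_pred, rul_true, hs_logits, hs_true, alpha=0.5):
    """MSE on RUL regression plus cross-entropy on health-state classification.

    rul_pred/rul_true: (n,) cycle predictions and targets.
    hs_logits: (n, k) unnormalized class scores; hs_true: (n,) class ids.
    """
    mse = np.mean((rul_pred - rul_true) ** 2)
    # Numerically stable softmax cross-entropy.
    z = hs_logits - hs_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(hs_true)), hs_true])
    return mse + alpha * ce
```

Because both terms backpropagate through the same encoder, the classification signal regularizes the regression features and vice versa.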
3. Attention for Variable Patterns
With different fault modes, important timesteps vary between engines. Attention allows the model to focus on relevant degradation signatures regardless of fault type.
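The mechanism can be sketched as simple dot-product attention over timesteps; this illustrates the idea, not the exact AMNL layer:

```python
import numpy as np

def temporal_attention(H, q):
    """Soft attention over timesteps.

    H: (T, d) hidden states for one engine's window; q: (d,) query vector
    (learned in a real model). Each timestep is weighted by its similarity
    to q, then the sequence is pooled into a single context vector.
    """
    scores = H @ q                      # (T,) relevance per timestep
    scores = scores - scores.max()      # stabilize the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ H               # (d,) attention-pooled summary
    return context, weights
```

For an HPC-faulted engine the weights can concentrate on cycles where T30 accelerates; for a fan-faulted engine, on cycles where bypass-related channels shift, without any explicit fault label.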
4. Dataset-Specific Evaluation
We must evaluate on all four datasets separately. A method that only works on FD001 is not useful for real-world deployment where conditions vary.
| Challenge | Source | Our Solution |
|---|---|---|
| Multimodal distributions | Multiple conditions | Per-condition normalization |
| Variable degradation | Multiple fault modes | Attention + multi-task learning |
| Different data sizes | Dataset variability | Same architecture, dataset-specific training |
| Performance gap | FD001 vs FD004 | Careful preprocessing + powerful model |
Summary
In this section, we explored the four C-MAPSS sub-datasets and their key differences:
- Complexity spectrum: FD001 (simplest) → FD004 (most complex)
- Operating conditions: 1 regime (FD001/FD003) vs 6 regimes (FD002/FD004) affecting all sensor readings
- Fault modes: HPC only (FD001/FD002) vs HPC + Fan (FD003/FD004) creating variable degradation patterns
- Multi-condition challenge: Same sensor value means different things in different regimes
- Multi-fault challenge: Same symptom can indicate different fault types
- Design implications: Per-condition normalization, attention, multi-task learning
| Dataset | Conditions | Faults | Key Challenge |
|---|---|---|---|
| FD001 | 1 | 1 | Baseline—learn HPC degradation |
| FD002 | 6 | 1 | Handle regime shifts in sensors |
| FD003 | 1 | 2 | Handle variable degradation patterns |
| FD004 | 6 | 2 | Handle both simultaneously |
Looking Ahead: Not all 21 sensors are equally informative. Some remain nearly constant; others are dominated by noise. In the next section, we will analyze each sensor and justify our selection of 17 informative features—the input to our model.
With the dataset complexity understood, we are ready to select the most predictive features.