Learning Objectives
By the end of this section, you will:
- Understand the C-MAPSS simulation and why it became the standard benchmark for RUL prediction
- Know the four sub-datasets (FD001-FD004) and their complexity characteristics
- Learn the data format: operational settings, sensor measurements, and their physical meanings
- Understand feature selection: which of the 21 sensors provide useful information
- Master the evaluation protocol: train/test splits and ground-truth RUL files
- See the performance benchmark: where AMNL stands relative to state-of-the-art methods
Why This Matters: The NASA C-MAPSS dataset is the de facto benchmark for evaluating predictive maintenance algorithms. Understanding its structure, challenges, and evaluation protocol is essential for interpreting research results and designing your own experiments.
What is C-MAPSS?
C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) is a sophisticated simulation tool developed by NASA for modeling turbofan engine degradation. The dataset generated from this simulator has become the gold standard for benchmarking RUL prediction algorithms.
The Turbofan Engine
A turbofan engine is the most common type of jet engine used in commercial aircraft. It consists of several key components:
- Fan: Large front fan that draws in air and provides most of the thrust
- Low-Pressure Compressor (LPC): Compresses air before the core
- High-Pressure Compressor (HPC): Further compresses air to high pressure
- Combustor: Burns fuel with compressed air
- High-Pressure Turbine (HPT): Extracts energy to drive the HPC
- Low-Pressure Turbine (LPT): Extracts energy to drive the fan and LPC
Simulated Degradation
The C-MAPSS simulation injects fault modes that cause progressive degradation over time:
| Fault Mode | Affected Components | Effect |
|---|---|---|
| HPC Degradation | High-Pressure Compressor | Reduced efficiency, increased temperature |
| Fan Degradation | Fan assembly | Reduced thrust, vibration increase |
These faults develop gradually over hundreds of operational cycles, mimicking real-world wear patterns. The simulator records sensor measurements throughout the degradation process until a failure threshold is reached.
Why Simulation Data?
Real run-to-failure data from commercial engines is scarce: operators replace components long before failure, and exact failure times are rarely recorded. Simulation provides complete degradation trajectories with precisely known failure times, at the cost of some realism.
The Four Sub-Datasets
C-MAPSS provides four sub-datasets with varying levels of complexity, enabling systematic evaluation of algorithm robustness:
| Dataset | Operating Conditions | Fault Modes | Train Units | Test Units | Difficulty |
|---|---|---|---|---|---|
| FD001 | 1 | 1 (HPC) | 100 | 100 | Easy |
| FD002 | 6 | 1 (HPC) | 260 | 259 | Hard |
| FD003 | 1 | 2 (HPC + Fan) | 100 | 100 | Medium |
| FD004 | 6 | 2 (HPC + Fan) | 249 | 248 | Very Hard |
Understanding Dataset Complexity
Operating Conditions
Operating conditions represent different flight regimes—combinations of altitude, Mach number, and throttle settings. Each condition produces different baseline sensor readings:
- FD001/FD003: Single operating condition—all engines operate under identical flight regime
- FD002/FD004: Six operating conditions—engines experience varying altitudes, speeds, and power settings
The Multi-Condition Challenge
Because each regime shifts the sensor baselines, the same absolute reading can indicate a healthy engine in one condition and a degraded engine in another. Models must therefore separate condition effects from genuine degradation, typically through condition-aware normalization or condition-invariant representations.
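One common remedy (standard practice in the C-MAPSS literature rather than a detail specific to this text, and the rounding tolerance below is an assumption) is to identify each cycle's condition from the three operational settings, then z-score the sensors within each condition:

```python
import numpy as np

def condition_wise_normalize(settings, sensors, decimals=0):
    """Z-score each sensor channel within its operating condition.

    Conditions are identified by rounding the three operational
    settings, since C-MAPSS regimes form tight clusters in
    (altitude, Mach, throttle) space.
    """
    keys = np.round(settings, decimals)
    normalized = np.empty_like(sensors, dtype=float)
    for key in np.unique(keys, axis=0):
        mask = np.all(keys == key, axis=1)
        group = sensors[mask]
        mu = group.mean(axis=0)
        sigma = group.std(axis=0)
        sigma[sigma == 0] = 1.0  # guard against constant channels
        normalized[mask] = (group - mu) / sigma
    return normalized

# Two synthetic conditions with different sensor baselines
settings = np.array([[0.0, 0.0, 100.0]] * 4 + [[35.0, 0.84, 60.0]] * 4)
sensors = np.array([[641.8], [642.1], [642.4], [642.7],
                    [549.2], [549.5], [549.8], [550.1]])
z = condition_wise_normalize(settings, sensors)
```

After this step, both groups are centered at zero with unit spread, so degradation trends become comparable across regimes.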
Fault Modes
- FD001/FD002: Single fault mode (HPC degradation)—all engines fail the same way
- FD003/FD004: Two fault modes (HPC and Fan degradation)—engines can fail in different ways with different signatures
Data Format and Features
Each data file is a space-separated text file with one row per engine-cycle observation:
Column Structure
| Column | Name | Description |
|---|---|---|
| 1 | unit | Engine unit ID (integer) |
| 2 | cycle | Operational cycle number (integer) |
| 3 | setting_1 | Altitude (operational setting) |
| 4 | setting_2 | Mach number (operational setting) |
| 5 | setting_3 | Throttle resolver angle (operational setting) |
| 6-26 | sensor_1 to sensor_21 | 21 sensor measurements |
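Given this 26-column layout, a trajectory file can be loaded with pandas; the column names below follow the table above (the function name and the toy two-row sample are illustrative):

```python
import io
import pandas as pd

# 2 identifiers + 3 settings + 21 sensors = 26 columns
COLUMNS = (["unit", "cycle"]
           + [f"setting_{i}" for i in range(1, 4)]
           + [f"sensor_{i}" for i in range(1, 22)])

def load_cmapss(path_or_buffer):
    """Read a space-separated C-MAPSS trajectory file into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep=r"\s+", header=None,
                       names=COLUMNS)

# Minimal two-row example (sensor values are placeholders, not real data)
sample = ("1 1 " + "0.0 " * 3 + " ".join(["1.0"] * 21) + "\n"
          "1 2 " + "0.0 " * 3 + " ".join(["2.0"] * 21) + "\n")
df = load_cmapss(io.StringIO(sample))
```

In practice the same call is used with a file path, e.g. `load_cmapss("data/raw/train_FD001.txt")`.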
Sample Data Row
unit cycle set1 set2 set3 s1 s2 s3 ... s14 ...
1 1 -0.0007 -0.0004 100.0 518.67 641.82 1589.70 ... 8138.62 ...
1 2 0.0019 -0.0003 100.0 518.67 642.15 1591.82 ... 8131.49 ...
1 3 -0.0043 0.0003 100.0 518.67 642.35 1587.99 ... 8133.23 ...
...
1 192 -0.0019 -0.0002 100.0 518.67 641.71 1588.45 ... 8129.23 ...
2 1 0.0007 0.0000 100.0 518.67 642.42 1592.14 ... 8132.45 ...
Sensor Descriptions
The 21 sensors measure various physical quantities throughout the engine:
| Sensor | Description | Unit | Typical Range |
|---|---|---|---|
| sensor_1 | Total temperature at fan inlet (T2) | °R | 518.67 (constant) |
| sensor_2 | Total temperature at LPC outlet (T24) | °R | ~642 |
| sensor_3 | Total temperature at HPC outlet (T30) | °R | ~1580-1610 |
| sensor_4 | Total temperature at LPT outlet (T50) | °R | ~1400-1440 |
| sensor_5 | Pressure at fan inlet (P2) | psia | 14.62 (constant) |
| sensor_6 | Total pressure in bypass-duct (P15) | psia | ~21.6 |
| sensor_7 | Total pressure at HPC outlet (P30) | psia | ~550-555 |
| sensor_8 | Physical fan speed (Nf) | rpm | ~2388 |
| sensor_9 | Physical core speed (Nc) | rpm | ~9040-9250 |
| sensor_10 | Engine pressure ratio (epr = P50/P2) | - | ~1.30 |
| sensor_11 | Static pressure at HPC outlet (Ps30) | psia | ~47-48 |
| sensor_12 | Ratio of fuel flow to Ps30 (phi) | pps/psi | ~520-522 |
| sensor_13 | Corrected fan speed (NRf) | rpm | ~2388 |
| sensor_14 | Corrected core speed (NRc) | rpm | ~8110-8150 |
| sensor_15 | Bypass ratio (BPR) | - | ~8.4 |
| sensor_16 | Burner fuel-air ratio (farB) | - | 0.03 (constant) |
| sensor_17 | Bleed enthalpy (htBleed) | - | ~390-400 |
| sensor_18 | Demanded fan speed (Nf_dmd) | rpm | 2388 (constant) |
| sensor_19 | Demanded corrected fan speed (PCNfR_dmd) | rpm | 100 (constant) |
| sensor_20 | HPT coolant bleed (W31) | lbm/s | ~38.5-39.1 |
| sensor_21 | LPT coolant bleed (W32) | lbm/s | ~23.1-23.6 |
Typical ranges refer to FD001's single operating condition; under FD002/FD004 the baselines shift with the flight regime.
Not All Sensors Are Informative
Several of the 21 channels are constant, or nearly so, under a fixed operating condition. They carry no degradation signal and only add noise and dimensionality.
Feature Selection
Following established practice in the literature, we select 14 informative sensors plus the 3 operational settings, yielding 17 features total:
Selected Features
| Category | Features | Rationale |
|---|---|---|
| Operational Settings | setting_1, setting_2, setting_3 | Define the operating condition for normalization |
| Temperature Sensors | sensor_2, sensor_3, sensor_4 | Outlet temperatures rise as component efficiency degrades |
| Pressure Sensors | sensor_7, sensor_11 | HPC outlet total and static pressure reflect compressor health |
| Speed Sensors | sensor_8, sensor_9, sensor_13, sensor_14 | Physical and corrected fan/core speeds track efficiency loss |
| Flow and Bleed | sensor_12, sensor_15, sensor_17, sensor_20, sensor_21 | Fuel-flow ratio, bypass ratio, bleed enthalpy, and coolant bleed indicate thermal stress |
Excluded Sensors
Seven sensors are excluded due to constant or near-constant values:
- sensor_1: Total temperature at fan inlet, constant at 518.67 °R
- sensor_5: Pressure at fan inlet, constant at 14.62 psia
- sensor_6: Total pressure in bypass-duct, near-constant
- sensor_10: Engine pressure ratio, near-constant at ~1.30
- sensor_16: Burner fuel-air ratio, constant at 0.03
- sensor_18, sensor_19: Demanded (corrected) fan speed, constant setpoints
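Rather than hard-coding the list, the same kind of subset can be recovered from the data by dropping near-constant channels. A sketch (the relative-spread threshold is an assumption, not a value from this text):

```python
import pandas as pd

def select_informative_sensors(df, threshold=1e-4):
    """Drop sensors whose spread relative to their mean is (near-)zero.

    A sensor is kept when std / |mean| exceeds `threshold`
    (a hypothetical cutoff chosen for illustration).
    """
    sensor_cols = [c for c in df.columns if c.startswith("sensor_")]
    kept = []
    for col in sensor_cols:
        mean = abs(df[col].mean())
        spread = df[col].std() / (mean if mean > 0 else 1.0)
        if spread > threshold:
            kept.append(col)
    setting_cols = [c for c in df.columns if c.startswith("setting_")]
    return df[["unit", "cycle"] + setting_cols + kept]

# Toy frame: sensor_1 is constant, sensor_2 varies with degradation
df = pd.DataFrame({
    "unit": [1, 1, 1], "cycle": [1, 2, 3],
    "setting_1": [0.0, 0.0, 0.0],
    "sensor_1": [518.67, 518.67, 518.67],
    "sensor_2": [641.8, 642.4, 643.1],
})
reduced = select_informative_sensors(df)
```

On the real files this variance screen should flag the constant channels listed above; for multi-condition datasets it is best applied per operating condition, since regime switches inflate apparent variance.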
Implementation Detail
The EnhancedNASACMAPSSDataset class exposes the enforce_feature_set=True parameter, which automatically selects the 17 informative features.
Train-Test Split Protocol
The C-MAPSS benchmark uses a specific evaluation protocol that differs from typical machine learning train-test splits:
Training Data
- Engines run from initial operation until failure
- Complete degradation trajectories available
- RUL can be calculated as RUL(t) = T_failure - t, where T_failure is the unit's final recorded cycle
- Files: train_FD001.txt, train_FD002.txt, etc.
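Because training engines run to failure, labeling is a per-unit groupby. A piecewise-linear cap is also commonly applied so that healthy early cycles share one maximum label (the cap value of 125 is a convention from the literature, assumed here rather than taken from this text):

```python
import pandas as pd

def add_rul_labels(df, cap=125):
    """Label each training row with its remaining useful life.

    RUL(t) = T_failure - t, where T_failure is the unit's last cycle.
    The optional cap implements the common piecewise-linear target.
    """
    last_cycle = df.groupby("unit")["cycle"].transform("max")
    df = df.copy()
    df["RUL"] = (last_cycle - df["cycle"]).clip(upper=cap)
    return df

# Two toy units failing at cycles 3 and 2 respectively
df = pd.DataFrame({"unit": [1, 1, 1, 2, 2],
                   "cycle": [1, 2, 3, 1, 2]})
labeled = add_rul_labels(df, cap=125)
```

Test trajectories cannot be labeled this way, since they are truncated before failure; their final-cycle RUL comes from the RUL_FDxxx.txt files instead.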
Test Data
- Engines run for some time but do NOT reach failure
- Trajectories are truncated at random points
- Ground-truth RUL provided in separate files
- Files: test_FD001.txt + RUL_FD001.txt, etc.
File Structure
data/raw/
├── train_FD001.txt   # Training trajectories (100 engines)
├── test_FD001.txt    # Test trajectories (100 engines, truncated)
├── RUL_FD001.txt     # Ground-truth RUL for test set (100 values)
├── train_FD002.txt   # Training (260 engines)
├── test_FD002.txt    # Test (259 engines, truncated)
├── RUL_FD002.txt     # Ground truth (259 values)
├── train_FD003.txt   # Training (100 engines)
├── test_FD003.txt    # Test (100 engines, truncated)
├── RUL_FD003.txt     # Ground truth (100 values)
├── train_FD004.txt   # Training (249 engines)
├── test_FD004.txt    # Test (248 engines, truncated)
└── RUL_FD004.txt     # Ground truth (248 values)
Evaluation Protocol
The standard evaluation protocol computes metrics on the last cycle of each test engine:
- For each test engine, extract all cycles
- Run the model on each cycle (or sliding windows)
- Compare the prediction at the last cycle to the ground-truth RUL
- Compute RMSE and NASA Score across all test engines
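The two metrics in step 4 can be sketched as follows. The NASA scoring function (from the PHM08/C-MAPSS challenge) is asymmetric: with d = predicted minus true RUL, late predictions (d > 0) are penalized more steeply than early ones, because overestimating remaining life is the dangerous failure mode.

```python
import math

def rmse(preds, targets):
    """Root-mean-square error over last-cycle predictions."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets))
                     / len(preds))

def nasa_score(preds, targets):
    """Asymmetric scoring function from the PHM08/C-MAPSS challenge.

    d = predicted - true RUL. Late predictions (d >= 0) contribute
    exp(d/10) - 1; early ones (d < 0) contribute exp(-d/13) - 1.
    Lower is better, and the score is summed, not averaged.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        d = p - t
        total += math.exp(d / 10) - 1 if d >= 0 else math.exp(-d / 13) - 1
    return total

preds, targets = [20.0, 30.0], [25.0, 30.0]
```

For example, predicting 10 cycles late costs exp(1) - 1 ≈ 1.72 per engine, while predicting 10 cycles early costs only exp(10/13) - 1 ≈ 1.16.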
Why Last-Cycle Evaluation?
The RUL_FDxxx.txt files provide the RUL at the final cycle of each test trajectory. This simulates the real-world scenario: "Given sensor data up to now, how much life remains?" The model must predict accurately at the truncation point.
Why C-MAPSS Matters for Research
C-MAPSS has become the standard benchmark for several important reasons:
1. Controlled Complexity Progression
The four sub-datasets provide a natural difficulty progression:
- Test on FD001 first to verify basic algorithm correctness
- Move to FD003 to test multi-modal learning
- Challenge with FD002 for multi-condition robustness
- Ultimate test with FD004 for full complexity
2. Known Ground Truth
Unlike real-world data where failure time is often uncertain, C-MAPSS provides exact failure times, enabling precise evaluation of prediction accuracy.
3. Extensive Benchmarking
Hundreds of papers have reported results on C-MAPSS, enabling direct comparison of methods. Published RMSE values range from ~11 (SOTA transformer methods) to ~20+ (classical approaches).
4. Realistic Challenges
C-MAPSS captures key real-world difficulties:
- Multiple operating conditions (flight regimes)
- Multiple failure modes
- Noisy sensor measurements
- Varying trajectory lengths
- Class imbalance (most data is healthy operation)
AMNL Performance on C-MAPSS
Our AMNL model achieves state-of-the-art performance on 3 of 4 datasets with statistical significance, including dramatic improvements on the challenging multi-condition datasets:
Main Results (RMSE, 5-seed average)
| Dataset | AMNL (Ours) | DKAMFormer | Previous SOTA | Improvement |
|---|---|---|---|---|
| FD001 | 10.43 ± 1.94 | 10.68 | 11.49 | +9.2% |
| FD002 | 6.74 ± 0.91 | 10.70 | 19.77 | +65.9% |
| FD003 | 9.51 ± 1.74 | 10.52 | 11.71 | +18.8% |
| FD004 | 8.16 ± 2.17 | 12.89 | 20.67 | +60.5% |
Best Individual Results
| Dataset | Best RMSE | Seed | vs SOTA |
|---|---|---|---|
| FD001 | 8.69 | 123 | +24.4% |
| FD002 | 6.19 | 123 | +68.7% |
| FD003 | 8.05 | 42 | +31.2% |
| FD004 | 6.17 | 123 | +70.2% |
Key Finding: AMNL achieves 65.9% improvement on FD002 and 60.5% improvement on FD004—the complex multi-condition datasets where previous methods struggled. This demonstrates exceptional generalization across diverse operating conditions.
Statistical Significance
| Dataset | p-value | Significance |
|---|---|---|
| FD001 | 0.1439 | Not significant |
| FD002 | <0.0001 | Highly significant (p<0.001) |
| FD003 | 0.0234 | Significant (p<0.05) |
| FD004 | 0.0001 | Highly significant (p<0.001) |
The improvements on FD002, FD003, and FD004 are statistically significant, meaning they are unlikely to be due to random chance.
Why AMNL Excels on Multi-Condition Data
The dramatic improvements on FD002 and FD004 are directly attributable to the equal weighting in AMNL:
- The health classification task forces the model to learn condition-invariant representations
- Health states (Normal, Early Degradation, Critical) are defined by RUL thresholds, not operating conditions
- Equal task weighting (0.5/0.5) ensures the model cannot "cheat" by ignoring the classification objective
- This regularization effect is strongest when conditions are heterogeneous—exactly the FD002/FD004 setting
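A minimal sketch of these two ingredients: threshold-based health labels and the equally weighted joint objective. The numeric thresholds (100 and 30 cycles) and function names are illustrative assumptions; the text specifies only that states are defined by RUL thresholds and that the task weights are 0.5/0.5.

```python
def health_state(rul, early=100, critical=30):
    """Map an RUL value to a discrete health class.

    Thresholds are hypothetical: rul >= early -> Normal (0),
    critical < rul < early -> Early Degradation (1),
    rul <= critical -> Critical (2).
    """
    if rul >= early:
        return 0
    if rul > critical:
        return 1
    return 2

def joint_loss(reg_loss, cls_loss, w_reg=0.5, w_cls=0.5):
    """Equal-weight multi-task objective: 0.5 * L_reg + 0.5 * L_cls."""
    return w_reg * reg_loss + w_cls * cls_loss

states = [health_state(r) for r in (120, 60, 10)]
```

Because the class boundaries depend only on RUL, not on the operating condition, the classification head pushes the shared representation toward condition-invariant degradation features.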
Summary
In this section, we have covered the NASA C-MAPSS benchmark dataset:
- C-MAPSS simulates turbofan engine degradation with realistic sensor noise and operating condition variability
- Four sub-datasets provide controlled complexity: FD001 (easy), FD002 (hard), FD003 (medium), FD004 (very hard)
- Multi-condition datasets (FD002, FD004) are particularly challenging due to 6 different operating regimes
- 17 features are selected from the original 24 (3 settings + 14 informative sensors)
- Evaluation protocol uses last-cycle predictions compared against ground-truth RUL files
- AMNL achieves SOTA with 65.9% improvement on FD002 and 60.5% on FD004
| Dataset Statistics | FD001 | FD002 | FD003 | FD004 |
|---|---|---|---|---|
| Operating Conditions | 1 | 6 | 1 | 6 |
| Fault Modes | 1 | 1 | 2 | 2 |
| Train Engines | 100 | 260 | 100 | 249 |
| Test Engines | 100 | 259 | 100 | 248 |
| AMNL RMSE | 10.43 | 6.74 | 9.51 | 8.16 |
| Improvement vs SOTA | +9.2% | +65.9% | +18.8% | +60.5% |
Looking Ahead: In the next section, we will survey existing state-of-the-art methods for RUL prediction and understand their limitations—setting the stage for why AMNL's novel approach was necessary.
With a solid understanding of the benchmark dataset, we are ready to explore what existing methods have achieved and where they fall short.