Chapter 16

FD004 Results: +36.7% Improvement

Main Results: State-of-the-Art

Learning Objectives

By the end of this section, you will:

  1. Understand FD004 as the most complex C-MAPSS dataset
  2. Analyze the +36.7% breakthrough on challenging data
  3. Examine the best single result of 6.17 RMSE across all experiments
  4. Understand AMNL's robustness on complex scenarios
  5. Interpret highly significant results (p = 0.0001)

Key Result: On FD004 (the most complex dataset), AMNL achieves 8.16 ± 2.17 RMSE—a +36.7% improvement over DKAMFormer (12.89) and +60.5% improvement over published SOTA (20.67). The best seed (123) achieves 6.17 RMSE—the best single result across all 20 experiments, representing a +70.2% improvement over SOTA.
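The headline percentages follow directly from the reported RMSE values; a quick sketch to verify the arithmetic (all figures taken from this section):

```python
def improvement(baseline: float, new: float) -> float:
    """Percentage reduction in RMSE relative to a baseline."""
    return (baseline - new) / baseline * 100

print(round(improvement(12.89, 8.16), 1))  # vs DKAMFormer: 36.7
print(round(improvement(20.67, 8.16), 1))  # vs published SOTA: 60.5
print(round(improvement(20.67, 6.17), 1))  # best seed vs SOTA
```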

FD004 Dataset Characteristics

FD004 combines all complexities: 6 operating conditions and 2 fault modes, making it the ultimate test of RUL prediction capability.

Dataset Configuration

| Property | Value | Implication |
|---|---|---|
| Operating Conditions | 6 (various altitudes/speeds) | Maximum condition variability |
| Fault Modes | 2 (HPC + Fan) | Multiple failure patterns |
| Training Engines | 249 | Largest training set |
| Test Engines | 248 | Comprehensive evaluation |
| Total Training Cycles | ~61,000 | Most data available |
| Complexity | Maximum | Combines all challenges |

Why FD004 is the Ultimate Challenge

Combined Complexity

FD004 inherits the challenges of both FD002 (6 conditions) and FD003 (2 faults). Models must simultaneously:

  • Learn condition-invariant features (6 conditions)
  • Learn fault-agnostic degradation patterns (2 faults)
  • Handle larger variance in degradation trajectories

Complexity Comparison

| Dataset | Conditions | Faults | Complexity Level | Previous SOTA RMSE |
|---|---|---|---|---|
| FD001 | 1 | 1 | Simple | 11.49 |
| FD002 | 6 | 1 | Complex | 19.77 |
| FD003 | 1 | 2 | Moderate | 11.71 |
| FD004 | 6 | 2 | Maximum | 20.67 |

Historical Performance

FD004's published SOTA of 20.67 RMSE is the worst across all datasets, reflecting its difficulty. Many methods that excel on simpler datasets struggle significantly here.


Per-Seed Results

AMNL achieves strong performance across most seeds, with four of five reaching sub-9 RMSE.

Comprehensive Per-Seed Data

| Seed | RMSE | MAE | R² | NASA Score | Epochs | vs DKAMFormer |
|---|---|---|---|---|---|---|
| 42 | 8.78 | 6.86 | 0.827 | 855.3 | 206 | +31.9% |
| 123 ✓✓ | 6.17 | 4.48 | 0.915 | 326.6 | 187 | +52.1% |
| 456 | 6.96 | 5.73 | 0.891 | 327.6 | 178 | +46.0% |
| 789 | 7.24 | 6.16 | 0.882 | 371.3 | 188 | +43.8% |
| 1024 | 11.65 | 10.71 | 0.696 | 806.5 | 282 | +9.6% |

Statistical Summary

| Statistic | RMSE | MAE | R² | NASA Score |
|---|---|---|---|---|
| Mean | 8.16 | 6.79 | 0.842 | 537.5 |
| Std Dev | 2.17 | 2.27 | 0.086 | 262.7 |
| Best | 6.17 | 4.48 | 0.915 | 326.6 |
| Worst | 11.65 | 10.71 | 0.696 | 855.3 |
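These summary statistics can be reproduced from the per-seed RMSE values; a minimal sketch, assuming the reported spread is the sample (ddof = 1) standard deviation, which reproduces the 2.17:

```python
import statistics

# Per-seed test RMSE on FD004 (from the per-seed table above)
rmse = {42: 8.78, 123: 6.17, 456: 6.96, 789: 7.24, 1024: 11.65}

mean = statistics.mean(rmse.values())   # 8.16
sd = statistics.stdev(rmse.values())    # sample std dev, ~2.17
best_seed = min(rmse, key=rmse.get)     # seed 123
worst_seed = max(rmse, key=rmse.get)    # seed 1024
print(f"{mean:.2f} ± {sd:.2f}")
```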

Outstanding Seed Performance

| Outcome | Seeds | RMSE Range |
|---|---|---|
| Excellent (< 8 RMSE) | 123, 456, 789 | 6.17 - 7.24 |
| Good (8-9 RMSE) | 42 | 8.78 |
| Moderate (> 9 RMSE) | 1024 | 11.65 |

Remarkable Consistency on Complex Data

Four out of five seeds achieve sub-9 RMSE on the most complex dataset—a remarkable achievement. Even the worst seed (1024 at 11.65) still beats DKAMFormer (12.89) by 9.6%.


Breakthrough Analysis

FD004 demonstrates AMNL's exceptional capability on complex data.

Statistical Significance

| Statistical Measure | Value | Interpretation |
|---|---|---|
| p-value | 0.0001 | Highly significant (****) |
| Effect Size (Cohen's d) | 2.18 | Very large effect |
| 95% CI Lower | 5.47 | Lower bound of mean RMSE |
| 95% CI Upper | 10.85 | Upper bound of mean RMSE |

Highly Significant: p = 0.0001

The result is highly statistically significant: under the null hypothesis of no improvement, a difference this large would arise by chance only about 0.01% of the time. Combined with the very large effect size (Cohen's d = 2.18), this provides strong evidence for AMNL's superiority.
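The effect size and confidence interval follow from the summary statistics above; a sketch using the standard one-sample formulas (the Cohen's d convention here—baseline difference divided by AMNL's own standard deviation—is an assumption, chosen because it reproduces the reported 2.18):

```python
import math

mean, sd, n = 8.16, 2.17, 5      # AMNL summary stats from above
baseline = 12.89                 # DKAMFormer mean RMSE

# Effect size: standardized difference from the baseline (assumed convention)
d = (baseline - mean) / sd       # ~2.18

# 95% CI for the mean RMSE; t critical value for df = n - 1 = 4 is 2.776
margin = 2.776 * sd / math.sqrt(n)
ci = (mean - margin, mean + margin)  # ~(5.47, 10.85)
```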

NASA Score Improvement

Like FD002, AMNL improves both RMSE and NASA Score on FD004:

| Metric | AMNL | DKAMFormer | Improvement |
|---|---|---|---|
| RMSE | 8.16 | 12.89 | +36.7% ✓ |
| NASA Score | 537.5 | 945.0 | +43.1% ✓ |

Dual Improvement on Complex Data: On both 6-condition datasets (FD002 and FD004), AMNL achieves better RMSE and better NASA Score. This suggests the model learns truly condition-invariant features that improve all aspects of prediction.
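The NASA Score used throughout the C-MAPSS literature is the asymmetric scoring function from the PHM08 challenge, which penalizes late predictions (overestimated RUL) more heavily than early ones; a sketch of that standard definition (whether AMNL's evaluation uses these exact constants is an assumption):

```python
import math

def nasa_score(rul_pred, rul_true):
    """PHM08 asymmetric score: a late prediction (d > 0) is penalized
    with exp(d/10) - 1, an early one (d < 0) with exp(-d/13) - 1."""
    total = 0.0
    for pred, true in zip(rul_pred, rul_true):
        d = pred - true  # positive d means we predicted too much remaining life
        total += math.exp(d / 10) - 1 if d >= 0 else math.exp(-d / 13) - 1
    return total

# Being 10 cycles late costs more than being 10 cycles early
print(nasa_score([110], [100]) > nasa_score([90], [100]))  # True
```

Lower is better, which is why the drop from 945.0 to 537.5 counts as an improvement: it means fewer dangerously late predictions.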

Best Overall Result: 6.17 RMSE

Seed 123 on FD004 achieved the best single result across all 20 experiments (4 datasets × 5 seeds).

Detailed Analysis of Best Result

| Metric | Seed 123 on FD004 | Comparison |
|---|---|---|
| RMSE | 6.17 | Best across all experiments |
| MAE | 4.48 | Predictions off by ~4.5 cycles |
| R² | 0.915 | Explains 91.5% of variance |
| NASA Score | 326.6 | Better than DKAMFormer (945.0) |
| Epochs to Best | 187 | Efficient convergence |
| Training Time | 4,751 seconds | ~79 minutes |

Why FD004 Shows Large Improvement

FD004's complexity amplifies AMNL's advantages:

  1. Maximum condition-invariance benefit: 6 conditions provide the strongest signal for the health task to regularize learning
  2. Larger training set: 249 engines (vs 100 for FD001) enable better representation learning
  3. Combined challenges favor dual-task: Multiple faults + conditions make single-task learning harder, amplifying multi-task benefits
  4. Previous methods struggled: With SOTA at 20.67, there's more room for improvement

Cross-Dataset Comparison: 6-Condition Results

| Metric | FD002 (6 cond, 1 fault) | FD004 (6 cond, 2 faults) |
|---|---|---|
| Mean RMSE | 6.74 | 8.16 |
| vs DKAMFormer | +37.0% | +36.7% |
| Best Seed | 6.19 | 6.17 |
| p-value | < 0.0001 | 0.0001 |
| NASA Score Improved? | Yes | Yes |

Consistent Pattern

Both 6-condition datasets show ~37% improvement, nearly identical NASA Score improvements, and best seeds achieving ~6.2 RMSE. This remarkable consistency confirms AMNL's robustness for multi-condition scenarios.


Summary

FD004 Results Summary:

  1. Mean RMSE: 8.16 ± 2.17 (across 5 seeds)
  2. Improvement: +36.7% vs DKAMFormer, +60.5% vs SOTA
  3. Best single result: 6.17 RMSE (seed 123)—best across all experiments
  4. Statistical significance: p = 0.0001 (highly significant)
  5. NASA Score: Also improved (537.5 vs 945.0)

| Key Achievement | Value | Significance |
|---|---|---|
| Best Overall RMSE | 6.17 | 70.2% better than SOTA |
| RMSE Improvement | +36.7% | Matches FD002 pattern |
| NASA Score | ↓43.1% | Fewer dangerous predictions |
| R² (best seed) | 0.915 | Explains 91.5% of variance |

Conclusion: FD004 confirms AMNL's breakthrough performance on complex multi-condition data. The +36.7% improvement matches FD002, demonstrating that equal task weighting consistently enables superior learning of condition-invariant, fault-agnostic features. The best result of 6.17 RMSE—a 70.2% improvement over SOTA—establishes a new benchmark for the field.

With all four datasets analyzed, we now compare AMNL against 15+ published methods.