AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand FD004 as the most complex C-MAPSS dataset
Analyze the +36.7% breakthrough on challenging data
Examine the best single result of 6.17 RMSE across all experiments
Understand AMNL's robustness on complex scenarios
Interpret highly significant results (p = 0.0001)

Key Result: On FD004 (the most complex dataset), AMNL achieves 8.16 ± 2.17 RMSE—a +36.7% improvement over DKAMFormer (12.89) and +60.5% improvement over published SOTA (20.67). The best seed (123) achieves 6.17 RMSE—the best single result across all 20 experiments, representing a +70.2% improvement over SOTA.

FD004 Dataset Characteristics

FD004 combines all complexities: 6 operating conditions and 2 fault modes, making it the ultimate test of RUL prediction capability.

Dataset Configuration

Property	Value	Implication
Operating Conditions	6 (combinations of altitude, Mach number, and throttle resolver angle)	Maximum condition variability
Fault Modes	2 (HPC + Fan)	Multiple failure patterns
Training Engines	249	Second only to FD002 (260 engines)
Test Engines	248	Comprehensive evaluation
Total Training Cycles	~61,000	Most data available
Complexity	Maximum	Combines all challenges

Why FD004 is the Ultimate Challenge

Combined Complexity

FD004 inherits the challenges of both FD002 (6 conditions) and FD003 (2 faults). Models must simultaneously:

Learn condition-invariant features (6 conditions)
Learn fault-agnostic degradation patterns (2 faults)
Handle larger variance in degradation trajectories

Complexity Comparison

Dataset	Conditions	Faults	Complexity Level	Previous SOTA RMSE
FD001	1	1	Simple	11.49
FD002	6	1	Complex	19.77
FD003	1	2	Moderate	11.71
FD004	6	2	Maximum	20.67

Historical Performance

FD004's published SOTA of 20.67 RMSE is the worst across all datasets, reflecting its difficulty. Many methods that excel on simpler datasets struggle significantly here.

Per-Seed Results

AMNL achieves strong performance across most seeds, with 4 out of 5 achieving sub-9 RMSE.

Comprehensive Per-Seed Data

Seed	RMSE	MAE	R²	NASA Score	Epochs	vs DKAMFormer
42	8.78	6.86	0.827	855.3	206	+31.9%
123 ✓✓	6.17	4.48	0.915	326.6	187	+52.1%
456	6.96	5.73	0.891	327.6	178	+46.0%
789	7.24	6.16	0.882	371.3	188	+43.8%
1024	11.65	10.71	0.696	806.5	282	+9.6%
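The "vs DKAMFormer" column is a relative RMSE reduction against DKAMFormer's 12.89. A minimal sketch, assuming the formula (baseline - model) / baseline, which reproduces every row of the table:

```python
# Relative RMSE improvement of each AMNL seed over DKAMFormer on FD004,
# computed as (baseline - model) / baseline, so positive values mean "better".
DKAMFORMER_RMSE = 12.89

seed_rmse = {42: 8.78, 123: 6.17, 456: 6.96, 789: 7.24, 1024: 11.65}

def relative_improvement(model_rmse: float, baseline_rmse: float) -> float:
    """Percent reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

improvements = {seed: round(relative_improvement(r, DKAMFORMER_RMSE), 1)
                for seed, r in seed_rmse.items()}
print(improvements)  # {42: 31.9, 123: 52.1, 456: 46.0, 789: 43.8, 1024: 9.6}
```

Applied to the mean RMSE, the same function gives the headline figures: relative_improvement(8.16, 12.89) rounds to 36.7 and relative_improvement(8.16, 20.67) rounds to 60.5.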

Statistical Summary

Statistic	RMSE	MAE	R²	NASA Score
Mean	8.16	6.79	0.842	537.5
Std Dev	2.17	2.27	0.086	262.7
Best	6.17	4.48	0.915	326.6
Worst	11.65	10.71	0.696	855.3
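The ± 2.17 in the summary matches the sample standard deviation (n - 1 denominator) of the five seed RMSEs. A short check with Python's standard statistics module:

```python
import statistics

rmse = [8.78, 6.17, 6.96, 7.24, 11.65]  # seeds 42, 123, 456, 789, 1024

mean_rmse = statistics.mean(rmse)
std_rmse = statistics.stdev(rmse)  # sample standard deviation (n - 1 denominator)

print(f"{mean_rmse:.2f} ± {std_rmse:.2f}")  # 8.16 ± 2.17
print(min(rmse), max(rmse))                 # 6.17 11.65
print(sum(r < 9 for r in rmse))             # 4 seeds below 9 RMSE
```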

Outstanding Seed Performance

Outcome	Seeds	RMSE Range
Excellent (< 8 RMSE)	123, 456, 789	6.17 - 7.24
Good (8-9 RMSE)	42	8.78
Moderate (> 9 RMSE)	1024	11.65

Remarkable Consistency on Complex Data

Four out of five seeds achieve sub-9 RMSE on the most complex dataset—a remarkable achievement. Even the worst seed (1024 at 11.65) still beats DKAMFormer (12.89) by 9.6%.

Breakthrough Analysis

FD004 demonstrates AMNL's exceptional capability on complex data.

Statistical Significance

Statistical Measure	Value	Interpretation
p-value	0.0001	Highly significant (****)
Effect Size (Cohen's d)	2.18	Very large effect
95% CI Lower	5.47	Lower bound of mean RMSE
95% CI Upper	10.85	Upper bound of mean RMSE

Highly Significant: p = 0.0001

The result is highly statistically significant: if AMNL and DKAMFormer performed equally on FD004, a difference at least this large would be expected only about 0.01% of the time. Combined with the large effect size (d = 2.18), this provides strong evidence that AMNL outperforms DKAMFormer on this dataset.
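The confidence interval and effect size rows can be reproduced from the five seed RMSEs. A minimal sketch, assuming a t-based 95% interval with n - 1 = 4 degrees of freedom and Cohen's d computed as the gap to DKAMFormer's 12.89 divided by AMNL's sample standard deviation. The section does not say which test produced p = 0.0001, so the sketch does not attempt to reproduce the p-value.

```python
import math
import statistics

rmse = [8.78, 6.17, 6.96, 7.24, 11.65]  # AMNL seed RMSEs on FD004
DKAMFORMER_RMSE = 12.89

n = len(rmse)
mean = statistics.mean(rmse)
sd = statistics.stdev(rmse)  # sample SD, n - 1 denominator

# Two-sided 95% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.7764451051977987

half_width = T_CRIT_DF4 * sd / math.sqrt(n)
ci_low, ci_high = mean - half_width, mean + half_width

# Effect size: distance from the baseline in units of AMNL's seed-to-seed SD.
cohens_d = (DKAMFORMER_RMSE - mean) / sd

print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")  # 95% CI: [5.47, 10.85]
print(f"Cohen's d: {cohens_d:.2f}")              # Cohen's d: 2.18
```

With only five seeds the t critical value (about 2.78) is much larger than the normal 1.96, which is why the interval is wide despite the large effect.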

NASA Score Improvement

Like FD002, AMNL improves both RMSE and NASA Score on FD004:

Metric	AMNL	DKAMFormer	Improvement
RMSE	8.16	12.89	+36.7% ✓
NASA Score	537.5	945.0	+43.1% ✓

Dual Improvement on Complex Data: On both 6-condition datasets (FD002 and FD004), AMNL achieves better RMSE and better NASA Score. This is consistent with the model learning condition-invariant features that reduce both the average error and the costly late predictions that dominate the NASA Score.
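This section does not define the NASA Score. The sketch below assumes the asymmetric scoring function commonly used for C-MAPSS (introduced for the PHM08 challenge), where d is predicted RUL minus true RUL. Late predictions (d > 0), which overestimate remaining life, are penalized more steeply than early ones, and lower totals are better:

```python
import math

def nasa_score(y_true, y_pred):
    """Asymmetric C-MAPSS scoring function summed over engines (lower is better).

    d = predicted RUL - true RUL. Late predictions (d >= 0) cost exp(d / 10) - 1;
    early predictions (d < 0) cost the gentler exp(-d / 13) - 1.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0
        else:
            total += math.exp(d / 10.0) - 1.0
    return total

# Same absolute error of 10 cycles, opposite directions:
early = nasa_score([50], [40])  # predicted 10 cycles early
late = nasa_score([50], [60])   # predicted 10 cycles late
print(round(early, 3), round(late, 3))  # the late error costs more
```

Because the penalty grows exponentially, a few large late errors can dominate the total, which is why a seed can have a reasonable RMSE yet a high NASA Score (compare seed 42 with seed 1024 in the per-seed table).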

Best Overall Result: 6.17 RMSE

Seed 123 on FD004 achieved the best single result across all 20 experiments (4 datasets × 5 seeds).

Detailed Analysis of Best Result

Metric	Seed 123 on FD004	Comparison
RMSE	6.17	Best across all experiments
MAE	4.48	Predictions off by ~4.5 cycles on average
R²	0.915	Explains 91.5% of variance
NASA Score	326.6	Lower (better) than DKAMFormer's mean (945.0)
Epochs to Best	187	Efficient convergence
Training Time	4,751 seconds	~79 minutes
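The MAE and R² rows are easiest to read through their definitions. A minimal sketch using the standard formulas on a made-up five-engine example (not FD004 data): MAE is the mean absolute error in cycles, RMSE squares errors before averaging and so weights large misses more heavily (it is always at least as large as MAE), and R² = 1 - SS_res / SS_tot is the fraction of variance in the true RUL explained by the predictions.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R²) for RUL predictions; RMSE and MAE are in cycles."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2

# Toy example with hypothetical values, not FD004 data
y_true = [100, 80, 60, 40, 20]
y_pred = [96, 85, 58, 43, 18]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
print(round(rmse, 3), round(mae, 3), round(r2, 4))
```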

Why FD004 Shows Large Improvement

FD004's complexity amplifies AMNL's advantages:

Maximum condition-invariance benefit: 6 conditions provide the strongest signal for the health task to regularize learning
Larger training set: 249 engines (vs 100 for FD001) enable better representation learning
Combined challenges favor dual-task: Multiple faults + conditions make single-task learning harder, amplifying multi-task benefits
Previous methods struggled: With SOTA at 20.67, there's more room for improvement
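The dual-task argument can be made concrete with a toy objective. This is a hypothetical sketch, not AMNL's published loss: it assumes an MSE loss on RUL targets scaled to [0, 1] and a binary cross-entropy loss on a healthy/degrading label, combined with equal weights (w = 0.5 each). The section states only that the two tasks are weighted equally; the function names and the health-label format are illustrative assumptions.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error for the RUL head (targets assumed scaled to [0, 1])."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean BCE for a hypothetical healthy (0) / degrading (1) health head."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def equal_weight_loss(rul_true, rul_pred, health_true, health_prob, w=0.5):
    """Hypothetical dual-task objective: both task losses get the same weight w."""
    return w * mse(rul_true, rul_pred) + w * binary_cross_entropy(health_true, health_prob)

# Toy batch of three windows (made-up values, not FD004 data)
rul_true, rul_pred = [0.8, 0.5, 0.1], [0.75, 0.55, 0.2]
health_true, health_prob = [0, 0, 1], [0.1, 0.3, 0.8]
loss = equal_weight_loss(rul_true, rul_pred, health_true, health_prob)
print(round(loss, 4))  # 0.1167
```

Scaling RUL before combining matters: with raw RUL in cycles, the squared-error term would be orders of magnitude larger than the cross-entropy term, and equal weights would no longer mean equal influence on the gradients.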

Cross-Dataset Comparison: 6-Condition Results

Metric	FD002 (6 cond, 1 fault)	FD004 (6 cond, 2 faults)
Mean RMSE	6.74	8.16
vs DKAMFormer	+37.0%	+36.7%
Best Seed	6.19	6.17
p-value	< 0.0001	0.0001
NASA Score Improved?	Yes	Yes

Consistent Pattern

Both 6-condition datasets show ~37% improvement, nearly identical NASA Score improvements, and best seeds achieving ~6.2 RMSE. This consistency across the two multi-condition datasets supports AMNL's robustness in multi-condition scenarios.

Summary

FD004 Results Summary:

Mean RMSE: 8.16 ± 2.17 (across 5 seeds)
Improvement: +36.7% vs DKAMFormer, +60.5% vs SOTA
Best single result: 6.17 RMSE (seed 123)—best across all experiments
Statistical significance: p = 0.0001 (highly significant)
NASA Score: Also improved (537.5 vs 945.0)

Key Achievement	Value	Significance
Best Overall RMSE	6.17	70.2% better than SOTA
RMSE Improvement	+36.7%	Matches FD002 pattern
NASA Score	↓43.1%	Smaller penalty for late (overestimated) RUL predictions
R² (best seed)	0.915	Explains 91.5% of variance

Conclusion: FD004 confirms AMNL's breakthrough performance on complex multi-condition data. The +36.7% improvement matches FD002, demonstrating that equal task weighting consistently enables superior learning of condition-invariant, fault-agnostic features. The best result of 6.17 RMSE—a 70.2% improvement over SOTA—establishes a new benchmark for the field.

With all four datasets analyzed, we now compare AMNL against 15+ published methods.