AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand FD002's complexity with 6 operating conditions
Analyze the breakthrough +37.0% improvement over DKAMFormer
Examine exceptional consistency across all 5 seeds
Understand why multi-condition learning benefits from AMNL
Interpret statistical significance (p < 0.0001)

Key Result: On FD002, AMNL achieves 6.74 ± 0.91 RMSE—a +37.0% improvement over DKAMFormer (10.70) and +65.9% improvement over published SOTA (19.77). This is the most consistent result across seeds, with all 5 seeds beating both baselines. Statistical significance: p < 0.0001.

FD002 Dataset Characteristics

FD002 is a complex dataset with 6 distinct operating conditions, presenting significant challenges for RUL prediction.

Dataset Configuration

Property	Value	Implication
Operating Conditions	6 (Various altitudes/speeds)	High condition variability
Fault Modes	1 (HPC Degradation)	Single failure pattern
Training Engines	260	Large training set
Test Engines	259	Comprehensive evaluation
Total Training Cycles	~53,000	Substantial data volume
Condition Distribution	Variable per engine	Condition shifts during operation

Operating Condition Details

The 6 operating conditions span different combinations of altitude, Mach number, and throttle resolver angle (TRA):

Condition	Altitude (ft)	Mach	TRA
1	0 (Sea Level)	0.00	100
2	10,000	0.25	100
3	20,000	0.70	100
4	25,000	0.62	60
5	35,000	0.84	100
6	42,000	0.84	100

Why FD002 is Challenging

Unlike FD001, engines in FD002 operate across multiple conditions. This creates condition-dependent sensor distributions: the same degradation level produces different sensor readings at different altitudes. Models must learn condition-invariant degradation features to predict RUL accurately.

Per-Seed Results

AMNL demonstrates exceptional consistency on FD002, with all 5 seeds significantly outperforming baselines.

Comprehensive Per-Seed Data

Seed	RMSE	MAE	R²	NASA Score	Epochs	vs DKAMFormer
42	6.29	4.04	0.910	314.5	244	+41.2%
123 ✓	6.19	3.95	0.912	333.5	219	+42.1%
456	6.52	4.45	0.903	352.1	236	+39.1%
789	6.33	3.97	0.908	360.5	183	+40.8%
1024	8.35	7.21	0.841	419.4	237	+22.0%

Statistical Summary

Statistic	RMSE	MAE	R²	NASA Score
Mean	6.74	4.73	0.895	356.0
Std Dev	0.91	1.35	0.030	40.8
Best	6.19	3.95	0.912	314.5
Worst	8.35	7.21	0.841	419.4

Exceptional Consistency

The standard deviation of 0.91 RMSE represents only 13.5% coefficient of variation—much lower than FD001's 18.6%. Even the worst seed (1024 with 8.35 RMSE) significantly beats DKAMFormer (10.70).
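The summary statistics and the coefficient of variation can be re-derived from the per-seed table. The sketch below assumes the reported standard deviation is the sample (n - 1) form, which is the form that reproduces 0.91; the variable names are illustrative.

```python
import statistics

# Per-seed RMSE values copied from the per-seed table above.
rmse_by_seed = {42: 6.29, 123: 6.19, 456: 6.52, 789: 6.33, 1024: 8.35}

values = list(rmse_by_seed.values())
mean_rmse = statistics.mean(values)

# statistics.stdev uses the sample (n - 1) formula; this is an assumption
# about how the reported 0.91 was computed.
std_rmse = statistics.stdev(values)

# Coefficient of variation: spread relative to the mean, in percent.
cv_percent = 100 * std_rmse / mean_rmse

print(f"mean={mean_rmse:.2f}  std={std_rmse:.2f}  CV={cv_percent:.1f}%")
```

Note that most of the spread comes from seed 1024; the other four seeds lie within 0.33 RMSE of each other.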

All Seeds Beat Both Baselines

Seed	RMSE	vs DKAMFormer (10.70)	vs SOTA (19.77)
42	6.29	✓ +41.2%	✓ +68.2%
123	6.19	✓ +42.1%	✓ +68.7%
456	6.52	✓ +39.1%	✓ +67.0%
789	6.33	✓ +40.8%	✓ +68.0%
1024	8.35	✓ +22.0%	✓ +57.8%
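The percentages in this table are relative RMSE reductions against each baseline. A minimal sketch of that arithmetic, using the baseline values quoted in the table header (the function name is illustrative):

```python
def improvement_pct(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction versus a baseline, in percent (positive = better)."""
    return 100 * (baseline_rmse - model_rmse) / baseline_rmse


DKAMFORMER_RMSE = 10.70  # baseline value quoted in the table header
SOTA_RMSE = 19.77        # published SOTA value quoted in the table header

rmse_by_seed = {42: 6.29, 123: 6.19, 456: 6.52, 789: 6.33, 1024: 8.35}

for seed, rmse in rmse_by_seed.items():
    vs_dkam = improvement_pct(DKAMFORMER_RMSE, rmse)
    vs_sota = improvement_pct(SOTA_RMSE, rmse)
    print(f"seed {seed:>4}: {vs_dkam:+.1f}% vs DKAMFormer, {vs_sota:+.1f}% vs SOTA")

# The headline figures use the mean RMSE of 6.74 instead of a single seed.
mean_vs_dkam = improvement_pct(DKAMFORMER_RMSE, 6.74)
mean_vs_sota = improvement_pct(SOTA_RMSE, 6.74)
```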

Breakthrough Analysis

Understanding the magnitude of AMNL's improvement on FD002.

Statistical Significance

Highly Significant: p < 0.0001

The improvement is highly statistically significant. With p < 0.0001, a difference at least this large would be observed less than 0.01% of the time if AMNL were in fact no better than the baseline. This is the strongest statistical evidence in our evaluation.

Statistical Measure	Value	Interpretation
p-value	< 0.0001	Highly significant (****)
Effect Size (Cohen's d)	4.37	Very large effect
95% CI Lower	5.61	Lower bound of mean RMSE
95% CI Upper	7.86	Upper bound of mean RMSE
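The confidence bounds above are consistent with a standard Student's t interval over the five seed results. The text does not state how the interval was computed, so the sketch below is an assumption; it uses the tabulated critical value 2.776 (two-sided 95%, 4 degrees of freedom) and agrees with the table to within rounding.

```python
import math
import statistics

# Per-seed RMSE values from the per-seed results table.
values = [6.29, 6.19, 6.52, 6.33, 8.35]
n = len(values)

mean_rmse = statistics.mean(values)
std_err = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776

ci_lower = mean_rmse - T_CRIT_DF4 * std_err
ci_upper = mean_rmse + T_CRIT_DF4 * std_err

print(f"95% CI for mean RMSE: [{ci_lower:.2f}, {ci_upper:.2f}]")
```

The whole interval lies below the DKAMFormer RMSE of 10.70.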

NASA Score Improvement

Unlike FD001, AMNL improves both RMSE and NASA Score on FD002:

Metric	AMNL	DKAMFormer	Improvement
RMSE	6.74	10.70	+37.0% ✓
NASA Score	356.0	498.0	+28.5% ✓

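Dual Improvement: On FD002, AMNL achieves a lower RMSE and a lower NASA Score simultaneously (lower is better for both). Because the NASA Score penalizes late predictions more heavily than early ones, this indicates the model not only predicts more accurately but also makes fewer dangerous late predictions.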
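To see why a lower NASA Score points to fewer late predictions, it helps to look at the scoring function itself. The sketch below uses the asymmetric exponential score commonly applied to C-MAPSS (time constants of 13 for early and 10 for late errors); it is assumed, not confirmed by this section, that the "NASA Score" column was computed this way.

```python
import math


def nasa_score(true_rul, pred_rul):
    """Asymmetric exponential score; lower is better, 0 is a perfect prediction."""
    total = 0.0
    for y, y_hat in zip(true_rul, pred_rul):
        d = y_hat - y  # d < 0: early prediction, d > 0: late prediction
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0  # early: gentler penalty
        else:
            total += math.exp(d / 10.0) - 1.0   # late: steeper penalty
    return total


# Same absolute error of 10 cycles, opposite directions.
early = nasa_score([50.0], [40.0])
late = nasa_score([50.0], [60.0])
print(f"early by 10 cycles: {early:.3f}, late by 10 cycles: {late:.3f}")
```

A late prediction costs more than an early one of the same size, because overestimating remaining life risks running an engine past failure.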

Why AMNL Excels on FD002

Understanding the mechanisms behind AMNL's breakthrough performance on multi-condition data.

The Multi-Condition Challenge

Traditional RUL models struggle with FD002 because sensor readings depend on both degradation level and operating condition:

$$\text{Sensor}_i(t) = f(\text{Degradation}(t)) + g(\text{Condition}(t)) + \epsilon$$

The challenge: extracting $f(\text{Degradation})$ when it's confounded with $g(\text{Condition})$.
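A toy version of this additive model makes the confounding concrete. Every number below is hypothetical: the linear degradation response and the per-condition offsets are chosen only to illustrate that condition shifts can dwarf the degradation signal, and they are not taken from C-MAPSS.

```python
def sensor_reading(degradation: float, condition_offset: float, noise: float = 0.0) -> float:
    """Toy additive model: f(Degradation) + g(Condition) + noise."""
    f_degradation = 5.0 * degradation  # hypothetical degradation response
    return f_degradation + condition_offset + noise


# Hypothetical offsets g(Condition) for the six operating conditions.
condition_offsets = {1: 0.0, 2: 12.0, 3: 25.0, 4: 31.0, 5: 44.0, 6: 52.0}

# Same wear level (0.5) observed under every condition.
same_wear = [sensor_reading(0.5, g) for g in condition_offsets.values()]
spread_across_conditions = max(same_wear) - min(same_wear)

# Full wear range (0.0 to 1.0) observed under a single condition.
one_condition = [sensor_reading(d, condition_offsets[1]) for d in (0.0, 0.5, 1.0)]
spread_across_wear = max(one_condition) - min(one_condition)

print(f"spread from conditions: {spread_across_conditions}")
print(f"spread from degradation: {spread_across_wear}")
```

In this toy setup the reading varies far more with operating condition than with wear, so a model that reads raw sensor values without accounting for condition would mostly be tracking altitude and speed.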

How AMNL Solves This

Dual-task learning: The health classification task provides an auxiliary signal that guides feature learning
Equal weighting (0.5/0.5): Balanced gradients prevent either task from dominating the representation
Shared encoder: The CNN-BiLSTM-Attention backbone must learn features useful for both tasks
Condition-invariant representations: Health states (Healthy/Degrading/Critical) are defined by RUL, not by operating condition, so learning to classify health requires learning condition-invariant features
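The four points above can be sketched as a single per-sample objective. This is an illustrative reading, not the authors' implementation: the RUL thresholds for the health states, the use of squared error for the RUL task, and all function names are assumptions. Only the 0.5/0.5 weighting and the three health states come from the text.

```python
import math


def health_label(rul: float, degrading_below: float = 80.0, critical_below: float = 30.0) -> int:
    """Map RUL to 0=Healthy, 1=Degrading, 2=Critical (thresholds are hypothetical)."""
    if rul < critical_below:
        return 2
    if rul < degrading_below:
        return 1
    return 0


def cross_entropy(logits, label: int) -> float:
    """Softmax cross-entropy for one sample, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def equal_weight_loss(pred_rul, true_rul, health_logits, w_rul=0.5, w_health=0.5):
    """Weighted sum of an RUL regression loss and a health classification loss."""
    rul_loss = (pred_rul - true_rul) ** 2  # squared error is an assumption
    health_loss = cross_entropy(health_logits, health_label(true_rul))
    return w_rul * rul_loss + w_health * health_loss


# Perfect RUL prediction with uninformative health logits:
# only the classification term contributes.
loss = equal_weight_loss(50.0, 50.0, [0.0, 0.0, 0.0])
print(f"loss = {loss:.4f}")
```

The label function takes only the RUL as input, so the classification target is the same under all six operating conditions. That is the property the last point relies on.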

The Regularization Effect

FD002 vs FD001 Comparison

Aspect	FD001 (Simple)	FD002 (Complex)
Operating Conditions	1	6
AMNL Improvement	+2.3%	+37.0%
Multi-task Benefit	Limited	Substantial
Condition-invariance Needed	No	Yes
Health Task Value	Moderate regularization	Essential for generalization

The Negative Transfer Gap

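Conventional wisdom suggests multi-task learning often suffers from "negative transfer", where auxiliary tasks hurt primary task performance. AMNL shows the opposite on these two datasets: the improvement grows from +2.3% on single-condition FD001 to +37.0% on six-condition FD002, so multi-task learning helps more as complexity increases.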

Summary

FD002 Results Summary:

Mean RMSE: 6.74 ± 0.91 (highly consistent)
Improvement: +37.0% vs DKAMFormer, +65.9% vs SOTA
Best single result: 6.19 RMSE (seed 123)
Statistical significance: p < 0.0001 (highly significant)
All seeds beat baselines: 5/5 seeds outperform DKAMFormer and SOTA

Key Metric	FD002 Result	Significance
RMSE Improvement	+37.0%	Largest vs DKAMFormer
Consistency (CV)	13.5%	Lower than FD001
NASA Score	356.0 (vs 498.0)	Also improved
Effect Size	4.37	Very large

Key Insight: FD002 represents AMNL's breakthrough dataset. The +37% improvement over DKAMFormer demonstrates that equal task weighting enables superior learning of condition-invariant features. This pattern continues with FD004 (also 6 conditions), where we see similar gains.

FD002 shows AMNL's strength on multi-condition data. Next, we examine FD003 (2 fault modes).