AI Book - Master Artificial Intelligence by Building from Scratch

Learning Objectives

By the end of this section, you will:

Understand FD001 as the simplest C-MAPSS benchmark dataset
Analyze per-seed performance across 5 random seeds
Interpret the +2.3% improvement over DKAMFormer
Understand seed variance and its implications
Recognize best-case performance of 8.69 RMSE

Key Result: On FD001, AMNL achieves 10.43 ± 1.94 RMSE (mean ± standard deviation over 5 seeds), a +2.3% improvement over DKAMFormer (10.68) and +9.2% over published SOTA (11.49). The gain is modest compared with the gains on more complex datasets, but it suggests AMNL remains competitive even on the simplest benchmark.

FD001 Dataset Characteristics

FD001 is the simplest of the four NASA C-MAPSS datasets, designed for controlled evaluation of RUL prediction algorithms.

Dataset Configuration

Property	Value	Implication
Operating Conditions	1 (Sea Level)	Minimal condition variability
Fault Modes	1 (HPC Degradation)	Single failure pattern to learn
Training Engines	100	Moderate training data
Test Engines	100	Standard evaluation size
Total Training Cycles	~20,000	Sufficient for deep learning
Total Test Cycles	~13,000	Comprehensive evaluation

Why FD001 is the Simplest

With only one operating condition and one fault mode, FD001 presents the most controlled environment for RUL prediction:

No condition variance: All engines operate at sea level, eliminating the need for condition-invariant features
Single degradation pattern: High Pressure Compressor (HPC) degradation follows a consistent trajectory
Baseline benchmark: Algorithms should perform well here before tackling complex datasets

Benchmark Significance

FD001 serves as a sanity check: methods that fail here are unlikely to succeed on the more complex datasets. Conversely, methods tuned specifically for FD001 may not generalize to multi-condition scenarios.

Experimental Setup

We evaluate AMNL with equal task weighting (0.5/0.5) across 5 random seeds for statistical robustness.

Configuration

Parameter	Value
Task Weighting	0.5 RUL / 0.5 Health (AMNL)
Random Seeds	42, 123, 456, 789, 1024
Maximum Epochs	500
Early Stopping Patience	150 epochs
Evaluation Metric	Last-cycle RMSE (primary)
Statistical Test	One-sample t-test vs DKAMFormer
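The primary metric, last-cycle RMSE, can be illustrated with a minimal sketch. It assumes the metric is computed from one prediction per test engine, taken at that engine's final observed cycle; the function name, the dictionary layout, and the toy numbers are hypothetical and not taken from the AMNL code or from FD001.

```python
import math

def last_cycle_rmse(predictions, targets):
    """RMSE over one (prediction, target) pair per engine, taken at the final observed cycle.

    predictions: dict mapping engine id -> list of predicted RUL values, one per cycle
    targets:     dict mapping engine id -> true RUL at the final observed cycle
    """
    squared_errors = []
    for engine_id, per_cycle_preds in predictions.items():
        final_pred = per_cycle_preds[-1]  # keep only the last cycle's prediction
        squared_errors.append((final_pred - targets[engine_id]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Toy data for three hypothetical engines (illustrative only).
toy_predictions = {1: [120.0, 110.0, 98.0], 2: [60.0, 52.0], 3: [30.0, 25.0, 21.0, 14.0]}
toy_targets = {1: 100.0, 2: 50.0, 3: 10.0}
print(round(last_cycle_rmse(toy_predictions, toy_targets), 3))  # 2.828
```

Because every engine contributes exactly one error term, long-running engines do not dominate the metric the way they would in an all-cycles RMSE.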

Baseline Comparisons

Method	Reference RMSE	Source
DKAMFormer	10.68	Xiong et al. (2024)
Published SOTA	11.49	Li et al. (2018) compilation
AMNL V7 (0.75/0.25)	15.63	Our previous weighting

Per-Seed Results

Complete results across all 5 random seeds reveal performance variance and best-case potential.

Comprehensive Per-Seed Data

Seed	RMSE	MAE	R²	NASA Score	Epochs	vs DKAMFormer
42	10.78	9.62	0.747	249.8	196	-0.9%
123 ✓	8.69	6.94	0.836	253.7	154	+18.7%
456	13.56	11.41	0.599	815.4	372	-27.0%
789	10.06	7.90	0.779	331.6	296	+5.8%
1024	9.06	5.97	0.821	521.2	206	+15.1%

Statistical Summary

Statistic	RMSE	MAE	R²	NASA Score
Mean	10.43	8.37	0.756	434.3
Std Dev	1.94	2.12	0.095	235.6
Best	8.69	5.97	0.836	249.8
Worst	13.56	11.41	0.599	815.4

Performance Analysis

This section examines why FD001 shows only a moderate improvement together with high seed-to-seed variance.

Improvement Summary

Comparison	AMNL Mean	Reference	Improvement
vs DKAMFormer	10.43	10.68	+2.3%
vs Published SOTA	10.43	11.49	+9.2%
vs AMNL V7 (0.75/0.25)	10.43	15.63	+33.3%
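The improvement percentages in the table are relative RMSE reductions, (reference − AMNL) / reference. A short sketch that reproduces them from the reported means; the helper name `relative_improvement` is introduced here for illustration only.

```python
def relative_improvement(ours, reference):
    """Percentage reduction in RMSE relative to a reference (positive means ours is lower)."""
    return (reference - ours) / reference * 100.0

amnl_mean = 10.43  # mean RMSE over 5 seeds
references = {
    "DKAMFormer": 10.68,
    "Published SOTA": 11.49,
    "AMNL V7 (0.75/0.25)": 15.63,
}

for name, ref in references.items():
    print(f"vs {name}: {relative_improvement(amnl_mean, ref):+.1f}%")
# vs DKAMFormer: +2.3%
# vs Published SOTA: +9.2%
# vs AMNL V7 (0.75/0.25): +33.3%
```

The same formula applied to a single seed gives the per-seed "vs DKAMFormer" column, which is why a seed with RMSE above 10.68 shows a negative value.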

High Variance Analysis

Seed 456 Outlier

Seed 456 produced an outlier result (13.56 RMSE), which significantly affects the mean and standard deviation. Excluding this outlier, the remaining 4 seeds achieve a mean RMSE of 9.65 ± 0.95 (sample standard deviation of the four tabulated values).

The variance in FD001 results is notable:

Coefficient of Variation: $\sigma / \mu = 1.94 / 10.43 = 18.6\%$
Range: 13.56 - 8.69 = 4.87 RMSE (47% of mean)
Outlier Impact: Seed 456 alone accounts for roughly 65% of the total squared deviation from the mean (9.80 of 14.96)
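These dispersion figures can be recomputed from the per-seed table. The sketch below assumes the reported standard deviation is the sample standard deviation (n − 1 denominator); because it starts from the rounded table entries, the last digit may differ slightly from the reported values.

```python
import statistics

# Per-seed RMSE values copied from the per-seed results table.
rmse_by_seed = {42: 10.78, 123: 8.69, 456: 13.56, 789: 10.06, 1024: 9.06}
values = list(rmse_by_seed.values())

mean_rmse = statistics.mean(values)
std_rmse = statistics.stdev(values)      # sample std, an assumption about the reported figure
cv = std_rmse / mean_rmse                # coefficient of variation
value_range = max(values) - min(values)

# Share of the total squared deviation that comes from each seed.
squared_devs = {seed: (v - mean_rmse) ** 2 for seed, v in rmse_by_seed.items()}
share_456 = squared_devs[456] / sum(squared_devs.values())

# Mean and spread of the remaining seeds when the outlier is dropped.
without_456 = [v for seed, v in rmse_by_seed.items() if seed != 456]
mean_without = statistics.mean(without_456)
std_without = statistics.stdev(without_456)

print(f"mean={mean_rmse:.2f} std={std_rmse:.2f} CV={cv:.1%} range={value_range:.2f}")
print(f"seed 456 share of squared deviation: {share_456:.1%}")
print(f"without seed 456: {mean_without:.2f} ± {std_without:.2f}")
```

With only five seeds, a single run can dominate the spread, so reporting the per-seed values alongside the mean is more informative than the mean alone.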

Statistical Significance

p-value = 0.1439

The improvement is not statistically significant at the 0.05 level, so we cannot claim that AMNL outperforms DKAMFormer on FD001 based on mean performance alone. That said, 3 of the 5 seeds (123, 789, and 1024) beat DKAMFormer's 10.68; seed 42 falls just short (10.78) and seed 456 is considerably worse (13.56).
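A one-sample t-test compares the per-seed mean against a fixed reference value. The following is a minimal standard-library sketch that checks only the 0.05 threshold, using the two-sided critical value of Student's t for 4 degrees of freedom (2.776). It does not try to reproduce the reported p-value, which depends on details not given here, such as sidedness and the unrounded inputs.

```python
import math
import statistics

rmse_values = [10.78, 8.69, 13.56, 10.06, 9.06]  # per-seed RMSE from the results table
reference = 10.68                                 # DKAMFormer RMSE

n = len(rmse_values)
mean_rmse = statistics.mean(rmse_values)
std_err = statistics.stdev(rmse_values) / math.sqrt(n)  # standard error of the mean
t_stat = (mean_rmse - reference) / std_err

# Two-sided critical value for alpha = 0.05 with n - 1 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4

print(f"t = {t_stat:.3f}, significant at 0.05: {significant}")
```

The magnitude of the t-statistic stays well below the critical value, which agrees with the conclusion that five seeds with this much spread cannot separate AMNL's mean from 10.68.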

Why FD001 Shows Modest Improvement

Several factors explain why AMNL's improvement is smaller on FD001 compared to complex datasets:

Limited multi-task benefit: With single operating condition, the health classification task provides less complementary signal
Already saturated: Simple datasets are closer to theoretical limits, leaving less room for improvement
DKAMFormer optimization: Previous methods were heavily tuned for FD001 as the primary benchmark
Condition-invariance unnecessary: AMNL's strength in learning condition-invariant features is not leveraged

NASA Score Trade-off

AMNL achieves a better (lower) mean RMSE but a worse (higher) NASA Score than DKAMFormer:

Metric	AMNL	DKAMFormer	Better?
RMSE	10.43	10.68	Yes (+2.3%)
NASA Score	434.3	190.6	No (higher = worse)

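The higher NASA Score suggests AMNL makes more, or larger, late predictions (overestimates of RUL). The score penalizes all errors exponentially, but late predictions more steeply than early ones. This trade-off indicates that the model prioritizes RMSE accuracy over conservative early predictions.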
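The asymmetry is easiest to see in the scoring function itself. The sketch below uses the commonly cited C-MAPSS definition, with error d = predicted − true and time constants of 13 for early and 10 for late predictions; it is assumed, not confirmed, that the NASA Score reported here follows the same definition.

```python
import math

def nasa_score(predicted, actual):
    """Asymmetric C-MAPSS scoring function summed over engines; lower is better."""
    total = 0.0
    for pred, true in zip(predicted, actual):
        d = pred - true
        if d < 0:
            # Early prediction: RUL underestimated, milder penalty.
            total += math.exp(-d / 13.0) - 1.0
        else:
            # Late prediction: RUL overestimated, steeper penalty.
            total += math.exp(d / 10.0) - 1.0
    return total

# Same absolute error of 20 cycles, opposite directions.
early = nasa_score([80.0], [100.0])
late = nasa_score([120.0], [100.0])
print(f"early: {early:.2f}, late: {late:.2f}")
```

Both cases contribute identically to RMSE, yet the late prediction costs noticeably more under this score, so a model can improve RMSE while its NASA Score gets worse.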

Summary

FD001 Results Summary:

Mean RMSE: 10.43 ± 1.94 (across 5 seeds)
Improvement: +2.3% vs DKAMFormer, +9.2% vs SOTA
Best single result: 8.69 RMSE (seed 123, +24.4% vs SOTA)
Statistical significance: p = 0.1439 (not significant)
Variance: High (CV = 18.6%) due to seed 456 outlier

Metric	Value	Interpretation
Mean RMSE	10.43	Average prediction error of ~10 cycles
Best RMSE	8.69	Potential when training converges well
R² (mean)	0.756	Explains 75.6% of RUL variance
Training Time	~4,900s	Average across seeds

Key Insight: FD001 represents the "easy" case where most methods perform reasonably well. AMNL's true strength emerges on complex multi-condition datasets. The next section examines FD002, where AMNL achieves a +37.0% improvement, the largest gain in our evaluation.

FD001 establishes baseline competitiveness. Next, we see how AMNL excels on complex datasets.