The Headline Numbers
Three chapters of method, four chapters of training pipeline, one section of results. AMNL achieves RMSE 6.74 on FD002 - the lowest in the entire C-MAPSS literature including 2024 single-task SOTA. On FD003 AMNL leads the multi-task family at 9.51. On FD001 and FD003 single-task SOTA (DMHA-ATCN, LSTM Auto-PW) still wins.
Real Paper Table I
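From paper_ieee_tii/tables/table1_sota_comparison.md. Multi-task rows report the 5-seed mean ± std on the held-out C-MAPSS test set; single-task rows give a single RMSE without std. Bold = lowest RMSE per column (on FD003 the best multi-task result, AMNL's 9.51, is bolded as well); [S] marks single-task baselines.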
| Method (year) | Type | FD001 | FD002 | FD003 | FD004 |
|---|---|---|---|---|---|
| DMHA-ATCN (2024) [S] | Attn+TCN | **7.74** | 16.95 | **7.18** | 17.76 |
| STAR (2024) [S] | Transformer | 10.61 | 13.47 | 10.71 | 15.87 |
| LSTM Auto-PW (2022) [S] | RNN | 7.78 | 17.04 | 8.03 | 17.63 |
| Baseline 0.5/0.5 (ours) | Fixed MTL | 10.08 ±1.71 | 7.37 ±0.43 | 11.53 ±2.81 | 8.76 ±1.38 |
| Uncertainty (Kendall et al.) | Adaptive MTL | 9.15 ±0.42 | 7.77 ±0.89 | 10.99 ±1.94 | 8.19 ±0.90 |
| GradNorm (Chen et al.) | Adaptive MTL | 9.38 ±1.53 | 8.19 ±0.78 | 12.38 ±1.76 | **7.74 ±0.59** |
| **AMNL (ours)** | Fixed MTL | 10.43 ±1.94 | **6.74 ±0.91** | **9.51 ±1.74** | 8.16 ±2.17 |
| **GABA (ours)** | Adaptive MTL | 9.63 ±1.90 | 7.53 ±0.65 | 11.96 ±2.28 | 8.25 ±1.10 |
| **GRACE (ours)** | Adaptive MTL | 9.14 ±1.39 | 7.72 ±0.66 | 13.12 ±1.77 | 8.12 ±0.70 |
Interactive: Bar-Chart Comparison
Toggle metric (RMSE / NASA score) and C-MAPSS subset. AMNL's green bar shrinks dramatically on FD002 / FD004 - the multi-condition subsets. Note that single-task papers rarely report NASA score, so those bars are blank when you switch to NASA.
Try this. Switch to FD002 with metric RMSE. The grey single-task bars (CNN, BiLSTM, AGCNN…) all sit above 13. The MTL baseline 0.5/0.5 already gets 7.37 - the multi-task signal helps. Then AMNL drops to 6.74. That is a 60% reduction vs DMHA-ATCN, the latest single-task SOTA. The MTL family wins decisively on this subset.
Where AMNL Wins (and Where It Doesn't)
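Per-subset RMSE change vs DMHA-ATCN (2024 single-task reference). Positive percentages mean AMNL's RMSE is lower (better); negative percentages mean it is higher (worse):
| Subset | Conditions × Faults | AMNL RMSE | DMHA-ATCN RMSE | AMNL vs DMHA-ATCN |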
|---|---|---|---|---|
| FD001 | 1 × 1 | 10.43 | 7.74 | −34.8% (WORSE) |
| FD002 | 6 × 1 | **6.74** | 16.95 | **+60.2% (BETTER)** |
| FD003 | 1 × 2 | 9.51 | 7.18 | −32.5% (WORSE) |
| FD004 | 6 × 2 | **8.16** | 17.76 | **+54.1% (BETTER)** |
Python: Read and Compare a Results Table
Pure-Python harness. A hard-coded RESULTS dict mirrors the real Table I. best_per_subset finds the per-subset winner; relative_improvement computes the percentage better or worse against any baseline.
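A minimal sketch of such a harness. The RESULTS values are the mean RMSEs copied from Table I above (std omitted); the function signatures are assumptions made for illustration and are not taken from the paper's code.

```python
# Mean test RMSE per method and subset, copied from Table I (std omitted).
RESULTS = {
    "DMHA-ATCN":        {"FD001": 7.74,  "FD002": 16.95, "FD003": 7.18,  "FD004": 17.76},
    "STAR":             {"FD001": 10.61, "FD002": 13.47, "FD003": 10.71, "FD004": 15.87},
    "LSTM Auto-PW":     {"FD001": 7.78,  "FD002": 17.04, "FD003": 8.03,  "FD004": 17.63},
    "Baseline 0.5/0.5": {"FD001": 10.08, "FD002": 7.37,  "FD003": 11.53, "FD004": 8.76},
    "Uncertainty":      {"FD001": 9.15,  "FD002": 7.77,  "FD003": 10.99, "FD004": 8.19},
    "GradNorm":         {"FD001": 9.38,  "FD002": 8.19,  "FD003": 12.38, "FD004": 7.74},
    "AMNL":             {"FD001": 10.43, "FD002": 6.74,  "FD003": 9.51,  "FD004": 8.16},
    "GABA":             {"FD001": 9.63,  "FD002": 7.53,  "FD003": 11.96, "FD004": 8.25},
    "GRACE":            {"FD001": 9.14,  "FD002": 7.72,  "FD003": 13.12, "FD004": 8.12},
}


def best_per_subset(results):
    """Return {subset: method with the lowest mean RMSE on that subset}."""
    subsets = next(iter(results.values())).keys()
    return {s: min(results, key=lambda m: results[m][s]) for s in subsets}


def relative_improvement(results, method, baseline, subset):
    """Percent RMSE reduction of `method` relative to `baseline`.

    Positive means `method` has the lower RMSE; negative means it is worse.
    """
    base = results[baseline][subset]
    return 100.0 * (base - results[method][subset]) / base


if __name__ == "__main__":
    print(best_per_subset(RESULTS))
    for subset in ("FD001", "FD002", "FD003", "FD004"):
        pct = relative_improvement(RESULTS, "AMNL", "DMHA-ATCN", subset)
        print(f"{subset}: {pct:+.1f}%")
```

Called with AMNL against DMHA-ATCN, relative_improvement follows the sign convention of the per-subset table above: positive on FD002 and FD004, negative on FD001 and FD003.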
PyTorch: Aggregate Across Seeds
How the ± std numbers in Table I are computed. aggregate_seeds() takes the per-seed RMSE dicts and returns mean / sample-std per subset. The smoke test reproduces the paper's Table I AMNL row to within 0.5 cycles using synthetic 5-seed draws.
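A minimal sketch of the aggregation step, written against the Python standard library rather than PyTorch so that it runs without dependencies. statistics.stdev computes the sample (n − 1) standard deviation, the same Bessel correction that torch.std applies by default. The per-seed values below are synthetic and deterministic, constructed to match the AMNL row of Table I; they are not the paper's actual per-seed results, and the helper name synthetic_seeds is hypothetical.

```python
import math
import statistics

# AMNL row of Table I: subset -> (mean RMSE, std over 5 seeds).
AMNL_TABLE = {
    "FD001": (10.43, 1.94),
    "FD002": (6.74, 0.91),
    "FD003": (9.51, 1.74),
    "FD004": (8.16, 2.17),
}


def aggregate_seeds(per_seed):
    """per_seed: list of {subset: rmse} dicts, one dict per seed.

    Returns {subset: (mean, sample_std)} using the n - 1 denominator.
    """
    out = {}
    for subset in per_seed[0]:
        values = [run[subset] for run in per_seed]
        out[subset] = (statistics.mean(values), statistics.stdev(values))
    return out


def synthetic_seeds(target, offsets=(-2, -1, 0, 1, 2)):
    """Build synthetic per-seed RMSEs (NOT real results) matching `target`.

    The offsets have mean 0; dividing by their own sample std makes each
    subset's draws have exactly the target mean and target sample std.
    """
    scale = statistics.stdev(offsets)  # sqrt(2.5) for the default offsets
    return [
        {subset: mean + std * k / scale for subset, (mean, std) in target.items()}
        for k in offsets
    ]


if __name__ == "__main__":
    aggregated = aggregate_seeds(synthetic_seeds(AMNL_TABLE))
    for subset, (mean, std) in aggregated.items():
        target_mean, target_std = AMNL_TABLE[subset]
        assert math.isclose(mean, target_mean, abs_tol=1e-9)
        assert math.isclose(std, target_std, abs_tol=1e-9)
        print(f"{subset}: {mean:.2f} ± {std:.2f}")
```

Because the synthetic draws are constructed rather than sampled, the check here is exact; a smoke test based on random draws would need a tolerance such as the 0.5 cycles mentioned above.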
Patterns Across Public Benchmarks
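The multi-task advantage on multi-condition data repeats across PHM benchmarks. The table below is from the paper's comparison appendix. Δ is the relative change in error from the best single-task result to the best multi-task result, so negative values mean lower error (the reverse of the sign convention in the per-subset table above).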
| Benchmark | Conditions | Best single-task | Best multi-task | Δ |
|---|---|---|---|---|
| C-MAPSS FD002 (this book) | 6 | STAR 13.47 | AMNL 6.74 | −50% |
| C-MAPSS FD004 (this book) | 6 | Neural ODE 15.06 | GradNorm 7.74 | −49% |
| N-CMAPSS DS02 (Arias-Chao 2021) | 11 | Transformer 9.42 | GRACE 6.35 | −33% |
| PRONOSTIA bearings (FEMTO 2012) | 3 | AGCNN 0.87 (relative) | MTL-RNN 0.71 (relative) | −18% |
| Battery cycling (Severson 2019) | 2 | CNN-LSTM 7.2 cycles | MTL+aux 5.4 cycles | −25% |
| Wind-turbine SCADA (NREL 2023) | 4 (seasons) | BiLSTM 1.4 days | MTL-Attn 0.9 days | −36% |
Three Result-Reporting Pitfalls
The point. AMNL achieves the best RMSE in the literature on FD002 (6.74, 60% below DMHA-ATCN) and the best among MTL methods on FD003 (9.51, though DMHA-ATCN's 7.18 stays ahead overall). The wins concentrate where condition variability is high. §16.2 covers AMNL's NASA-score weakness on FD001; §16.3 covers the cross-pipeline caveat; §16.4 turns the patterns into a deployment recommendation.
Takeaway
- AMNL FD002 RMSE = 6.74. Lowest in the C-MAPSS literature, 60% below DMHA-ATCN.
- AMNL FD003 RMSE = 9.51. Best among MTL; single-task DMHA-ATCN still leads overall at 7.18.
- FD001 / FD003 stay single-task. AMNL trails DMHA-ATCN by about 32-35% on these single-condition subsets.
- Pattern. MTL wins on multi-condition data (two or more operating conditions in every benchmark tabulated above). Single-task wins on single-condition data.
- Always 5-seed mean ± std. A single-seed RMSE is unreliable for AMNL, whose across-seed std ranges from 0.91 to 2.17 cycles depending on the subset.
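The last takeaway can be illustrated with a rough check: express the gap between two mean RMSEs in units of the seed-to-seed std. This is a heuristic sketch, not a significance test and not part of the paper's method; the function name gap_in_stds is hypothetical, and the numbers are the FD002 entries from Table I.

```python
def gap_in_stds(mean_a, mean_b, std):
    """Absolute gap between two mean RMSEs, in units of a seed-to-seed std."""
    return abs(mean_a - mean_b) / std


# FD002 values from Table I.
AMNL_MEAN, AMNL_STD = 6.74, 0.91
BASELINE_MEAN = 7.37    # Baseline 0.5/0.5 (fixed MTL)
DMHA_ATCN_MEAN = 16.95  # single-task; no std reported in Table I

# AMNL vs the fixed-weight MTL baseline: the gap is smaller than one AMNL std,
# so a single lucky or unlucky seed could reverse the ordering.
close_gap = gap_in_stds(AMNL_MEAN, BASELINE_MEAN, AMNL_STD)

# AMNL vs DMHA-ATCN: the gap is many stds wide, so seed noise cannot explain it.
wide_gap = gap_in_stds(AMNL_MEAN, DMHA_ATCN_MEAN, AMNL_STD)

if __name__ == "__main__":
    print(f"AMNL vs Baseline 0.5/0.5: {close_gap:.2f} std")
    print(f"AMNL vs DMHA-ATCN:        {wide_gap:.2f} std")
```

Comparisons among the MTL methods are therefore the ones that most need the 5-seed mean, while the gap to the single-task numbers on FD002 is large relative to seed noise.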