Boo-AI — Master Artificial Intelligence by Building from Scratch

Track Days vs. Race Days

On a track day, drivers run flat-out laps with engines at fixed boost, single tyre compound, fresh fluids. Lap times are fast, repeatable, easy to compare. On race day everything changes: traffic, weather, fuel weight that drops as the race goes on, tyre wear, pit-stop strategy. The ranking of cars on the same track on race day rarely matches the track-day ranking. The race tests something the track day cannot.

C-MAPSS is a track day. Engines run in steady-state operating regimes; sensor noise is mild; trajectories are short. N-CMAPSS DS02 is a race day. Real flight envelopes — takeoff, climb, cruise, descent, landing — with sensors logged at 0.1-second resolution; 5 million rows in DS02's development split alone; engines running through full operational cycles with realistic component degradation. Methods that win on C-MAPSS sometimes lose on N-CMAPSS, and the reverse.

On C-MAPSS GRACE wins NASA on the multi-condition mean and sits on the Pareto front (chapter 23·1, 23·2). On N-CMAPSS DS02 GRACE wins both RMSE and NASA — and is the ONLY Pareto-optimal method out of the 9 MTL variants studied. It also beats the published DKAMFormer (Fu et al. 2025) by 0.055 RMSE. This section walks the result and explains the mechanism.

The headline. N-CMAPSS DS02 (5-seed mean, ±1σ): GRACE RMSE = $6.345 \pm 0.366$, NASA = 8560 ± 908. Every other method has a higher mean on at least one axis. DKAMFormer's published RMSE is 6.40; GRACE's mean is below that.

What N-CMAPSS DS02 Is

N-CMAPSS (New C-MAPSS, Arias Chao et al. 2021) is NASA's successor to the original C-MAPSS dataset. It targets the criticism that C-MAPSS used unrealistic operating conditions and short trajectories. Key differences:

Property	C-MAPSS	N-CMAPSS DS02
Operating model	6 discrete steady-state regimes	Continuous flight envelope (alt, Mach, throttle)
Time resolution	1 cycle = many flight hours	1 sample = 0.1 second
Trajectory length	~150-300 cycles per unit	Full flight cycle, ~25-90 flights per unit
Train units	100-260 (per FD subset)	7 dev units (DS02)
Test units	100-259 (per FD subset)	2 test units (DS02)
Total dev rows	~17k-20k windows	~5,000,000 rows (~50k after 100× subsample)
Sensor count (DKAMFormer protocol)	14 selected	20 selected (3 scenario + 17 sensor)
Failure modes (DS02)	1-2 fault modes per FD subset	2 (HPC + LPC degradation)
Realism	Steady-state only	Real flight transients

The realistic operating profile makes N-CMAPSS DS02 the prognostics community's preferred ‘hard’ benchmark. Methods that exploit the steady-state structure of C-MAPSS (per-condition normalisation, simple sliding-window regressors) lose ground; methods that learn condition-invariant features (the OUTER GABA controller does this by construction) gain ground.

The DS02 Ranking: GRACE Wins Both Axes

Rank	Method	RMSE (n=5)	NASA score	R²
1	GRACE	6.345 ± 0.366	8,560	0.886
2	PCGrad	6.398 ± 0.260	8,667	—
3	GABA	6.554 ± 0.271	8,919	0.878
4	Uncertainty	6.556 ± 0.294	9,108	—
5	CAGrad	6.826 ± 0.410	9,368	—
6	AMNL	6.865 ± 0.344	9,706	0.866
7	GradNorm	7.019 ± 0.700	9,972	—
8	Baseline	7.052 ± 0.474	10,033	0.859
9	DWA	7.206 ± 0.526	10,613	—

Three observations:

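GRACE has the lowest mean on both axes. RMSE 6.345 vs the next-best PCGrad (6.398). NASA 8560 vs the next-best PCGrad (8667). The 0.05 RMSE gap is within seed noise (GRACE's σ is 0.366), and the 107-point NASA gap is likewise small next to GRACE's ±908 seed spread, so the lead over PCGrad is a lead in means rather than a statistically separated one.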
R² climbs monotonically. Baseline 0.859 → AMNL 0.866 → GABA 0.878 → GRACE 0.886. Each step adds an axis (inner WMSE, outer GABA, both) and gains between 0.007 and 0.012 R². Small but consistent.
Per-seed std is small. GRACE's 0.366 is tighter than half the other methods. The OUTER controller's self-calibration property (chapter 22 §2 robustness) carries across to N-CMAPSS.
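The Pareto claim can be checked directly from the table. The sketch below copies the (RMSE, NASA) means from the ranking above and applies the usual dominance test; the helper names `dominates` and `pareto_front` are illustrative, not taken from the paper's code, and the check uses means only (it ignores the seed spread).

```python
# (RMSE, NASA) means copied from the DS02 ranking table; lower is better on both.
results = {
    "GRACE": (6.345, 8560),
    "PCGrad": (6.398, 8667),
    "GABA": (6.554, 8919),
    "Uncertainty": (6.556, 9108),
    "CAGrad": (6.826, 9368),
    "AMNL": (6.865, 9706),
    "GradNorm": (7.019, 9972),
    "Baseline": (7.052, 10033),
    "DWA": (7.206, 10613),
}


def dominates(a, b):
    """True if a is no worse than b on every axis and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Names of the points that no other point dominates."""
    return [
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


front = pareto_front(results)
print(front)  # ['GRACE']
```

Because GRACE has the lowest mean on both axes, it dominates every other row, which is why the front collapses to a single point.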

Interactive: Pareto + 2×2 Factorial + DKAMFormer

[Interactive figure: N-CMAPSS DS02 explorer.] Three panels: (a) the full 9-method scatter on DS02 with seed-error bars; (b) the 2×2 factorial walking through Baseline → AMNL → GABA → GRACE; (c) the comparison against DKAMFormer.

What the visualization shows. In (a) only GRACE sits on the Pareto front — every other dot is strictly dominated. In (b) each axis flip cleanly improves both RMSE and NASA, and the corner cell (GRACE) is the best overall. In (c) GRACE's 6.345 RMSE just edges DKAMFormer's 6.40 published number.

The 2×2 Factorial: Why GRACE Wins

The four corners of the 2×2 factorial test the orthogonality argument from chapter 21·1: outer (per-task weighting) and inner (per-sample weighting) should compose constructively. On N-CMAPSS DS02 they do, with neat additive deltas:

Step	Cell change	Δ RMSE	Δ NASA
Baseline → AMNL	Standard MSE → Weighted MSE (inner axis)	−0.19	−327
GABA → GRACE	Standard MSE → Weighted MSE (inner, with GABA)	−0.21	−359
Baseline → GABA	Fixed → GABA (outer axis)	−0.50	−1,114
AMNL → GRACE	Fixed → GABA (outer, with WMSE)	−0.52	−1,146
Baseline → GRACE	Both axes flipped	−0.71	−1,473

The deltas are remarkably consistent: the inner-axis effect is ~−0.20 RMSE and ~−340 NASA whether or not GABA is on; the outer-axis effect is ~−0.51 RMSE and ~−1130 NASA whether or not WMSE is on. The total improvement Baseline → GRACE = 0.71 RMSE and 1473 NASA — almost exactly the sum of the two single-axis deltas (−0.20 + −0.51 = −0.71; −340 + −1130 = −1470).

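The orthogonality argument holds on real data. The 2×2 factorial is what the paper's Figure 5 shows. It empirically supports the chapter-21 claim that the OUTER and INNER axes compose additively. The interaction term is small rather than exactly zero: from the ranking table the inner-axis gain is 6.865 − 7.052 = −0.187 RMSE without GABA and 6.345 − 6.554 = −0.209 with it, a difference of about −0.02 RMSE (−32 NASA) that sits well inside seed noise. To a good approximation GRACE = Baseline + GABA-effect + WMSE-effect, with no penalty for combining.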
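The additivity argument can be reproduced from the four corner cells. The sketch below uses the standard 2×2 factorial decomposition (averaged main effects plus an interaction term) on the means from the ranking table; `factorial_effects` is an illustrative helper, not code from the paper.

```python
# Corner cells of the 2x2 design, copied from the DS02 ranking table.
rmse = {"Baseline": 7.052, "AMNL": 6.865, "GABA": 6.554, "GRACE": 6.345}
nasa = {"Baseline": 10033, "AMNL": 9706, "GABA": 8919, "GRACE": 8560}


def factorial_effects(cell):
    """Main effects and interaction of a 2x2 design (outer: GABA, inner: WMSE)."""
    inner_without = cell["AMNL"] - cell["Baseline"]   # WMSE switched on, GABA off
    inner_with = cell["GRACE"] - cell["GABA"]         # WMSE switched on, GABA on
    outer_without = cell["GABA"] - cell["Baseline"]   # GABA switched on, WMSE off
    outer_with = cell["GRACE"] - cell["AMNL"]         # GABA switched on, WMSE on
    return {
        "inner": (inner_without + inner_with) / 2,
        "outer": (outer_without + outer_with) / 2,
        # How much the inner-axis gain changes once GABA is on.
        "interaction": inner_with - inner_without,
        "total": cell["GRACE"] - cell["Baseline"],
    }


for name, cell in (("RMSE", rmse), ("NASA", nasa)):
    fx = factorial_effects(cell)
    print(name, {k: round(v, 3) for k, v in fx.items()})
```

One design note: the two averaged main effects always sum to the total by construction, so the informative quantity is the interaction term. Here it comes out at about −0.02 RMSE and −32 NASA, small compared with the seed standard deviations reported above.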

Beating Published SOTA: GRACE vs DKAMFormer

DKAMFormer (Dual-Kernel Attention Multi-head transFormer; Fu et al., IEEE TII 2025) is the strongest published baseline on N-CMAPSS DS02 in the dual-attention transformer family. It uses the same 20-feature protocol, same RUL cap, same NASA scoring. Reported RMSE: 6.40.

GRACE's 5-seed mean of 6.345 beats DKAMFormer by 0.055 cycles. The gap is small relative to seed-to-seed variation (GRACE's seed std is 0.366, and DKAMFormer reports a single-seed result). What is significant is that GRACE achieves this with:

A simpler architecture. CNN-BiLSTM-Attention (~1.7M params) vs DKAMFormer's dual-attention transformer (no public param count, but typically 5-10M for that family).
Same training budget. 500 epochs cap, AdamW, EMA, gradient clipping. No DKAMFormer-specific tricks.
5-seed reproducibility. Every GRACE number is reported with a standard deviation. DKAMFormer's paper reports a single-seed result, so the comparison is conservative on our side: we report mean ± std against their single number.

Why N-CMAPSS Favors GRACE Over C-MAPSS

On C-MAPSS multi-condition, GRACE is on the Pareto front (winning NASA, second on RMSE, behind AMNL). On N-CMAPSS DS02, GRACE is the SINGLE Pareto winner. The mechanism behind the upgrade:

Continuous flight envelope. C-MAPSS's 6 discrete regimes can be approximated by per-condition normalisation. N-CMAPSS's continuous (alt, Mach, throttle) space cannot — the model has to learn condition-invariant features the hard way. The OUTER GABA axis pulls the shared backbone toward those features by amplifying the auxiliary health task (which generalises across conditions); on N-CMAPSS this matters more.
Larger failure-region subgroup. N-CMAPSS DS02 has trajectories spanning 25-90 flights per unit, far more near-failure samples than C-MAPSS's short windows. The INNER weighted-MSE axis has more signal to amplify.
More noise. 0.1-second sensor sampling captures realistic high-frequency noise. The OUTER GABA controller's EMA smoothing cleans the per-task gradient signal more effectively here than on the cleaner C-MAPSS.
Better gradient cosine. Chapter 21·3 showed GABA helps when the auxiliary task's gradient aligns with the primary task's on the shared backbone. On N-CMAPSS's richer feature space the alignment is empirically stronger than on C-MAPSS — another reason GRACE's lead grows.
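The "gradient cosine" in the last point is the cosine similarity between the two tasks' gradients on the shared backbone. As a minimal sketch, assuming both gradients have already been flattened into plain vectors (the numbers below are made up for illustration and are not measurements from either benchmark):

```python
import math


def cosine(g_a, g_b):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero gradient carries no direction
    return dot / (norm_a * norm_b)


# Hypothetical gradients of the RUL loss and the auxiliary health loss
# with respect to the same four shared-backbone weights.
g_rul = [0.40, -0.10, 0.25, 0.00]
g_health = [0.35, -0.05, 0.30, 0.10]      # roughly aligned with g_rul
g_conflict = [-0.40, 0.10, -0.25, 0.00]   # exactly opposed to g_rul

print(round(cosine(g_rul, g_health), 3))
print(round(cosine(g_rul, g_conflict), 3))
```

A value near +1 means the auxiliary task pushes the backbone in the same direction as the primary task; a value near −1 means the two tasks conflict.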

Python: The DKAMFormer 20-Feature Protocol

The exact feature selection that makes the comparison apples-to-apples. From experiments/ncmapss/src/ncmapss_20features_dataset.py.

DKAMFormer 20-feature extraction with NumPy

🐍ncmapss_20features.py


docstring

States the contract: extract DKAMFormer's exact 20-feature view from the N-CMAPSS DS02 HDF5 file. Same protocol = fair comparison with the published SOTA.

import h5py

HDF5 I/O library. N-CMAPSS distributes DS02 as a single HDF5 file (~5 GB) with named datasets W, X_s, X_v, A, Y for each split.

EXECUTION STATE

📚 h5py = Python interface to HDF5. Lazy reads via .File(path, 'r')['key']. NumPy-compatible.

import numpy as np

NumPy. Used for np.asarray, np.column_stack, fancy indexing.

W_VARS = ["alt", "Mach", "TRA", "T2"]

Names of the 4 columns in the W array: the operating condition descriptors. alt = altitude (ft), Mach = Mach number, TRA = throttle resolver angle (%), T2 = inlet temperature (°R).

XS_VARS = ["T24","T30","T48","T50","P15","P2","P21","P24","Ps30","P40","P50","Nf","Nc","Wf"]

14 physical sensors from N-CMAPSS X_s array. Temperatures (T*), pressures (P*), shaft speeds (Nf, Nc), fuel flow (Wf).

XV_VARS = ["T40","P30","P45","W21","W22","W25","W31","W32","W48","W50","SmFan","SmLPC","SmHPC","phi"]

14 virtual sensors (X_v array): quantities the simulator computes but real engines do not directly measure. T40 (burner temperature) and P30 (HPC outlet pressure) are the two DKAMFormer uses. Of the remaining twelve, the last four (Sm*, phi) are too informative about degradation and would leak ground truth.

DKAMFORMER_20 = (

Construct the canonical 20-feature ordering: 3 scenario + 1 inlet temp + 14 physical + 2 virtual = 20. Order matters because the dataset will be NORMALISED per-feature; downstream the model expects this exact order.

["alt", "Mach", "TRA"] # 3 scenario

First three columns of W. The flight-envelope descriptors that tell the model which operating regime each sample is in.

EXECUTION STATE

→ why include scenario? = Without scenario inputs the same RUL value can correspond to wildly different sensor signatures (cruise vs takeoff). The model needs to know the operating point.

+ ["T2"] # 1 from W

T2 (inlet temperature) is in the W array — it's a sensor, not an operating condition, but lives in W historically. DKAMFormer pulls it out separately.

+ XS_VARS # 14 physical

All 14 physical sensors from X_s — temperatures, pressures, speeds, fuel flow.

+ ["T40", "P30"] # 2 virtual

Two SAFE virtual sensors. T40 (burner T) and P30 (HPC outlet P) are estimable from physical sensors via thermodynamic relations, so including them in the input is not data leakage. The other X_v columns (Sm*, phi) ARE leakage and are excluded by DKAMFormer's protocol.

EXECUTION STATE

→ why exclude Sm*, phi? = Sm{Fan,LPC,HPC} are the simulator-computed stall margins of the fan, LPC and HPC, which track per-component health closely; phi is the ratio of fuel flow to Ps30. None of them is measurable on a real engine, and feeding the stall margins to the model comes close to giving it the answer.

def extract_20_features(h5_path: str, unit_id: int):

Pull the 20-feature trajectory of one engine unit from the HDF5 file.

EXECUTION STATE

⬇ input: h5_path = Path to N-CMAPSS_DS02-006.h5 (~5 GB).

⬇ input: unit_id = Engine number (DS02 has 7 train + 2 test units).

⬆ returns = (features (T, 20), rul (T,)) for one unit's full lifetime.

with h5py.File(h5_path, "r") as f:

Open the HDF5 file in read mode. Context manager closes it cleanly even on exception.

EXECUTION STATE

📚 h5py.File(path, mode) = HDF5 file handle. mode='r' = read-only, 'w' = write (truncates), 'a' = append.

w = np.asarray(f["W_dev"])

Load the W (operating conditions) array for the development split. _dev = development units (training); _test = held-out units.

EXECUTION STATE

📚 np.asarray(h5_dataset) = Reads the entire HDF5 dataset into RAM as an ndarray. At ~5M rows this is sizeable: roughly 160 MB for W and 560 MB each for X_s and X_v if the values are stored as float64.

w shape = (T_total, 4) where T_total ≈ 5 million for DS02 dev split.

x_s = np.asarray(f["X_s_dev"])

Physical sensor array. Shape (T_total, 14).

x_v = np.asarray(f["X_v_dev"])

Virtual sensor array. Shape (T_total, 14). We only use the first 2 columns.

units = np.asarray(f["A_dev"])[:, 0]

Auxiliary metadata. Column 0 of A_dev is the unit ID — tells us which engine each row belongs to. Slicing [:, 0] keeps only that column.

EXECUTION STATE

→ A_dev contents = Unit ID (col 0), cycle index (col 1), flight class Fc (col 2), health state hs (col 3). We only need col 0 here.

rul = np.asarray(f["Y_dev"]).flatten()

Ground-truth RUL per row. .flatten() collapses (T_total, 1) → (T_total,) for easier indexing.

EXECUTION STATE

📚 .flatten() = Returns a 1-D copy. .reshape(-1) gives the same values but returns a view when it can.

mask = units == unit_id

Boolean mask: True where this unit's rows live. Lets us slice w, x_s, x_v and rul consistently.

EXECUTION STATE

📚 ndarray == = Element-wise equality returning a boolean ndarray. Used for fancy indexing.

feats = np.column_stack([

Build the 20-feature matrix by horizontally concatenating slices.

EXECUTION STATE

📚 np.column_stack(arrs) = Concatenate 1-D or 2-D arrays as columns. (T, a) + (T, b) → (T, a+b).

w[mask, 0:3], # alt, Mach, TRA

Rows of this unit, columns 0..2 of W (the 3 scenario descriptors).

EXECUTION STATE

→ 0:3 slice = Half-open: includes 0, 1, 2; excludes 3. NumPy convention.

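w[mask, 3:4], # T2

T2 only. The slice 3:4 (rather than the integer 3) keeps the result 2-D with shape (T, 1), matching the other pieces passed to column_stack.

EXECUTION STATE

→ why 3:4 vs 3? = w[mask, 3] returns shape (T,), which is 1-D. w[mask, 3:4] returns (T, 1), which is 2-D. np.column_stack accepts either (it turns 1-D inputs into columns), so the slice is a readability choice: every piece passed in is already 2-D.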

x_s[mask, :], # all 14 physical

All 14 physical sensors for this unit. (T_unit, 14).

x_v[mask, 0:2], # T40, P30

First 2 columns of X_v — T40 and P30. (T_unit, 2).

])

Close the column_stack. Result shape: (T_unit, 3 + 1 + 14 + 2) = (T_unit, 20).

return feats, rul[mask]

Return the (features, rul) pair for this unit.

EXECUTION STATE

⬆ return: feats = ndarray (T_unit, 20) — DKAMFormer-format trajectory.

⬆ return: rul[mask] = ndarray (T_unit,) — per-cycle ground truth.

T = 4

Synthetic timestep count for the sanity check (4 cycles).

fake_w = np.array([[10000, 0.65, 100, 520], ...])

Synthetic W array. 4 timesteps × [alt, Mach, TRA, T2]. Shows the realistic flight-envelope ranges: altitude in feet (~10000), Mach (~0.65), throttle %.

EXECUTION STATE

fake_w (4, 4) =

[[10000, 0.65, 100, 520], [11000, 0.68, 102, 525], [12000, 0.72, 105, 530], [13000, 0.75, 107, 535]]

fake_xs = np.tile(np.arange(14, dtype=float), (T, 1))

Synthetic physical-sensor array. Each row = [0, 1, 2, ..., 13]. Distinct values per column so the column ordering after extraction is verifiable.

EXECUTION STATE

📚 np.tile(arr, reps) = Repeat the array. np.tile([0..13], (4, 1)) stacks 4 copies vertically → shape (4, 14).

fake_xv = np.tile(np.arange(14, dtype=float) + 100, (T, 1))

Synthetic virtual-sensor array. Each row = [100, 101, ..., 113].

extracted = np.column_stack([

Apply the same selection logic as the real function.

fake_w[:, 0:3], fake_w[:, 3:4], fake_xs, fake_xv[:, 0:2],

Pull alt/Mach/TRA, T2, all 14 physical, T40/P30. Total: 3+1+14+2 = 20.

EXECUTION STATE

extracted shape = (4, 20) — 4 timesteps × 20 features.

extracted[0] = [10000, 0.65, 100, 520, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 100, 101]

])

Close the column_stack.

print(f"feature_dim = {extracted.shape[1]} (expected: 20)")

Sanity check: confirm 20 features.

EXECUTION STATE

Output = feature_dim = 20 (expected: 20)

print(f"first row = {extracted[0].tolist()}")

Print the first row to verify ordering.

print(f"feature names = {DKAMFORMER_20}")

Print canonical names so the column ordering is human-checkable.

EXECUTION STATE

Final output =

feature_dim = 20    (expected: 20)
first row   = [10000.0, 0.65, 100.0, 520.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 100.0, 101.0]
feature names = ['alt', 'Mach', 'TRA', 'T2', 'T24', 'T30', 'T48', 'T50', 'P15', 'P2', 'P21', 'P24', 'Ps30', 'P40', 'P50', 'Nf', 'Nc', 'Wf', 'T40', 'P30']


"""DKAMFormer 20-feature extraction from N-CMAPSS HDF5.

Source: paper_ieee_tii/experiments/ncmapss/src/ncmapss_20features_dataset.py.
Lines 30-75: selects exactly the 20 sensors DKAMFormer (Fu et al. 2025)
uses, so the comparison is apples-to-apples.
"""

import h5py
import numpy as np


# ---------- N-CMAPSS HDF5 array layout (DS02) ----------
# W   (operating conditions, 4 cols): alt, Mach, TRA, T2
# X_s (physical sensors, 14 cols):    T24, T30, T48, T50, P15, P2, P21,
#                                     P24, Ps30, P40, P50, Nf, Nc, Wf
# X_v (virtual sensors, 14 cols):     T40, P30, P45, W21, W22, W25, W31,
#                                     W32, W48, W50, SmFan, SmLPC, SmHPC, phi
W_VARS  = ["alt", "Mach", "TRA", "T2"]
XS_VARS = ["T24","T30","T48","T50","P15","P2","P21","P24",
           "Ps30","P40","P50","Nf","Nc","Wf"]
XV_VARS = ["T40","P30","P45","W21","W22","W25","W31","W32",
           "W48","W50","SmFan","SmLPC","SmHPC","phi"]


# DKAMFormer's exact 20 features (paper Table V):
#   3 scenario from W      : alt, Mach, TRA
#   1 inlet temperature    : T2 (also in W)
#  14 physical from X_s    : all of them
#   2 virtual from X_v     : T40 (burner T), P30 (HPC pressure)
DKAMFORMER_20 = (
    ["alt", "Mach", "TRA"]   # 3 scenario
    + ["T2"]                  # 1 from W
    + XS_VARS                 # 14 physical
    + ["T40", "P30"]          # 2 virtual
)


def extract_20_features(h5_path: str, unit_id: int):
    """Return (sequence, rul_array) for one engine unit on DS02."""
    with h5py.File(h5_path, "r") as f:
        w     = np.asarray(f["W_dev"])         # (T, 4)
        x_s   = np.asarray(f["X_s_dev"])       # (T, 14)
        x_v   = np.asarray(f["X_v_dev"])       # (T, 14)
        units = np.asarray(f["A_dev"])[:, 0]   # (T,) unit IDs
        rul   = np.asarray(f["Y_dev"]).flatten() # (T,) RULs

    mask  = units == unit_id
    feats = np.column_stack([
        w[mask, 0:3],       # alt, Mach, TRA
        w[mask, 3:4],       # T2
        x_s[mask, :],       # all 14 physical
        x_v[mask, 0:2],     # T40, P30
    ])

    return feats, rul[mask]


# ---------- Sanity check on a small synthetic example ----------
T = 4
fake_w   = np.array([[10000, 0.65, 100, 520],
                     [11000, 0.68, 102, 525],
                     [12000, 0.72, 105, 530],
                     [13000, 0.75, 107, 535]], dtype=float)
fake_xs  = np.tile(np.arange(14, dtype=float), (T, 1))
fake_xv  = np.tile(np.arange(14, dtype=float) + 100, (T, 1))

# DKAMFormer = [alt, Mach, TRA, T2, X_s(14), X_v[0:2]]
extracted = np.column_stack([
    fake_w[:, 0:3], fake_w[:, 3:4], fake_xs, fake_xv[:, 0:2],
])

print(f"feature_dim = {extracted.shape[1]}    (expected: 20)")
print(f"first row   = {extracted[0].tolist()}")
print(f"feature names = {DKAMFORMER_20}")

Excluding Sm* and phi is critical. The X_v array contains 14 columns; the last 4 (SmFan, SmLPC, SmHPC, phi) are degradation parameters — the SIMULATOR's ground-truth component health margins. Including them in the input is data leakage: the model would be reading the answer. The DKAMFormer protocol explicitly drops them; GRACE follows the same rule.
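Positional slices such as `x_v[:, 0:2]` are only safe while the column order never changes. One way to make the exclusion harder to break is to select virtual sensors by name and refuse the four leakage columns outright. This is a sketch of that idea, not part of the paper's code; `LEAKAGE_VARS` and `select_virtual` are hypothetical names.

```python
import numpy as np

XV_VARS = ["T40", "P30", "P45", "W21", "W22", "W25", "W31", "W32",
           "W48", "W50", "SmFan", "SmLPC", "SmHPC", "phi"]
LEAKAGE_VARS = {"SmFan", "SmLPC", "SmHPC", "phi"}  # hypothetical guard list


def select_virtual(x_v, names):
    """Pick X_v columns by name, refusing any column flagged as leakage."""
    leaked = LEAKAGE_VARS.intersection(names)
    if leaked:
        raise ValueError(f"leakage columns requested: {sorted(leaked)}")
    cols = [XV_VARS.index(n) for n in names]
    return x_v[:, cols]


# Same synthetic X_v as the sanity check: each row is [100, 101, ..., 113].
fake_xv = np.tile(np.arange(14, dtype=float) + 100, (4, 1))
safe = select_virtual(fake_xv, ["T40", "P30"])
print(safe.shape, safe[0].tolist())  # (4, 2) [100.0, 101.0]
```

Requesting `"SmHPC"` or `"phi"` raises a `ValueError` instead of silently widening the input.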

PyTorch: N-CMAPSS Dataset Loader

Production dataset class. Subsamples the 5M-row dev split by 100× to ~50k rows, builds 50-cycle sliding windows, applies per-feature MinMax normalisation fitted on train, returns (sequence, RUL, unit-id) triples.
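The "fitted on train" part of the normalisation is the step most often done wrong, so here is a minimal sketch of it in isolation. The helper names `fit_minmax` and `apply_minmax` are illustrative and the arrays are toy data; the formula and the `1e-8` floor follow the dataset class.

```python
import numpy as np


def fit_minmax(train):
    """Per-feature (min, max) computed on the training rows only."""
    return train.min(axis=0), train.max(axis=0)


def apply_minmax(x, params, eps=1e-8):
    """Scale with previously fitted parameters; eps guards constant features."""
    lo, hi = params
    return (x - lo) / (hi - lo + eps)


train = np.array([[0.0, 10.0], [5.0, 10.0], [10.0, 10.0]])  # 2nd feature is constant
test = np.array([[12.0, 10.0]])                              # 1st feature exceeds train max

params = fit_minmax(train)
train_scaled = apply_minmax(train, params)
test_scaled = apply_minmax(test, params)

print(train_scaled[:, 0].round(3).tolist())
print(test_scaled.round(3).tolist())
```

Two things to notice: the constant feature maps to 0 rather than producing a division by zero, and test values can land outside [0, 1] when they exceed the train range. That is expected and is the price of not fitting on test data.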

Production N-CMAPSS Dataset class

🐍ncmapss_dataset.py


docstring

States the contract: a PyTorch Dataset wrapping N-CMAPSS DS02 with the DKAMFormer 20-feature protocol. Used by every method in the paper's N-CMAPSS sweep — the dataset is the COMMON ingredient; only the model and loss differ.

import h5py

HDF5 reader. Same as the NumPy script.

import numpy as np

NumPy. Used for stacking, subsampling, scaling.

import torch

PyTorch core. Needed for torch.tensor in __getitem__.

from torch.utils.data import Dataset

PyTorch base class. Subclassing Dataset + implementing __len__ and __getitem__ is the canonical way to plug into DataLoader.

EXECUTION STATE

📚 Dataset = Abstract base. DataLoader calls dataset[i] for each batch element. __len__ tells DataLoader how many samples exist.

class NCMAPSS20FeaturesDataset(Dataset):

Subclass Dataset. Inherits the standard PyTorch dataset interface; overrides __init__, __len__, __getitem__, plus get_scaler_params for train/test split coordination.

def __init__(self, data_path: str, sequence_length: int = 50,

Constructor. The default sequence_length=50 matches DKAMFormer; the paper's GRACE run on DS02 uses the same.

EXECUTION STATE

⬇ data_path = HDF5 file path.

⬇ sequence_length=50 = Sliding-window length, 50 cycles. Longer than C-MAPSS's 30 because DS02 has more per-cycle resolution.

max_rul: int = 125, train: bool = True,

max_rul: the same piecewise-linear cap as C-MAPSS (Saxena 2008 convention). train=True selects the dev split; False → test.

scaler_params=None, random_seed: int = 42,

scaler_params: (min, max) tuple from a previously-fitted train scaler. Pass it when building the test split so test data uses the train normalisation (no leakage).

subsample_factor: int = 100, window_stride: int = 1):

subsample_factor=100: keep every 100th row of the full ~5M-row DS02. Reduces train size to ~50k rows; matches the DKAMFormer protocol. window_stride=1: every offset; stride=10 would speed up at slight accuracy cost.

EXECUTION STATE

→ why subsample? = DS02 records every 0.1 second; physically the engine doesn't change that fast. 100× decimation keeps trends without flooding the loader.

self.seq_len = sequence_length

Cache as instance attribute.

self.max_rul = max_rul

Cache.

self.train = train

Cache for branching downstream (e.g. some methods do augmentation only in train).

self.subsample_factor = subsample_factor

Cache.

self.stride = window_stride

Cache.

split = "dev" if train else "test"

Pick which HDF5 group to read. dev = development units (train); test = held-out units.

with h5py.File(data_path, "r") as f:

Open the file with a context manager.

w = np.asarray(f[f"W_{split}"])

f-string interpolation builds the dataset name (W_dev / W_test). Loads the operating-condition array.

EXECUTION STATE

📚 f-string {var} = Python 3.6+ string interpolation. f'W_{split}' → 'W_dev' or 'W_test'.

x_s = np.asarray(f[f"X_s_{split}"])

Physical sensor array.

x_v = np.asarray(f[f"X_v_{split}"])

Virtual sensor array.

uid = np.asarray(f[f"A_{split}"])[:, 0].astype(int)

Unit ID column. .astype(int) ensures integer type for safe equality comparison and indexing.

EXECUTION STATE

📚 .astype(dtype) = Convert ndarray dtype. Returns a NEW array unless dtype matches.

rul = np.asarray(f[f"Y_{split}"]).flatten()

Per-row ground-truth RUL.

feats = np.column_stack([w[:, 0:3], w[:, 3:4], x_s, x_v[:, 0:2]])

Stack the 20 features. Same logic as the NumPy script's sanity check.

EXECUTION STATE

feats shape = (T_total, 20) — pre-subsample.

feats = feats[::subsample_factor]

Slice with step subsample_factor. ::100 keeps rows 0, 100, 200, ... (every 100th row).

EXECUTION STATE

📚 [::step] = NumPy slicing with step. ::100 reduces a 5M-row array to 50k rows in O(1) (returns a view, not a copy).

rul = rul[::subsample_factor]

Subsample RULs in lockstep.

uid = uid[::subsample_factor]

Subsample unit IDs in lockstep.

rul = np.minimum(rul, max_rul)

Clip the targets at 125 (the piecewise-linear RUL cap).

if scaler_params is None:

Train branch: fit the scaler.

self.feat_min = feats.min(axis=0)

Per-feature min over the train split.

EXECUTION STATE

📚 .min(axis=0) = Reduction along the first axis. (T, 20) → (20,) per-feature minima.

self.feat_max = feats.max(axis=0)

Per-feature max.

else:

Test branch: reuse the train scaler.

self.feat_min, self.feat_max = scaler_params

Tuple-unpack. Critical for correctness: test must be normalised with the SAME (min, max) as train, otherwise the model sees a different feature distribution at eval time.

EXECUTION STATE

→ leakage prevention = If the test set fits its own scaler, the model implicitly ‘sees’ the test feature ranges during normalisation — a subtle but real form of data leakage.

feats = (feats - self.feat_min) / (self.feat_max - self.feat_min + 1e-8)

MinMax normalise to [0, 1]. The 1e-8 floor prevents division by zero when a feature is constant on the train split (rare but possible for some sensor channels).

EXECUTION STATE

→ why MinMax not z-score? = Min-max preserves the bounded ranges of physical sensors (e.g. Mach number ∈ [0, 1]) without making rare-large values dominate after centering. Both choices are valid; the paper uses MinMax.

self.windows = []

Per-window index list. Each entry: (which rows, end-row, unit-id).

for u in np.unique(uid):

Iterate per unit. Crucial: windows must NOT span engines. Sliding across a unit boundary would mix two engines' trajectories.

EXECUTION STATE

→ why per-unit? = Engine-A's degradation curve is independent of engine-B's. A window that contains both is meaningless.

idx = np.where(uid == u)[0]

Get the indices in the subsampled array where unit ID equals u.

EXECUTION STATE

📚 np.where(cond) = Returns the INDICES where cond is True. With one argument it returns a tuple of arrays — [0] takes the first dimension's indices.

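for s in range(0, len(idx) - sequence_length, window_stride):

Sliding-window start indices. The last start generated is len(idx) - sequence_length - 1, so every window stays within this unit. Note that this bound also skips the window ending on the unit's final row; range(0, len(idx) - sequence_length + 1, window_stride) would include it.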

end = idx[s + sequence_length - 1]

The end-row of this window in the GLOBAL feats/rul arrays. Used to look up the target RUL — by convention the window predicts the RUL at its last cycle.

self.windows.append((idx[s:s + sequence_length], end, u))

Store (index slice, end position, unit). The slice tells __getitem__ which rows to assemble; end gives the y target; u is for last-cycle NASA aggregation.

self.feats, self.rul = feats, rul

Cache the arrays for __getitem__. We don't store uid because the per-window unit ID is already in self.windows.

def __len__(self):

DataLoader calls len(dataset) to know how many samples exist.

return len(self.windows)

Number of sliding windows. With window_stride=1, roughly (rows in the unit after subsampling - seq_len) per unit, summed over units.

def __getitem__(self, i):

DataLoader calls dataset[i] for each batch element. Must return a tuple/dict that the loader can collate.

EXECUTION STATE

⬇ input: i = Integer in [0, len(self)). DataLoader picks i randomly under shuffle=True.

⬆ returns = (x, y, uid) — one (sequence, RUL, unit-id) triple.

idxs, end, u = self.windows[i]

Tuple-unpack the cached window descriptor.

x = self.feats[idxs]

Fancy indexing: pull rows idxs[0..seq_len-1] from feats. Result shape (seq_len, 20).

y = self.rul[end]

Target = RUL at the window's last cycle.

return torch.tensor(x, dtype=torch.float32), torch.tensor(y, dtype=torch.float32), torch.tensor(u, dtype=torch.long)

Convert to torch tensors. dtype=float32 for inputs and targets; int64 (long) for the unit ID because PyTorch's default integer type is int64.

EXECUTION STATE

📚 torch.tensor(arr, dtype) = Construct a tensor from data. dtype=torch.float32 is the standard for neural-network inputs.

→ why long for uid? = uid is an integer category, never used in arithmetic. Long avoids fractional surprises during last-cycle aggregation.

def get_scaler_params(self):

Public method so the test dataset can fetch the train scaler. The training script calls this AFTER constructing the train dataset, then passes the result into the test dataset's scaler_params.

return (self.feat_min, self.feat_max)

Return the (min, max) pair as a tuple.

EXECUTION STATE

Final: usage in run_grace_ncmapss =

train_ds = NCMAPSS20FeaturesDataset(path, train=True)
test_ds  = NCMAPSS20FeaturesDataset(path, train=False, scaler_params=train_ds.get_scaler_params())


"""Production N-CMAPSS DS02 dataset for GRACE training.

Source: paper_ieee_tii/experiments/ncmapss/src/ncmapss_20features_dataset.py.
Subsamples the per-cycle data, builds sliding-window sequences,
clips the RUL at max_rul, and applies per-feature MinMax normalisation
fitted on the train split.
"""

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class NCMAPSS20FeaturesDataset(Dataset):
    """N-CMAPSS DS02 with the DKAMFormer 20-feature protocol."""

    def __init__(self, data_path: str, sequence_length: int = 50,
                 max_rul: int = 125, train: bool = True,
                 scaler_params=None, random_seed: int = 42,
                 subsample_factor: int = 100, window_stride: int = 1):
        self.seq_len = sequence_length
        self.max_rul = max_rul
        self.train   = train
        self.subsample_factor = subsample_factor
        self.stride  = window_stride

        # ---- Load HDF5 split ----
        split = "dev" if train else "test"
        with h5py.File(data_path, "r") as f:
            w   = np.asarray(f[f"W_{split}"])
            x_s = np.asarray(f[f"X_s_{split}"])
            x_v = np.asarray(f[f"X_v_{split}"])
            uid = np.asarray(f[f"A_{split}"])[:, 0].astype(int)
            rul = np.asarray(f[f"Y_{split}"]).flatten()

        # ---- Stack the 20 features ----
        feats = np.column_stack([w[:, 0:3], w[:, 3:4], x_s, x_v[:, 0:2]])

        # ---- Subsample (DS02 has 5M rows; 100x → 50k manageable) ----
        feats = feats[::subsample_factor]
        rul   = rul[::subsample_factor]
        uid   = uid[::subsample_factor]

        # ---- Clip RUL at max_rul (piecewise-linear convention) ----
        rul = np.minimum(rul, max_rul)

        # ---- Fit / apply MinMax scaler ----
        if scaler_params is None:
            self.feat_min = feats.min(axis=0)
            self.feat_max = feats.max(axis=0)
        else:
            self.feat_min, self.feat_max = scaler_params
        feats = (feats - self.feat_min) / (self.feat_max - self.feat_min + 1e-8)

        # ---- Build sliding-window indices, per unit ----
        self.windows = []
        for u in np.unique(uid):
            idx = np.where(uid == u)[0]
            for s in range(0, len(idx) - sequence_length, window_stride):
                end = idx[s + sequence_length - 1]
                self.windows.append((idx[s:s + sequence_length], end, u))

        self.feats, self.rul = feats, rul

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, i):
        idxs, end, u = self.windows[i]
        x  = self.feats[idxs]
        y  = self.rul[end]
        return torch.tensor(x, dtype=torch.float32), torch.tensor(y, dtype=torch.float32), torch.tensor(u, dtype=torch.long)

    def get_scaler_params(self):
        return (self.feat_min, self.feat_max)

Per-unit windowing. The sliding-window loop (for u in np.unique(uid)) iterates per UNIT, not globally. A window that crosses a unit boundary mixes two engines' trajectories: the inputs would start in engine A and end in engine B, while the target RUL belongs only to the engine at the window's last row. Per-unit windowing keeps every window inside a single engine's trajectory.
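The per-unit rule is easy to verify on a toy array. The sketch below rebuilds the window index with a standalone helper (`build_windows` is an illustrative name) and asserts that no window mixes units. One deliberate difference from the listing, flagged here as an assumption about intent: the range bound uses `+ 1` so that the window ending on each unit's final row is included, whereas the listing's `range(0, len(idx) - sequence_length, ...)` stops one window earlier.

```python
import numpy as np


def build_windows(uid, seq_len, stride=1):
    """Return (row_indices, end_row, unit) for windows that stay inside one unit."""
    windows = []
    for u in np.unique(uid):
        idx = np.where(uid == u)[0]
        # "+ 1" keeps the window that ends on the unit's final row.
        for s in range(0, len(idx) - seq_len + 1, stride):
            rows = idx[s:s + seq_len]
            windows.append((rows, rows[-1], int(u)))
    return windows


uid = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2])   # 5 rows of unit 1, 4 rows of unit 2
windows = build_windows(uid, seq_len=3)

for rows, end, unit in windows:
    assert np.all(uid[rows] == unit)           # no window mixes two engines

print(len(windows), [int(end) for _, end, _ in windows])  # 5 [2, 3, 4, 7, 8]
```

A global sliding window over the same array would also produce windows ending at rows 5 and 6, each containing rows from both units.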

When To Generalise To Realistic-Operating Data

Domain	Steady-state benchmark	Realistic-operating equivalent	Generalisation pattern
Aero engines	C-MAPSS (steady regimes)	N-CMAPSS DS02 (real flight envelope)	GABA-style adaptive weighting helps more on the realistic data because operating-point variation is continuous.
Battery health	Calce constant-current cycling	EV-driving NASA-Ames data	Methods learning condition-invariant features (current-rate, temperature) generalise; methods exploiting fixed-cycle structure don't.
Industrial robots	Lab repetitive tasks	Field deployment with varied payloads	Multi-task learning where the auxiliary task encodes payload classification helps under distribution shift.
Wind-turbine diagnostics	Wind-tunnel calibrated runs	Field data under variable wind	Same pattern: outer-axis adaptation helps when operating conditions vary continuously.

The general lesson: methods that rely on fixed-condition structure (single normalisation, single regime classifier) saturate on the steady-state benchmark and fail to scale. Methods that adapt per-batch to gradient/feature alignment (GRACE's OUTER axis) improve on the harder benchmark because there is more variance to exploit.

Pitfalls When Comparing Across Benchmarks

Pitfall 1: comparing N-CMAPSS RMSE to C-MAPSS RMSE

N-CMAPSS DS02's 6.35 RMSE and C-MAPSS FD002's 7.72 RMSE are not directly comparable. The two datasets have different RUL ranges, sample counts, sensor noise levels, and evaluation protocols. Only methods evaluated on the SAME benchmark with the SAME protocol can be ranked.

Pitfall 2: forgetting the subsample factor in NASA scoring

DS02's NASA score is ~8500, C-MAPSS's is ~230. The roughly 37× difference is largely a matter of how many predictions enter the sum: DS02 has fewer units, but each contributes more cycles after subsampling. NASA is a SUM, not a mean, so report both the sum and the per-unit average for cross-benchmark sanity checks.
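This section does not restate the scoring formula, so the sketch below assumes "NASA score" means the asymmetric exponential score from the original C-MAPSS challenge (constants 13 for early and 10 for late predictions). It prints both the sum and the per-prediction mean, which is the sanity check suggested above; the RUL values are made up.

```python
import math


def nasa_score(y_true, y_pred):
    """Asymmetric score: late predictions (pred > true) cost more than early ones."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = p - t
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0   # early prediction
        else:
            total += math.exp(d / 10.0) - 1.0    # late prediction
    return total


y_true = [50.0, 50.0, 50.0]
early = [40.0, 40.0, 40.0]   # 10 cycles early on every prediction
late = [60.0, 60.0, 60.0]    # 10 cycles late on every prediction

for name, pred in (("early", early), ("late", late)):
    s = nasa_score(y_true, pred)
    print(name, round(s, 3), round(s / len(y_true), 3))
```

Because the score is a sum, doubling the number of scored predictions doubles it even when the per-prediction error is unchanged, which is why the mean belongs next to it in any cross-benchmark table.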

Pitfall 3: leakage via the virtual-sensor channels

Including SmFan / SmLPC / SmHPC / phi in the inputs — even by accident — gives the model the simulator's ground-truth degradation markers. RMSE typically halves; results become unpublishable. The DKAMFormer 20-feature protocol excludes them and GRACE follows.

Pitfall 4: comparing single-seed paper results to multi-seed

DKAMFormer's 6.40 is a single-seed point estimate. GRACE's 6.35 is a 5-seed mean. The honest comparison is $6.35 \pm 0.37$ vs the unknown variance of DKAMFormer. We cannot say GRACE strictly beats it without DKAMFormer's seed std; we can say GRACE's mean is below their reported number, with high reproducibility.
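To put the 0.055 gap in context, it helps to express it in units of GRACE's own seed spread. A short sketch using only the numbers quoted in this section; whether 0.366 is a sample or population standard deviation is not stated here, so treat the standard error as approximate.

```python
import math

grace_mean, grace_std, n_seeds = 6.345, 0.366, 5
published = 6.40   # DKAMFormer, single reported number

gap = published - grace_mean
sem = grace_std / math.sqrt(n_seeds)      # standard error of the 5-seed mean
gap_in_seed_std = gap / grace_std         # gap measured against one seed's spread
gap_in_sem = gap / sem                    # gap measured against the mean's uncertainty

print(f"gap = {gap:.3f}")
print(f"standard error of mean = {sem:.3f}")
print(f"gap / seed std = {gap_in_seed_std:.2f}")
print(f"gap / standard error = {gap_in_sem:.2f}")
```

Both ratios come out below 1, which is the quantitative version of the caveat above: the mean is lower, but the difference is smaller than the run-to-run noise.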

Pitfall 5: assuming the C-MAPSS Pareto picture transfers

On C-MAPSS multi-condition, AMNL is the accuracy corner. On N-CMAPSS DS02, AMNL is rank 6 of 9. The shape of the Pareto front is benchmark-specific; do not extrapolate the C-MAPSS front to a new dataset without re-running the methods.

Takeaway

N-CMAPSS DS02 is the realistic-flight upgrade to C-MAPSS: continuous flight envelope, 0.1-s sampling, ~5M rows, two-fault degradation. Methods that win on C-MAPSS do not automatically win here.
On DS02 GRACE wins BOTH RMSE (6.345 ± 0.366) AND NASA (8560) — the only Pareto-optimal method out of the 9 MTL variants studied. R² also climbs monotonically with the GABA + WMSE composition: 0.859 → 0.866 → 0.878 → 0.886.
The 2×2 factorial (Baseline / AMNL / GABA / GRACE) empirically validates the chapter-21 orthogonality claim. Outer-axis effect ≈ −0.51 RMSE; inner-axis effect ≈ −0.20 RMSE; combined effect ≈ sum.
GRACE's mean RMSE 6.345 beats the published DKAMFormer (6.40) by 0.055 cycles, with a simpler architecture (~1.7M params) and 5-seed reproducibility.
The DKAMFormer 20-feature protocol — 3 scenario + 1 inlet T + 14 physical sensors + 2 safe virtual sensors — is essential. Including the 4 leakage-prone X_v columns (Sm*, phi) silently halves RMSE.