Chapter 1

The RUL Prediction Problem

Predictive Maintenance & RUL

The Fuel Gauge for Machines

Every car has a gauge that says “42 miles to empty.” You do not get a stream of fuel-tank pressures, fuel-pump currents, and ECU error codes — the gauge has already collapsed all of that into one number you can act on. Predictive maintenance is the same idea, applied to anything that wears out: squeeze the multi-dimensional sensor history into a single scalar — Remaining Useful Life — that a maintenance scheduler can actually use.

That is why nearly every paper, dashboard, and commercial diagnostic product in this space ultimately reports one number: $\widehat{\text{RUL}}$, in cycles, hours, or kilometres remaining. It is the actionable statistic. The model can be a transformer, a CNN, an LSTM, or three of them stacked; the output that the maintenance crew sees is one number.

RUL = the fuel gauge. Whatever exotic architecture you pick, its job is to turn a (cycles, sensors)-shaped window into a single scalar.

RUL, Formally

Let $\mathbf{x}_t \in \mathbb{R}^{d}$ be the vector of sensor readings at cycle $t$, where $d$ is the number of sensors (14 for C-MAPSS, 20 for N-CMAPSS DS02). A run-to-failure trajectory is the sequence $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{t_{\text{fail}}}$.

At any moment $t \le t_{\text{fail}}$, the true remaining useful life is the trivially obvious quantity

$$\text{RUL}_t = t_{\text{fail}} - t.$$

The prediction problem is to estimate $\text{RUL}_t$ from a window of past readings only; the model never gets to peek at $t_{\text{fail}}$. Concretely, given a window of the last $W$ cycles

$$\mathbf{X}_t = (\mathbf{x}_{t-W+1},\, \mathbf{x}_{t-W+2},\, \ldots,\, \mathbf{x}_t) \in \mathbb{R}^{W \times d},$$

we want a function $f_\theta$ (a neural network with parameters $\theta$) such that

$$\widehat{\text{RUL}}_t \;=\; f_\theta(\mathbf{X}_t) \;\approx\; t_{\text{fail}} - t.$$

That is the entire problem statement. The next 28 chapters cover how to choose $f_\theta$ well, what training objective to put on $\theta$, and how to get good values for $\theta$ from data.
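To make the shapes in the problem statement concrete, here is a minimal sketch. The helper names `window_and_label` and `f_theta`, the random trajectory, and the linear stand-in for the network are all illustrative assumptions; nothing here comes from a real dataset or from the models built later.

```python
import numpy as np

rng = np.random.default_rng(0)
t_fail, d, W = 200, 5, 30            # illustrative values only

# Row i holds x_{i+1}: the text counts cycles from 1, arrays count from 0.
trajectory = rng.normal(size=(t_fail, d))

def window_and_label(trajectory, t, W):
    """Return (X_t, RUL_t) for a 1-indexed cycle t with W <= t <= t_fail."""
    t_fail = trajectory.shape[0]
    X_t = trajectory[t - W:t]        # rows for x_{t-W+1}, ..., x_t
    return X_t, t_fail - t

def f_theta(X_t, weights, bias):
    """Hypothetical stand-in for the network: a linear map from window to scalar."""
    return float(np.sum(X_t * weights) + bias)

X_t, rul_t = window_and_label(trajectory, t=30, W=W)
weights = np.zeros((W, d))           # untrained parameters
pred = f_theta(X_t, weights, bias=100.0)

print(X_t.shape, rul_t, pred)        # (30, 5) 170 100.0
```

With all-zero weights the prediction is just the bias, which is far from the true label of 170. Training is the process of moving the parameters so that this gap shrinks across many windows.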

Why a window and not the full history? Two reasons. (1) Cycles from far in the past carry little information about the current degradation state, especially once the operating regime has changed. (2) Sequence models get more expensive as the input grows: self-attention cost grows quadratically with sequence length, and recurrent networks must be unrolled one step at a time, so feeding in an engine's entire history spends compute on cycles that contribute little. The window is the practical compromise.
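As a rough back-of-the-envelope illustration of the compute argument for self-attention, the sketch below counts only the entries of one attention score matrix. It ignores heads, layers, and constant factors, and the function name `attention_scores` is made up for this example.

```python
def attention_scores(seq_len: int) -> int:
    """Pairwise query-key scores one self-attention head computes for one sequence."""
    return seq_len * seq_len

full_history, window = 200, 30

print(attention_scores(full_history))   # 40000
print(attention_scores(window))         # 900
print(round(attention_scores(full_history) / attention_scores(window), 1))  # 44.4
```

Under this simplified count, attending over a 200-cycle history costs roughly 44 times as much per sample as attending over a 30-cycle window.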

Interactive: One Sample, Up Close

The diagram below is a single engine that fails at cycle 200. Five sensors are drifting away from their healthy baselines on physically-motivated curves (vibration up, oil pressure down, exhaust temperature up, and so on). The orange band is the model's 30-cycle input window; the green panel is the scalar RUL the model is asked to predict.

[Interactive figure: RUL explorer with draggable cursor, failure cycle, and window size]

Drag the cursor and watch how a single engine generates many training samples: each cursor position is one row of the supervised dataset. Drag the failure cycle down to 100 and you simulate a short-lived engine; drag the window size down to 5 and the model has almost no context to work with. By Chapter 7 we will commit to $W = 30$ and $d = 17$ for C-MAPSS; the explorer shows what those two numbers control.

Where Do RUL Labels Come From?

For supervised learning we need ground-truth RUL labels on the training data. There is exactly one way to obtain them: run the engine to failure and timestamp the moment it dies. Once you know $t_{\text{fail}}$, every earlier cycle gets its label $t_{\text{fail}} - t$ for free.

That single-sentence procedure has a consequence that haunts every working prognostic dataset: most engines never run to failure. They get pulled for unrelated reasons (lease return, fleet retirement, regulatory inspection), and you have a sensor history but no label. NASA C-MAPSS dodges this by being a simulation: every training trajectory in FD001-FD004 is run to failure by construction. Real-world prognostic projects almost always have to confront the “censoring” problem; we will return to it at the end of this section.

Python: Build (X, y) Pairs From One Engine Run

Before any neural network, the labelling procedure is about forty lines of NumPy. We simulate one run-to-failure trajectory, slide a 30-cycle window across it, and emit one (input, target) pair per cursor position: 171 pairs from a single engine with 200 cycles of life.

Sliding-window RUL labels in pure NumPy
🐍rul_pairs_numpy.py
import numpy as np

We will live in NumPy for this section. Everything below stays as plain ndarrays so the math is visible; the next subsection ports the same logic to PyTorch tensors and a Dataset.

EXECUTION STATE
numpy = Vectorised math library — same shape ops you will see on torch.Tensors later
np.random.seed(7)

Lock the random state so the simulated engine is identical between runs. Reproducibility matters both for teaching and for the eventual paper-replication exercises in Chapter 26.

EXECUTION STATE
📚 np.random.seed = Sets the Mersenne-Twister seed for the global random stream
arg: 7 = Arbitrary fixed integer — any seed produces a deterministic stream
n_cycles = 200

How many operating cycles this single engine survives before catastrophic failure.

EXECUTION STATE
n_cycles = 200 (slightly below the C-MAPSS FD001 mean of ~206)
n_sensors = 5

We pretend the engine carries five physical sensors. Real C-MAPSS has 21 sensors, of which 14 are informative; using five here keeps the matrices small enough to print.

EXECUTION STATE
n_sensors = 5 — chosen so a 30×5 window prints cleanly in the explorer above
baseline = np.array([1.0, 2.0, 0.5, 1.5, 0.8])

The healthy-state mean of each sensor. Think of these as the readings on day one of the engine's life — vibration ≈ 1.0g, exhaust temp ≈ 2.0 (in normalised units), oil pressure ≈ 0.5, and so on.

EXECUTION STATE
📚 np.array = Builds an ndarray from a Python list. Elements share a single dtype (float64 here).
baseline.shape = (5,) — one number per sensor
baseline = [1.0, 2.0, 0.5, 1.5, 0.8]
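drift_dir = np.array([-0.5, +0.7, -0.3, +0.4, -0.6])

How much each sensor drifts away from its baseline by the moment of failure. The sign sets the direction: in a real machine, vibration tends to go UP as bearings wear out and oil pressure goes DOWN as seals leak. The signs in this toy model are illustrative rather than calibrated; sensor 0, labelled vibration in the loop trace, drifts down here.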

EXECUTION STATE
drift_dir = [-0.5, +0.7, -0.3, +0.4, -0.6]
interpretation = Sensor 0 ends ~0.5 below baseline; sensor 1 ends ~0.7 above. Treat the magnitudes as illustrative.
alpha = np.array([1.5, 2.0, 1.2, 1.8, 2.5])

Steepness exponents. With α > 1 the sensor stays near baseline for a long time and then accelerates as failure approaches — this is the classic 'P-F interval' shape from reliability engineering.

EXECUTION STATE
alpha = [1.5, 2.0, 1.2, 1.8, 2.5]
alpha = 1.0 = Linear drift (boring).
alpha = 2.0 = Quadratic — drift roughly t². At t=0.5 only 25% of total drift is applied; at t=0.9, 81%.
cycles = np.arange(n_cycles)

Integer cycles 0, 1, 2, …, 199 — used as the time axis.

EXECUTION STATE
📚 np.arange(stop) = Returns [0, 1, ..., stop-1] as an ndarray. Equivalent to np.array(list(range(stop))).
cycles[:5] = [0, 1, 2, 3, 4]
cycles[-3:] = [197, 198, 199]
t_norm = cycles / n_cycles

Normalise time to the unit interval [0, 1). Element-wise division: NumPy broadcasts the scalar 200 across the whole array.

EXECUTION STATE
t_norm[0] = 0.0 — birth
t_norm[100] = 0.5 — half-life
t_norm[199] = 0.995 — last cycle before failure
sensors = np.zeros((n_cycles, n_sensors))

Pre-allocate the output matrix with the final shape so we can fill it column by column. Pre-allocating is the NumPy equivalent of asking a contractor for the floor plan before pouring concrete.

EXECUTION STATE
📚 np.zeros(shape) = Allocates an ndarray of given shape filled with 0.0 (float64 by default).
arg: shape=(n_cycles, n_sensors) = (200, 5) — one row per cycle, one column per sensor
sensors.shape = (200, 5)
for s in range(n_sensors):

Iterate over the 5 sensor columns. We build the time series per sensor; vectorising over both axes at once is possible but harder to read.

LOOP TRACE · 5 iterations
s = 0 (vibration)
drift_dir[0], alpha[0] = -0.5, 1.5 — drifts down sub-quadratically
s = 1 (exhaust temp)
drift_dir[1], alpha[1] = +0.7, 2.0 — drifts up quadratically
s = 2 (oil pressure)
drift_dir[2], alpha[2] = -0.3, 1.2 — slow downward drift
s = 3 (fuel-flow ratio)
drift_dir[3], alpha[3] = +0.4, 1.8
s = 4 (fan-speed delta)
drift_dir[4], alpha[4] = -0.6, 2.5 — sharpest drift, latest
sensors[:, s] = baseline[s] + drift_dir[s] * t_norm ** alpha[s] + 0.05 * np.random.randn(n_cycles)

Fills column s with a vectorised expression: baseline (scalar) + signed drift (1D array) + Gaussian noise (1D array). This is the deterministic-degradation-plus-noise model behind the chart you scrolled past.

EXECUTION STATE
sensors[:, s] = Full column slice — every cycle of sensor s
baseline[s] = Scalar — broadcasts onto the 200-element column
t_norm ** alpha[s] = Element-wise power. For sensor 1 (α=2.0): [0.0, 2.5e-5, 1.0e-4, …, 0.99]
📚 np.random.randn(n) = n samples from N(0, 1). Multiplied by 0.05 → N(0, 0.05²) noise.
result: sensors[0] = [1.085, 1.916, 0.497, 1.501, 0.805] — close to baseline
result: sensors[199] = [0.511, 2.747, 0.188, 1.939, 0.174] — far from baseline, near failure
print("sensor matrix shape:", sensors.shape)

Sanity check.

EXECUTION STATE
Output = sensor matrix shape: (200, 5)
WINDOW = 30

How many past cycles the model is allowed to see when predicting RUL. A window of 30 is a common choice in the C-MAPSS literature; treat it as a convention rather than a derived constant.

EXECUTION STATE
WINDOW = 30 — convention (paper §III-D, sequence length)
def build_pairs(sensors, failure_cycle, window=WINDOW):

Takes one engine's full run-to-failure matrix and produces a stack of (window, sensors)-shaped inputs together with their scalar RUL targets.

EXECUTION STATE
input: sensors = (200, 5) array of all cycles' sensor readings
input: failure_cycle = 200 — needed because the function does not see beyond the data
input: window=30 = How many past cycles the model gets to see for one prediction
returns = (X, y) — X.shape = (n_pairs, window, n_sensors), y.shape = (n_pairs,)
n = len(sensors)

Total cycles — 200 here. We will slide a window of size 30 across this length.

EXECUTION STATE
n = 200
X_list, y_list = [], []

Two parallel Python lists; each loop iteration appends one input window to X_list and its RUL target to y_list.

EXECUTION STATE
X_list = Empty list — will hold (window, n_sensors)-shaped slices
y_list = Empty list — will hold scalar RUL values
for end in range(window, n + 1):

`end` is the exclusive upper bound of the window. The first valid window is cycles [0, 30) and the last is cycles [170, 200). That gives 200 − 30 + 1 = 171 pairs.

LOOP TRACE · 171 iterations (4 shown)
end = 30
window slice = cycles [0, 30) — RUL = 200 − 30 = 170
end = 31
window slice = cycles [1, 31) — RUL = 169
end = 32
window slice = cycles [2, 32) — RUL = 168
(167 more iterations) = RUL counts down from 167 to 1
end = 200
window slice = cycles [170, 200) — RUL = 0 (last possible window)
X_list.append(sensors[end - window:end])

Take a (window, n_sensors)-shaped slice and append. NumPy slicing is zero-copy — the appended object views the original array, so this loop is cheap.

EXECUTION STATE
📚 array[start:stop] = NumPy slice — returns a view, not a copy. Modifying the slice modifies the original. Use .copy() if you need independence.
Example: end = 30 = sensors[0:30] — first 30 rows, all 5 columns → shape (30, 5)
y_list.append(failure_cycle - end)

RUL is the number of cycles between the last cycle in the window and the failure cycle. When end = 30, RUL = 170. When end = 200, RUL = 0 — the engine fails on the next cycle.

EXECUTION STATE
Example: end = 30 = failure_cycle (200) − end (30) = 170
Example: end = 199 = 200 − 199 = 1
Example: end = 200 = 200 − 200 = 0 — last possible target
return np.stack(X_list), np.array(y_list)

np.stack glues the list of (30, 5) arrays into a single (171, 30, 5) tensor along a new leading axis. np.array converts the list of scalars into a (171,) ndarray.

EXECUTION STATE
📚 np.stack(arrays, axis=0) = Joins arrays along a NEW axis. Different from np.concatenate, which joins along an EXISTING axis.
np.stack(X_list).shape = (171, 30, 5) — (n_pairs, window, n_sensors)
np.array(y_list).shape = (171,)
X, y = build_pairs(sensors, failure_cycle=n_cycles)

Single call returns the entire supervised dataset for this engine.

EXECUTION STATE
X = (171, 30, 5) ndarray — every input window
y = (171,) ndarray — the matching RUL targets, descending from 170 to 0
print("X.shape:", X.shape, "y.shape:", y.shape)

Confirm the final shapes — these are the dimensions every chapter from here on quotes.

EXECUTION STATE
Output = X.shape: (171, 30, 5) y.shape: (171,)
print("first pair RUL :", y[0])

First training pair — window covers cycles 0–29, RUL = 170 (still 170 cycles to go).

EXECUTION STATE
Output = first pair RUL : 170
print("last pair RUL :", y[-1])

Last training pair — window covers cycles 170–199, RUL = 0 (engine fails immediately after).

EXECUTION STATE
Output = last pair RUL : 0
import numpy as np

# ----- Step 1. Simulate one engine's run-to-failure trajectory -----
np.random.seed(7)
n_cycles  = 200          # this engine fails at cycle 200
n_sensors = 5            # five physical sensors

baseline  = np.array([1.0, 2.0, 0.5, 1.5, 0.8])     # nominal value of each sensor
drift_dir = np.array([-0.5, +0.7, -0.3, +0.4, -0.6]) # which way each one drifts
alpha     = np.array([1.5, 2.0, 1.2, 1.8, 2.5])     # how steep the drift becomes

cycles  = np.arange(n_cycles)            # 0, 1, ..., 199
t_norm  = cycles / n_cycles              # normalised time in [0, 1)
sensors = np.zeros((n_cycles, n_sensors))
for s in range(n_sensors):
    sensors[:, s] = (baseline[s]
                     + drift_dir[s] * t_norm ** alpha[s]
                     + 0.05 * np.random.randn(n_cycles))

print("sensor matrix shape:", sensors.shape)        # (200, 5)


# ----- Step 2. Slice into supervised (X, y) training pairs -----
WINDOW = 30

def build_pairs(sensors, failure_cycle, window=WINDOW):
    """One run-to-failure trajectory -> (X, y) regression pairs."""
    n = len(sensors)
    X_list, y_list = [], []
    for end in range(window, n + 1):
        X_list.append(sensors[end - window:end])    # past window of cycles
        y_list.append(failure_cycle - end)          # cycles still to go
    return np.stack(X_list), np.array(y_list)

X, y = build_pairs(sensors, failure_cycle=n_cycles)

print("X.shape:", X.shape, "y.shape:", y.shape)     # (171, 30, 5) (171,)
print("first pair RUL :", y[0])                     # 170
print("last  pair RUL :", y[-1])                    # 0
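The explicit loop is easy to read, but the same windows can be produced without a Python loop. The sketch below is one possible alternative, assuming NumPy 1.20 or newer for `sliding_window_view`; the function names `build_pairs_loop` and `build_pairs_vectorised` and the random sensor matrix are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def build_pairs_loop(sensors, failure_cycle, window=30):
    """Reference implementation: the same loop as build_pairs above."""
    X_list, y_list = [], []
    for end in range(window, len(sensors) + 1):
        X_list.append(sensors[end - window:end])
        y_list.append(failure_cycle - end)
    return np.stack(X_list), np.array(y_list)

def build_pairs_vectorised(sensors, failure_cycle, window=30):
    """Loop-free variant built on a strided view of the sensor matrix."""
    # sliding_window_view appends the window axis last: (n_pairs, n_sensors, window)
    views = sliding_window_view(sensors, window_shape=window, axis=0)
    X = views.transpose(0, 2, 1)                 # -> (n_pairs, window, n_sensors)
    ends = np.arange(window, len(sensors) + 1)   # exclusive end of each window
    return X, failure_cycle - ends

sensors = np.random.default_rng(0).normal(size=(200, 5))   # stand-in data
X_loop, y_loop = build_pairs_loop(sensors, failure_cycle=200)
X_vec, y_vec = build_pairs_vectorised(sensors, failure_cycle=200)

print(X_vec.shape, y_vec.shape)                  # (171, 30, 5) (171,)
print(np.array_equal(X_loop, X_vec), np.array_equal(y_loop, y_vec))  # True True
```

The vectorised `X` is a read-only view into `sensors`, so it uses no extra memory until something copies it. Call `.copy()` on it if you need to modify the windows.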

From one engine to a fleet

Real C-MAPSS gives you 100 engines in FD001 and 260 in FD002. The pattern stays identical — loop over engines, call build_pairs for each, concatenate the results. We will do exactly that in Chapter 7 once we have a real PyTorch Dataset.
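As a preview of that pattern in plain NumPy, here is a minimal sketch over a made-up fleet. The three lifetimes and the random sensor matrices are assumptions for illustration, not C-MAPSS data, and `build_pairs` is repeated so the block runs on its own.

```python
import numpy as np

def build_pairs(sensors, failure_cycle, window=30):
    """One run-to-failure trajectory -> (X, y) regression pairs."""
    X_list, y_list = [], []
    for end in range(window, len(sensors) + 1):
        X_list.append(sensors[end - window:end])
        y_list.append(failure_cycle - end)
    return np.stack(X_list), np.array(y_list)

rng = np.random.default_rng(0)
lifetimes = [120, 200, 150]                        # hypothetical failure cycles
fleet = [rng.normal(size=(n, 5)) for n in lifetimes]

X_parts, y_parts = [], []
for sensors in fleet:
    if len(sensors) < 30:                          # too short for even one window
        continue
    X_i, y_i = build_pairs(sensors, failure_cycle=len(sensors))
    X_parts.append(X_i)
    y_parts.append(y_i)

X_fleet = np.concatenate(X_parts, axis=0)          # join along the existing pair axis
y_fleet = np.concatenate(y_parts, axis=0)

print(X_fleet.shape, y_fleet.shape)                # (383, 30, 5) (383,)
```

Each engine contributes lifetime − 30 + 1 pairs (91, 171, and 121 here), and windows never straddle two engines because the slicing happens per engine before concatenation.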

PyTorch: The Same Idea, but as a Dataset

Now the same logic in PyTorch idiom. Two changes only: numpy.ndarray becomes torch.Tensor, and the loop becomes a Dataset subclass that DataLoader can batch and shuffle. The numerical output is identical.

The same windowing, now batched by PyTorch
🐍rul_pairs_torch.py
import numpy as np

Still need NumPy because the input `sensors` is the (200, 5) ndarray we built in the previous block.

import torch

PyTorch's top-level module. Provides torch.Tensor (the GPU-aware drop-in for ndarray), automatic differentiation, and the nn / optim sub-modules used in later chapters.

EXECUTION STATE
torch = Core library — tensors, autograd, device management
from torch.utils.data import Dataset, DataLoader

Two abstractions every PyTorch training loop relies on. Dataset is a length-indexable container of (input, target) pairs; DataLoader wraps it with batching, shuffling, and multi-process workers.

EXECUTION STATE
📚 Dataset = Abstract base class. You override __len__ and __getitem__; PyTorch handles the rest.
📚 DataLoader = Iterable that pulls items from a Dataset and assembles them into mini-batches. Handles shuffling and parallel I/O for free.
class EngineRunDataset(Dataset):

Wraps one engine's run-to-failure data into a PyTorch-compatible object. Identical contract to the build_pairs() function above — but now the (X, y) pairs are produced lazily as the DataLoader requests them.

EXECUTION STATE
Dataset (parent class) = PyTorch's map-style dataset base class. We supply __getitem__ and __len__ ourselves; the parent contributes conveniences such as `+` concatenation into a ConcatDataset.
def __init__(self, sensors, failure_cycle, window=30):

Constructor — runs once, when the dataset object is created. Stores the data and pre-computes the list of valid window-start indices so __getitem__ is O(1).

EXECUTION STATE
input: sensors (np.ndarray) = (200, 5) — the run-to-failure trajectory
input: failure_cycle (int) = 200 — passed in because the dataset does not assume sensors ends exactly at failure
input: window (int) = 30 — same convention as the NumPy version
super().__init__()

Calls the parent constructor. Dataset defines no setup logic of its own, so this is currently a no-op, but calling super().__init__() is good hygiene in case a parent class ever needs it.

self.window = window

Stash the window size on the instance so __getitem__ can use it.

EXECUTION STATE
self.window = 30
self.failure_cycle = failure_cycle

Stash the failure cycle for RUL computation in __getitem__.

EXECUTION STATE
self.failure_cycle = 200
self.sensors = torch.from_numpy(sensors).float()

Convert the (200, 5) ndarray into a torch.Tensor exactly once. .float() casts to float32, the default dtype of PyTorch model parameters. The tensor stays on the CPU here; .to('cuda') would move it to a GPU.

EXECUTION STATE
📚 torch.from_numpy(arr) = Zero-copy bridge. The returned tensor SHARES MEMORY with the ndarray, so modifying one modifies the other. Use .clone() to break the link.
📚 .float() = Casts dtype to torch.float32. NumPy arrays default to float64 (double precision), so this cast copies the data and self.sensors no longer shares memory with the ndarray. float32 halves memory use and is typically much faster than float64 on GPUs, with negligible accuracy loss for this problem.
self.sensors.shape = torch.Size([200, 5])
self.sensors.dtype = torch.float32
self.starts = list(range(0, len(sensors) - window + 1))

Pre-compute the list of valid window starts: 0, 1, …, 170. There are 200 − 30 + 1 = 171 of them — exactly the number of (X, y) pairs we got out of build_pairs() earlier.

EXECUTION STATE
len(sensors) = 200
window = 30
len(self.starts) = 171 — number of training pairs in this engine
self.starts[:5] = [0, 1, 2, 3, 4]
self.starts[-3:] = [168, 169, 170]
def __len__(self) -> int:

PyTorch's DataLoader calls len(dataset) to know how many items exist. Returning the length of self.starts is exactly what we want.

EXECUTION STATE
returns = 171 for this engine
return len(self.starts)

171.

EXECUTION STATE
return value = 171
def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:

Called every time the DataLoader needs the i-th sample. Must return a (X, y) tuple of tensors. PyTorch will collate B such tuples into one batch automatically.

EXECUTION STATE
input: idx (int) = Sample index in [0, len(ds)). DataLoader feeds sequential or shuffled idx values depending on its config.
returns = (X, y) — X is a (window, n_sensors) tensor, y is a 0-dim scalar tensor
start = self.starts[idx]

Look up the pre-computed start cycle for this index.

EXECUTION STATE
Example: idx = 0 = start = 0
Example: idx = 170 = start = 170 — last valid window
end = start + self.window

Exclusive upper bound of the window — same convention as the NumPy version.

EXECUTION STATE
Example: start = 0 = end = 30
Example: start = 170 = end = 200 — engine fails the next cycle
X = self.sensors[start:end]

PyTorch tensor slicing — same syntax as NumPy. Returns a (window, n_sensors) view. No copy until DataLoader collates.

EXECUTION STATE
X.shape = torch.Size([30, 5])
📚 tensor slicing = Identical syntax to ndarray. Negative indices, step strides, and ellipsis (...) all work. Returns a view, not a copy.
y = torch.tensor(float(self.failure_cycle - end))

Wrap the scalar RUL in a 0-dimensional float32 tensor so the DataLoader can stack a batch of them into a 1-D tensor of shape (B,). The default collate function would also accept a plain Python number, but an int target would be collated into an int64 tensor, which does not match the float32 predictions a regression loss expects.

EXECUTION STATE
📚 torch.tensor(value) = Builds a tensor from a Python scalar / list / ndarray. Infers dtype from the argument — float for floats, int64 for ints. Always copies.
Example: end = 30 = y = torch.tensor(170.0) — RUL when window covers cycles 0..29
Example: end = 200 = y = torch.tensor(0.0) — last window before failure
return X, y

Two-tuple — DataLoader's default collate_fn handles the rest.

EXECUTION STATE
return X = torch.Size([30, 5]), float32
return y = torch.Size([]), float32 (0-dim)
ds = EngineRunDataset(sensors, failure_cycle=200, window=30)

Instantiate the dataset over the same NumPy sensors matrix produced earlier. Constructor cost is negligible — just pre-computes 171 start indices.

EXECUTION STATE
len(ds) = 171
ds[0] = (tensor of shape (30, 5), tensor 170.0) — the very first window/RUL pair
loader = DataLoader(ds, batch_size=8, shuffle=False)

Wrap the dataset for batched iteration. shuffle=False keeps the cycle order intact; for real training in Chapter 15 we will turn it on.

EXECUTION STATE
📚 DataLoader(dataset, batch_size, shuffle, ...) = Builds an iterable that yields batches of size batch_size. With shuffle=True it randomly permutes the indices each epoch.
arg: batch_size = 8 = Each iteration produces 8 (X, y) pairs stacked along a new leading axis.
arg: shuffle = False = Visit samples in dataset order — easier to reason about for this teaching example.
X_batch, y_batch = next(iter(loader))

Pull a single batch off the loader. iter(loader) returns a fresh iterator; next(...) grabs the first batch.

EXECUTION STATE
📚 iter(...) = Asks an iterable for its iterator object. DataLoader implements __iter__ so this works.
X_batch.shape = torch.Size([8, 30, 5]) — batch of 8 windows
y_batch.shape = torch.Size([8]) — batch of 8 scalar RUL targets
print('batch X:', tuple(X_batch.shape))

Verify the leading batch dimension.

EXECUTION STATE
Output = batch X: (8, 30, 5)
print('batch y:', tuple(y_batch.shape))

Confirms y is a 1-D tensor of length 8 — exactly what the loss function expects.

EXECUTION STATE
Output = batch y: (8,)
print('y values:', y_batch.tolist())

The eight RUL targets in this batch are 170 down to 163 — consecutive cycles, exactly because shuffle=False.

EXECUTION STATE
Output = y values: [170.0, 169.0, 168.0, 167.0, 166.0, 165.0, 164.0, 163.0]
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class EngineRunDataset(Dataset):
    """One engine's run-to-failure trajectory wrapped as a PyTorch Dataset."""

    def __init__(self, sensors: np.ndarray, failure_cycle: int, window: int = 30):
        super().__init__()
        self.window = window
        self.failure_cycle = failure_cycle
        # Move sensor matrix onto a tensor once; slicing later is free.
        self.sensors = torch.from_numpy(sensors).float()
        self.starts = list(range(0, len(sensors) - window + 1))

    def __len__(self) -> int:
        return len(self.starts)

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        start = self.starts[idx]
        end   = start + self.window
        X = self.sensors[start:end]                        # (window, n_sensors)
        y = torch.tensor(float(self.failure_cycle - end))  # scalar RUL target
        return X, y


# ----- Use it -----
# `sensors` is the (200, 5) ndarray built in rul_pairs_numpy.py above.
ds = EngineRunDataset(sensors, failure_cycle=200, window=30)
loader = DataLoader(ds, batch_size=8, shuffle=False)

X_batch, y_batch = next(iter(loader))
print("batch X:", tuple(X_batch.shape))   # (8, 30, 5)
print("batch y:", tuple(y_batch.shape))   # (8,)
print("y values:", y_batch.tolist())     # [170.0, 169.0, 168.0, 167.0, 166.0, 165.0, 164.0, 163.0]
Why bother with the Dataset class at all? Three reasons that only become obvious once you have hundreds of engines: (1) lazy slicing means you never materialise the full 171-pair array, (2) DataLoader gives you shuffling and parallel I/O for free, (3) Dataset composes — a ConcatDataset over 260 engines is a one-liner.
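To illustrate the third point, here is a minimal sketch of a three-engine fleet built with `ConcatDataset`. The lifetimes and random sensor data are invented for the example, and the dataset class is a compact re-statement of the one above so the block runs on its own.

```python
import numpy as np
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class EngineRunDataset(Dataset):
    """Compact version of the per-engine dataset: lazy sliding windows."""

    def __init__(self, sensors, failure_cycle, window=30):
        self.sensors = torch.from_numpy(sensors).float()
        self.failure_cycle = failure_cycle
        self.window = window

    def __len__(self):
        return len(self.sensors) - self.window + 1

    def __getitem__(self, idx):
        end = idx + self.window
        return self.sensors[idx:end], torch.tensor(float(self.failure_cycle - end))

rng = np.random.default_rng(0)
lifetimes = [120, 200, 150]                       # hypothetical failure cycles
fleet = ConcatDataset(
    [EngineRunDataset(rng.normal(size=(n, 5)), failure_cycle=n) for n in lifetimes]
)

loader = DataLoader(fleet, batch_size=16, shuffle=True)
X_batch, y_batch = next(iter(loader))

print(len(fleet), tuple(X_batch.shape), tuple(y_batch.shape))  # 383 (16, 30, 5) (16,)
```

With `shuffle=True`, a batch mixes windows from different engines and different points in their lives, which is usually what you want for stochastic gradient training.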

RUL Beyond Aerospace

The (window, sensor)-to-scalar regression formulation is dominant outside aerospace too. Whenever the failure mode is gradual, monotonic, and partially observable through sensor data, the same machinery applies. The label definition shifts because the “cycle” is replaced by whatever unit of life the equipment uses.

| Domain | Unit of life | Sensors | Public benchmark |
| --- | --- | --- | --- |
| Turbofan engine (this book) | Operating cycle | Pressure, temperature, fan-speed, fuel-flow | NASA C-MAPSS, N-CMAPSS DS02 |
| Lithium-ion battery | Charge cycle | Voltage, current, temperature curves | NASA Battery, MIT/Stanford Severson 2019 |
| Rolling-element bearing | Hours under load | Vibration spectrum, acoustic emission | PRONOSTIA / FEMTO bearing dataset |
| Hard-disk drive | Operating hours | SMART attributes (read errors, reallocations) | Backblaze quarterly drive dump |
| Wind-turbine gearbox | Rotations / hours | Vibration, oil debris, temperature | EDP Open Data, EngieWindFarm |
| Patient hospital stay (medical analogue) | Days until discharge / decompensation | Vitals, labs, medication exposure | MIMIC-IV ICU data |
| Software systems | Calls until crash / time-to-anomaly | Latency percentiles, error rates, GC pauses | (internal SRE telemetry) |

Every row of that table consumes a window-shaped input and emits a scalar time-to-event. The CNN-BiLSTM-Attention backbone we build in Part III, the gradient-balancing loss in Part VI, the Pareto-frontier picture in Chapter 23 — all of them transfer with at most a renaming of variables.

The Censoring Pitfall

Many introductions to RUL skip the elephant in the room: the only engines you can label are the ones that did fail. Engines pulled early, swapped out for unrelated reasons, or still happily running at the moment your dataset snapshot is taken provide censored observations: you know the engine survived past some cycle, but you do not know $t_{\text{fail}}$.

Treating censored trajectories as if they had failed at the snapshot cycle biases the model toward shorter predicted RUL than reality. This is a well-known trap in survival analysis, the same one that medical-trial statisticians addressed with the Kaplan-Meier estimator (1958) and the Cox proportional-hazards model (1972).
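The direction of that bias can be seen in a small simulation. The sketch below is a toy under stated assumptions: the failure and snapshot cycles are drawn from arbitrary uniform ranges, not from any real fleet, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_engines = 1000

# Hypothetical ground truth and observation cut-offs (arbitrary ranges).
true_fail = rng.integers(150, 300, size=n_engines)   # real failure cycle
snapshot = rng.integers(100, 300, size=n_engines)    # last cycle we observed

observed_end = np.minimum(true_fail, snapshot)       # where each record stops
censored = snapshot < true_fail                      # record stops before failure

t = 50                                               # label every engine at cycle 50
true_rul = true_fail - t
naive_rul = observed_end - t                         # pretends censored engines failed

print("censored fraction:", censored.mean())
print("mean true RUL    :", true_rul.mean())
print("mean naive RUL   :", naive_rul.mean())
print("naive never exceeds truth:", bool((naive_rul <= true_rul).all()))
```

Because the observed end is the minimum of the failure cycle and the snapshot cycle, the naive label can only match or undershoot the true one, so a model trained on naive labels learns to predict failure too early.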

C-MAPSS sidesteps the problem by being entirely run-to-failure simulation data, which is one of the reasons it has dominated the prognostic benchmark landscape since 2008. Real-world projects almost always need to either restrict the training set to fully-observed runs or borrow techniques from the survival literature. We flag it here, defer it to Chapter 29 (“Limitations & Open Research Questions”), and proceed with the un-censored idealisation for the next twenty-eight chapters.

The clean version of the problem. Run-to-failure trajectories with known failure cycle. Sliding-window inputs. Scalar RUL targets. A neural network $f_\theta$ that maps one to the other. That is the world we will work in.

Takeaway

  • RUL is the fuel gauge for machines. Every model in this book, no matter how exotic its internals, ultimately emits one scalar per window: cycles to failure.
  • Supervised pairs come from sliding windows. One engine with 200 cycles of life and a window size of 30 yields 171 (input, target) pairs.
  • NumPy and PyTorch say the same thing differently. NumPy gives you the math directly; PyTorch Dataset/DataLoader gives you batching, shuffling, and parallel I/O for free.
  • Most real datasets are censored. C-MAPSS hides this behind simulation; outside it, censoring is the thing you have to engineer around.