Learning Objectives
By the end of this section, you will:
- Understand why sliding windows are needed for variable-length time series
- Master the actual research implementation of sequence construction with RUL calculation
- Handle train vs test data correctly with different RUL computation strategies
- Assign labels correctly to each extracted window
- Track unit IDs for proper per-engine evaluation
Why This Matters: Neural networks require fixed-size inputs, but engine trajectories vary from 128 to 362 cycles. The `_build_sequences_and_labels` method is the heart of our preprocessing: it extracts sliding windows, computes piecewise RUL, and tracks unit IDs for proper evaluation.
Why Sliding Windows?
Engine trajectories in C-MAPSS have variable lengths:
| Engine | Trajectory Length | Problem |
|---|---|---|
| Engine 1 | 192 cycles | Can't batch with Engine 2 |
| Engine 2 | 287 cycles | Different tensor shape |
| Engine 3 | 145 cycles | Padding wastes computation |
| Engine 4 | 362 cycles | Longest in dataset |
The Fixed-Input Requirement
Our model expects tensors of shape (B, L, D):
- B: Batch size (samples per batch)
- L: Sequence length (fixed window size)
- D: Feature dimension
Sliding windows solve this by extracting fixed-length subsequences from each trajectory, converting variable-length engines into uniform training samples.
Window Intuition
```
Trajectory: [c₁, c₂, c₃, c₄, c₅, c₆, c₇, c₈, c₉, c₁₀] (10 cycles)
Window size: L = 3
Stride: S = 1

Window 1: [c₁, c₂, c₃] → RUL label from c₃
Window 2: [c₂, c₃, c₄] → RUL label from c₄
Window 3: [c₃, c₄, c₅] → RUL label from c₅
...
Window 8: [c₈, c₉, c₁₀] → RUL label from c₁₀

Total: 8 windows from 10-cycle trajectory
```
Window Construction Algorithm
Mathematical Definition
For a trajectory of length T, window size L, and stride S, the number of windows is:

N_w = floor((T - L) / S) + 1

Window i (0-indexed) spans cycles i·S through i·S + L - 1.

And the RUL label is taken from the last cycle in the window:

y_i = RUL(i·S + L - 1)
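As a quick sanity check of the window-count arithmetic, here is a small numpy sketch; the `num_windows` helper and the use of `sliding_window_view` are illustrative, not the research code:

```python
import numpy as np

def num_windows(T: int, L: int, S: int = 1) -> int:
    """Number of full windows for a trajectory of length T, window L, stride S."""
    return (T - L) // S + 1 if T >= L else 0

# The 10-cycle example above: L = 3, S = 1 yields 8 windows
assert num_windows(10, 3) == 8

# Extract the windows themselves (stride 1)
traj = np.arange(1, 11)  # stand-in for cycles c1..c10
windows = np.lib.stride_tricks.sliding_window_view(traj, 3)
print(windows.shape)  # (8, 3)
```

The first window is `[1, 2, 3]` and the last is `[8, 9, 10]`, matching the diagram above.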
Research Implementation
The `_build_sequences_and_labels` method from our research code handles both training and test data with different RUL computation strategies.
Key Implementation Details
| Aspect | Implementation | Rationale |
|---|---|---|
| Stride | S = 1 (implicit) | Maximum training samples |
| Label position | End of window (i+L-1) | Predict current RUL from history |
| RUL clipping | [0, 125] | Piecewise-linear RUL assumption |
| Unit ID tracking | Store with each window | Enable per-engine evaluation |
| Empty handling | Return empty arrays | Robust to edge cases |
RUL Calculation Strategy
A critical distinction exists between how RUL is computed for training vs test data:
Training Data
For training data, we know the exact failure point (the last cycle in the trajectory), so RUL at cycle t is simply the distance to that final cycle:

RUL(t) = clip(max_cycle - t, 0, 125)

The clip(lower=0, upper=125) implements the piecewise-linear assumption: during early operation, before degradation is observable, RUL is capped at 125.
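In pandas, this can be computed per engine with a groupby transform; the frame below and its column names are a hypothetical illustration:

```python
import pandas as pd

# Hypothetical training frame: two engines observed to failure
df = pd.DataFrame({
    "unit_id": [1, 1, 1, 2, 2],
    "cycle":   [1, 2, 3, 1, 2],
})

# RUL = engine's last cycle - current cycle, capped at 125
max_cycle = df.groupby("unit_id")["cycle"].transform("max")
df["RUL"] = (max_cycle - df["cycle"]).clip(lower=0, upper=125)
print(df["RUL"].tolist())  # [2, 1, 0, 1, 0]
```

Every training engine ends with RUL = 0, since it runs to failure.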
Test Data
For test data, the true RUL at the final observed cycle is provided in the RUL_FD00X.txt files, and earlier cycles are offset from it:

RUL(t) = clip(RUL_final + (max_cycle - t), 0, 125)

This back-propagates the ground truth through the entire test trajectory.
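A sketch of the back-propagation step, again with hypothetical data and column names:

```python
import pandas as pd

# Hypothetical: one test engine observed for 3 cycles, true final RUL = 100
test_df = pd.DataFrame({"unit_id": [1, 1, 1], "cycle": [1, 2, 3]})
final_rul = {1: 100}  # one value per engine, parsed from RUL_FD00X.txt

max_cycle = test_df.groupby("unit_id")["cycle"].transform("max")
offset = test_df["unit_id"].map(final_rul)
# RUL(t) = final RUL + (last observed cycle - t), capped at 125
test_df["RUL"] = (offset + (max_cycle - test_df["cycle"])).clip(upper=125)
print(test_df["RUL"].tolist())  # [102, 101, 100]
```

Unlike training engines, the test engine's RUL never reaches 0 here because it was stopped before failure.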
Why Different Strategies?
Training engines run to failure (RUL = 0 at the end). Test engines are stopped mid-operation, so we need the RUL_FD00X.txt file to know how many cycles remained. This simulates real-world prediction, where engines haven't failed yet.
Window Parameters
Window Size (L = 30)
| Window Size | Context | Trade-off |
|---|---|---|
| L = 10 | Short-term patterns only | May miss long-range trends |
| L = 30 | Balanced (our choice) | Good context, efficient |
| L = 50 | Long-term context | Fewer samples, more memory |
| L = 100 | Very long context | May exceed trajectory length |
Rationale for L = 30:
- Captures ~30 flight cycles of context (approximately one month of operation)
- Long enough to see degradation trends, short enough for efficiency
- All training engines have at least 128 cycles, so 30-cycle windows always fit
- Consistent with prior work, enabling fair comparison
Sample Count Analysis
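The original counts are not reproduced here. As an illustration, with stride 1 each engine of length T yields T - L + 1 windows, which we can check against the four example engines from the table above:

```python
# With stride 1 and window L, an engine with T cycles yields T - L + 1 windows.
L = 30
trajectory_lengths = [192, 287, 145, 362]  # example engines from the table above

windows_per_engine = [T - L + 1 for T in trajectory_lengths]
print(windows_per_engine)       # [163, 258, 116, 333]
print(sum(windows_per_engine))  # 870 windows from these four engines
```

Because every training engine has at least 128 cycles, each contributes at least 99 windows, so the full dataset yields far more training samples than it has engines.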
Summary
In this section, we explored the actual research implementation of sliding window sequence construction:
- Why windows: Convert variable-length trajectories to fixed-size inputs (30, 17)
- Research implementation: `_build_sequences_and_labels()` handles both train and test data
- RUL calculation: Different strategies for training (max_cycle - current) vs test (back-propagate from ground truth)
- Label assignment: RUL at window end (last timestep)
- Unit ID tracking: Essential for per-engine evaluation metrics
| Output | Shape | Description |
|---|---|---|
| X (sequences) | (N, 30, 17) | Sliding window features |
| y (RUL labels) | (N,) | RUL at end of each window |
| unit_ids | (N,) | Engine ID for each window |
Looking Ahead: We now have sequences and labels. The next section implements the complete PyTorch Dataset class that wraps this preprocessing and integrates with DataLoader for training.
With sequence construction understood, we are ready to implement the PyTorch data pipeline.