A Time Series Is a Stack of Vectors
Open a music player and look at the spectrum analyser. At every instant, a bar chart of frequency bins flashes by — one vector per millisecond. A whole song is hundreds of thousands of such vectors stacked along time. An ECG trace from a hospital monitor is the same idea with twelve channels (leads) instead of frequency bins. A turbofan engine running for an hour is again the same idea, this time with seventeen sensor channels and one sample per cycle.
Mathematically these are all the same object: a sequence of vectors $x_1, x_2, \dots, x_T$ with each $x_t \in \mathbb{R}^F$. The two natural numbers that describe a single time series are $T$ (its length) and $F$ (its feature count).
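To make the "stack of vectors" picture concrete, here is a minimal NumPy sketch. The sizes $T = 5$ and $F = 3$ and the made-up values are illustrative assumptions, not data from the book.

```python
import numpy as np

T, F = 5, 3  # illustrative length and feature count

# One feature vector per time step: x_1, ..., x_T, each in R^F.
vectors = [np.arange(F, dtype=np.float32) + 10 * t for t in range(T)]

# Stacking along a new leading axis gives one time series of shape (T, F).
series = np.stack(vectors, axis=0)

print(series.shape)  # (5, 3)
print(series[2])     # the vector observed at the third time step
```

Row `t` of `series` is the vector for time step `t`, and column `f` is the full history of feature `f`.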
From Stack to Tensor
A tensor is just a multi-dimensional array. The 3-D sensor tensor we operate on is
$$X \in \mathbb{R}^{B \times T \times F}$$
with element $X_{b,t,f}$ giving the value of the $f$-th feature, at the $t$-th cycle, of the $b$-th engine in the batch. Three axes, one number per cell.
| Axis | Name | Typical size | What it means |
|---|---|---|---|
| B | Batch | 256 (training) / 1 (inference) | Independent engines processed in parallel |
| T | Time | 30 (C-MAPSS) / 50 (N-CMAPSS) | Cycles inside the sliding window |
| F | Feature | 17 (C-MAPSS) / 28 (N-CMAPSS DS02) | Sensors + engineered channels |
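The three axes can be checked directly in code. The sketch below stacks a few per-engine windows into one batch and reads a single cell. The batch size of 4 and the random values are assumptions chosen only to keep the example small.

```python
import numpy as np

B, T, F = 4, 30, 17  # illustrative sizes; real batches are usually larger

rng = np.random.default_rng(0)

# B independent windows, each of shape (T, F), stacked along a new batch axis.
windows = [rng.normal(size=(T, F)) for _ in range(B)]
X = np.stack(windows, axis=0)

b, t, f = 2, 10, 5
value = X[b, t, f]  # feature f at cycle t of engine b: one scalar

print(X.shape)  # (4, 30, 17)
print(X.ndim)   # 3
print(X.size)   # 4 * 30 * 17 = 2040
```

Indexing with all three coordinates leaves no axes behind, which is the "one number per cell" reading of the tensor.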
Three letters; every operation in the book reduces to manipulating them. Convolutions slide along $T$; LSTMs unroll along $T$; attention compares pairs along $T$; the final regression head collapses $T$ entirely. $F$ grows and shrinks across layers. $B$ is the only axis that almost never changes between the input and the output.
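The way each axis behaves can be traced with a NumPy stand-in. This is not the book's model: a three-cycle moving average plays the role of a convolution, and random matrices play the role of learned weights. All names and sizes here are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

B, T, F = 4, 30, 17
rng = np.random.default_rng(0)
X = rng.normal(size=(B, T, F))

# Convolution stand-in: a moving average over 3 cycles slides along T.
windows = sliding_window_view(X, window_shape=3, axis=1)  # (4, 28, 17, 3)
conv_like = windows.mean(axis=-1)                         # (4, 28, 17)

# Layer stand-in: a random projection changes F from 17 to 8.
W = rng.normal(size=(F, 8))
hidden = conv_like @ W                                    # (4, 28, 8)

# Head stand-in: average over time, then combine the features.
pooled = hidden.mean(axis=1)                              # (4, 8)
w_out = rng.normal(size=8)
prediction = pooled @ w_out                               # (4,)

for name, arr in [("input", X), ("conv-like", conv_like),
                  ("hidden", hidden), ("pooled", pooled),
                  ("prediction", prediction)]:
    print(f"{name:<11}{arr.shape}")
```

The batch axis stays at 4 from the first line to the last, while the time and feature axes change or disappear along the way.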
Interactive: The (B, T, F) Anatomy
The diagram below renders a $(B, T, F)$ tensor as a stack of feature-by-time slabs, one slab per engine in the batch. Toggle which axis to highlight; the captions tell you the shape and physical meaning of every slice you can produce. With $(B, T, F) = (256, 30, 17)$ the same picture has 130,560 cells: same idea, more numbers.
Three quick exercises while the diagram is in front of you. (1) Set $B = 1$: one engine, the picture collapses to a single slab and the diagram becomes a pure 2-D matrix. (2) Set $T = 1$: every column has only one cell, a single cycle's reading per engine, useful for inspecting the very last cycle of a window. (3) Highlight the feature axis: you isolate one sensor across the entire batch and time horizon; this is exactly what X[:, :, f] selects.
Python: Build a Tensor in NumPy
In thirty lines of NumPy we build a (B, T, F) tensor from scratch, take slices along each axis, and confirm that reductions drop one dimension at a time. Every shape printed below is a shape you will see again in Chapter 7 when we wrap real C-MAPSS data into a Dataset.
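A minimal sketch of such a script follows. It assumes $B = 4$, $T = 30$, $F = 17$ and uses random values in place of real sensor readings; both choices are illustrative.

```python
import numpy as np

B, T, F = 4, 30, 17  # illustrative sizes
rng = np.random.default_rng(42)

# Random values stand in for sensor readings.
X = rng.normal(size=(B, T, F)).astype(np.float32)
print("X            ", X.shape)             # (4, 30, 17)

# Integer indexing removes the indexed axis.
print("X[0]         ", X[0].shape)          # (30, 17): one engine
print("X[:, 0]      ", X[:, 0].shape)       # (4, 17): first cycle of every engine
print("X[:, :, 0]   ", X[:, :, 0].shape)    # (4, 30): one sensor everywhere

# Slicing with a range keeps the axis, even at length 1.
print("X[:, -1:, :] ", X[:, -1:, :].shape)  # (4, 1, 17): last cycle, axis kept

# Reductions remove the reduced axis.
print("mean(axis=0) ", X.mean(axis=0).shape)  # (30, 17)
print("mean(axis=1) ", X.mean(axis=1).shape)  # (4, 17)
print("mean(axis=2) ", X.mean(axis=2).shape)  # (4, 30)
```

The difference between `X[:, -1]` and `X[:, -1:]` is worth noticing: the first drops the time axis, the second keeps it with length 1.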
One line to remember
X.mean(axis=k) drops axis k. The output shape is the input shape $(B, T, F)$ with its $k$-th entry removed. This is the recipe behind the average-pooling, the global-pooling, and the per-feature-mean operations that appear later in the model.
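The three operations named above can be sketched as plain reductions. The variable names and the pooling window of 2 are assumptions for illustration; the model's actual layers may differ.

```python
import numpy as np

B, T, F = 4, 30, 17
X = np.random.default_rng(0).normal(size=(B, T, F))

# Global pooling over time: one F-vector per engine.
global_pool = X.mean(axis=1)                        # (4, 17)

# Per-feature mean over batch and time: a tuple of axes drops both.
mean_per_feature = X.mean(axis=(0, 1))              # (17,)

# keepdims=True keeps the reduced axis with size 1.
mean_per_window = X.mean(axis=1, keepdims=True)     # (4, 1, 17)

# Non-overlapping average pooling, window 2, along time:
# split T into (T // 2, 2) and reduce the new axis.
avg_pool = X.reshape(B, T // 2, 2, F).mean(axis=2)  # (4, 15, 17)

print(global_pool.shape, mean_per_feature.shape)
print(mean_per_window.shape, avg_pool.shape)
```

The reshape trick only works when the window divides $T$ evenly, which holds here because 30 is divisible by 2.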
PyTorch: The Same Idea, on the GPU
PyTorch's torch.Tensor is the same object with two super-powers: it can live on a GPU, and it tracks gradients for autograd. The slicing syntax is identical to NumPy. The reductions take dim= instead of axis=; otherwise, for the operations used in this chapter, the API is close to a drop-in replacement.
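A short PyTorch sketch of the same steps, assuming the same illustrative sizes as before. It converts an ndarray with torch.from_numpy, so the tensor and the array share memory on the CPU, and it falls back to the CPU when no GPU is present.

```python
import numpy as np
import torch

B, T, F = 4, 30, 17
arr = np.zeros((B, T, F), dtype=np.float32)

X = torch.from_numpy(arr)   # shares memory with arr
print(X.shape)              # torch.Size([4, 30, 17])

# Slicing reads the same as in NumPy.
print(X[0].shape, X[:, 0].shape, X[:, :, 0].shape)

# Reductions use dim= where NumPy uses axis=.
print(X.mean(dim=1).shape)  # torch.Size([4, 17])

# A write through the array is visible through the tensor.
arr[0, 0, 0] = 1.0
print(X[0, 0, 0].item())    # 1.0

# Use a GPU only if one is available; moving to a GPU makes a copy.
device = "cuda" if torch.cuda.is_available() else "cpu"
X_dev = X.to(device)

# requires_grad=True switches on autograd tracking for w.
w = torch.ones(F, requires_grad=True)
loss = (X_dev * w.to(device)).sum()
loss.backward()
print(w.grad.shape)         # torch.Size([17])
```

The gradient has the shape of `w`, not of `X`: autograd sums the contributions over the batch and time axes that `w` was broadcast across.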
torch.from_numpy(arr) is zero-copy: the resulting tensor and the original ndarray share memory. Update one, the other changes too. Use this in DataLoaders to avoid duplicating large arrays.
Broadcasting: The One Rule You Cannot Skip
Broadcasting is what lets you write X - mean_per_feature when $X \in \mathbb{R}^{B \times T \times F}$ and mean_per_feature $\in \mathbb{R}^{F}$. The shape rule is mechanical: align dimensions from the right; each pair must either be equal or one of them must be 1; missing leading dimensions count as 1.
| Operation | Left shape | Right shape | Result | Why |
|---|---|---|---|---|
| X - mean_per_feature | (4, 30, 17) | (17,) | (4, 30, 17) | Right is broadcast to (1, 1, 17) |
| X - mean_per_window | (4, 30, 17) | (4, 1, 17) | (4, 30, 17) | Size-1 time axis broadcasts |
| X - mean_per_engine | (4, 30, 17) | (4, 1, 1) | (4, 30, 17) | Both inner dims broadcast |
| X * weights | (4, 30, 17) | (30,) | ERROR | Trailing dims (17 vs 30) are neither equal nor 1; reshape weights to (30, 1) to align with time |
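The rule is small enough to write out by hand. The helper below, broadcast_shape, is a hypothetical function written for this illustration (NumPy's own equivalent is np.broadcast_shapes). It is followed by a failing case: per-cycle weights of shape (30,) do not line up with the trailing feature axis, while the same weights reshaped to (30, 1) do.

```python
import numpy as np

def broadcast_shape(left, right):
    """Apply the rule by hand: align from the right, pad missing dims with 1."""
    n = max(len(left), len(right))
    left = (1,) * (n - len(left)) + tuple(left)
    right = (1,) * (n - len(right)) + tuple(right)
    out = []
    for a, b in zip(left, right):
        if a == b or b == 1:
            out.append(a)
        elif a == 1:
            out.append(b)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(out)

X = np.ones((4, 30, 17))

print(broadcast_shape(X.shape, (17,)))       # (4, 30, 17)
print(broadcast_shape(X.shape, (4, 1, 17)))  # (4, 30, 17)
print(broadcast_shape(X.shape, (4, 1, 1)))   # (4, 30, 17)

# Per-cycle weights of shape (30,) fail: the trailing pair is 17 vs 30.
weights = np.linspace(0.0, 1.0, 30)
try:
    X * weights
except ValueError as err:
    print("error:", err)

# A trailing axis of size 1 lines the weights up with the time axis.
print((X * weights[:, None]).shape)          # (4, 30, 17)

# Cross-check the hand-written rule against NumPy's own.
assert broadcast_shape(X.shape, (30, 1)) == np.broadcast_shapes(X.shape, (30, 1))
```

Checking shapes with a helper like this before the arithmetic runs is a cheap way to catch a mismatch early.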
.unsqueeze(dim) and .expand(...) are the explicit workarounds.
(B, T, F) in Other ML Domains
The $(B, T, F)$ shape is not a prognostics invention; it is a standard representation across deep learning for sequential data. Names change; the shape persists.
| Domain | B | T | F | Example |
|---|---|---|---|---|
| RUL prediction (this book) | Engines per batch | Cycles in window | 17 sensors | (256, 30, 17) |
| NLP transformers | Sentences per batch | Token positions | Embedding dim | (64, 512, 768) |
| Speech recognition | Utterances | Audio frames (10 ms) | Mel filterbanks | (32, 1500, 80) |
| Heart-rate analysis | Patients | Sampling timesteps | Lead channels | (16, 5000, 12) |
| Stock trading | Tickers | Trading minutes | OHLCV + indicators | (500, 240, 20) |
| Climate forecasting | Grid cells | Forecast hours | Temp, pressure, humidity, wind | (2048, 96, 5) |
| Activity recognition | Wearer windows | Accelerometer samples | x, y, z + gyro | (64, 200, 6) |
Every model architecture in this book (CNN, BiLSTM, attention, dual-task heads) works because it commits to $(B, T, F)$ as the contract. The same architectures retarget to any of the rows above by changing a few hyperparameters.
The Three Shape Pitfalls
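One pitfall of this kind is a per-engine vector of shape $(B,)$ combined with a $(B, T, F)$ tensor. The sketch below reproduces the failure and the fix; the name offset_per_engine and its values are hypothetical, chosen only for the demonstration.

```python
import torch

B, T, F = 4, 30, 17
X = torch.ones(B, T, F)
offset_per_engine = torch.arange(B, dtype=torch.float32)  # shape (4,)

# Aligning from the right pairs F = 17 with B = 4, so this raises.
# If B happened to equal F, it would silently subtract along the
# feature axis instead, which is harder to notice.
try:
    X - offset_per_engine
except RuntimeError as err:
    print("error:", err)

# Two equivalent ways to reach shape (B, 1, 1).
a = offset_per_engine.unsqueeze(-1).unsqueeze(-1)
b = offset_per_engine.view(B, 1, 1)
print(a.shape, b.shape)     # both torch.Size([4, 1, 1])

out = X - a
print(out.shape)            # torch.Size([4, 30, 17])
print(out[3, 0, 0].item())  # 1 - 3 = -2.0
```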
Use .unsqueeze(-1).unsqueeze(-1) or .view(B, 1, 1) to add the missing axes.
The chapter's mantra. Three letters. One shape. Everything else is detail.
Takeaway
- A multivariate time series is a stack of vectors. Two numbers describe one series: $T$ (length), $F$ (features).
- Add a batch axis and you have a $(B, T, F)$ tensor. Every operation in this book consumes a $(B, T, F)$ tensor and produces another tensor whose shape derives from it.
- Slicing collapses one axis at a time. X[0] drops batch, X[:, 0] drops time, X[:, :, 0] drops features.
- Reductions take dim= (PyTorch) or axis= (NumPy). X.mean(dim=1) averages over time and returns shape $(B, F)$.
- Broadcasting aligns from the right. Internalise this rule once and most shape-mismatch errors become obvious.