From Edge Detectors to Sensor Streams
Open any introductory image-processing book and one of the first algorithms you meet is the Sobel edge detector: a tiny 3 × 3 weight matrix that, when slid across an image, lights up wherever brightness changes sharply. Gravitational-wave astronomers do the same with matched filters: a known waveform template is correlated against noisy detector data, and peaks mark candidate detections. Speech-recognition front-ends use 1-D filters to find vowel onsets in a microphone signal. All three are the same operation: take a small set of weights, slide it along a signal, and sum the products at every position.
Applied along the time axis of a turbofan engine's sensor stream, the operation is a 1D convolution, the workhorse of the first layer of every CNN-based prognostic model in this book. Each kernel learns to detect a local degradation pattern: a sudden spike, a gradual trend, an oscillation. The same architectural primitive that detects edges in photographs can detect bearing wear in vibration signals.
The 1D Convolution Operation
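Given an input signal $x = (x_0, \dots, x_{T_{\text{in}}-1})$, a kernel $w = (w_0, \dots, w_{K-1})$ and a bias $b$, the cross-correlation that PyTorch and most deep-learning libraries call "convolution" (no kernel flip) is
$$y_t = \sum_{k=0}^{K-1} w_k \, x_{t+k} + b$$
Three knobs control the geometry of the operation: kernel size $K$, stride $S$, and padding $P$ (zeros added on each side). The output length is
$$T_{\text{out}} = \left\lfloor \frac{T_{\text{in}} + 2P - K}{S} \right\rfloor + 1$$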
| Symbol | Meaning | Common choice |
|---|---|---|
| $T_{\text{in}}$ | Input sequence length | 30 (C-MAPSS window) |
| $K$ | Kernel size (receptive field) | 3 or 5 |
| $P$ | Zero padding on each side | 1 (for K = 3, 'same') |
| $S$ | Stride | 1 (no downsampling) |
| $T_{\text{out}}$ | Output length | 30 with same padding |
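With our defaults $K = 3$, $P = 1$, $S = 1$ and $T_{\text{in}} = 30$: $T_{\text{out}} = \lfloor (30 + 2 - 3)/1 \rfloor + 1 = 30$. The time axis is preserved from layer to layer, which is what we want when we stack three conv layers and then hand off to the BiLSTM.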
For a 5-sample input and a 3-tap kernel with no padding, the output length is $5 - 3 + 1 = 3$. The kernel is a weighted local average; the output emphasises the centre cycle of each window.
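The output-length formula is mechanical enough to encode directly. The sketch below defines two small helpers, `conv1d_output_length` and `same_padding`. Both are hypothetical names, not library functions. Together they check the numbers used in this section.

```python
def conv1d_output_length(t_in: int, kernel_size: int, padding: int = 0, stride: int = 1) -> int:
    """T_out = floor((T_in + 2P - K) / S) + 1."""
    if t_in + 2 * padding < kernel_size:
        raise ValueError("kernel is longer than the padded input")
    return (t_in + 2 * padding - kernel_size) // stride + 1


def same_padding(kernel_size: int) -> int:
    """Padding that preserves length for stride 1 and an odd kernel size."""
    if kernel_size % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd kernel size")
    return (kernel_size - 1) // 2


print(conv1d_output_length(30, 3, padding=1))             # C-MAPSS window, same padding: 30
print(conv1d_output_length(30, 5, same_padding(5)))       # K = 5 needs P = 2: 30
print(conv1d_output_length(5, 3))                         # 5-sample input, valid: 3
print(conv1d_output_length(30, 3, padding=1, stride=2))   # stride 2 halves the axis: 15
```

The `ValueError` for even kernels reflects a design choice. With stride 1, an even $K$ would need asymmetric padding to keep the length, which a single symmetric $P$ cannot express.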
Interactive: Watch the Kernel Slide
The visualization below uses a real C-MAPSS sensor (T30, total temperature at the HPC outlet) and a 3-tap kernel. Press play and watch the kernel walk across the input; pause to inspect the weighted sum at any position. The padding toggle shows exactly what the zero-padded cycles look like at the edges.
Interactive 1D Convolution Visualizer
Understanding nn.Conv1d(input_size, 64, kernel_size=3, padding=1)
What happens when we declare this line?
- input_size: number of input channels (17 sensors in C-MAPSS)
- 64: output channels (64 learned feature detectors)
- kernel_size=3: the window looks at 3 consecutive timesteps
- padding=1: zeros added at the boundaries to preserve length
1D Convolution Equation
$$y_t = \sum_{k=0}^{K-1} w_k \, x_{t+k} + b$$
- $K$ = kernel size (3 in our case)
- $w$ = learned weights
- $b$ = bias term
- $t$ = output position
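Output Dimension Formula
$$T_{\text{out}} = \left\lfloor \frac{T_{\text{in}} + 2P - K}{S} \right\rfloor + 1$$
With $T_{\text{in}} = 8$, $P = 1$, $K = 3$, $S = 1$:
$$T_{\text{out}} = \left\lfloor \frac{8 + 2 - 3}{1} \right\rfloor + 1 = 8$$
With P = 1 for K = 3 (and stride 1), padding preserves sequence length!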
Parameter Count
For Conv1d(17, 64, kernel_size=3):
Weights = 64 × 17 × 3 = 3,264
Biases = 64
Total = 3,328 parameters
What the Kernel Learns
The kernel weights are learned during training. Different patterns emerge:
- [1, 0, -1] → Detects rising/falling edges
- [0.33, 0.33, 0.33] → Smoothing/averaging
- [−1, 2, −1] → Detects spikes
Each of the 64 output channels has its own kernel, so one layer can learn up to 64 different patterns!
Two things to take away. First, for $K = 3$ the output value $y_t$ depends only on the input cycles $t$, $t+1$ and $t+2$: the receptive field. Cycles outside that window cannot affect $y_t$; this is why Section 9 will follow the conv with a BiLSTM that integrates over the whole window. Second, the same kernel applies at every position, so the conv layer is translation-equivariant: shifting the input shifts the output by the same amount. A spike at cycle 5 and a spike at cycle 25 produce the same response, just at different output positions, so the model doesn't need to learn to detect spikes twice.
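Translation equivariance is easy to see numerically. Below is a minimal NumPy sketch that assumes synthetic all-zero signals, each with a single one-cycle spike. `correlate_valid` is a hypothetical helper implementing $y_t = \sum_k w_k x_{t+k}$ without padding or bias.

```python
import numpy as np

def correlate_valid(x, w):
    """y[t] = sum_k w[k] * x[t + k], with no padding and no bias."""
    K = len(w)
    return np.array([np.dot(w, x[t:t + K]) for t in range(len(x) - K + 1)])

spike_kernel = np.array([-1.0, 2.0, -1.0])

x_early = np.zeros(30)
x_early[5] = 1.0            # spike at cycle 5
x_late = np.zeros(30)
x_late[25] = 1.0            # the same spike at cycle 25

y_early = correlate_valid(x_early, spike_kernel)
y_late = correlate_valid(x_late, spike_kernel)

# Output index t covers cycles t..t+2, so the window centred on cycle 5 is t = 4.
print(y_early.argmax(), y_early.max())    # 4 2.0
print(y_late.argmax(), y_late.max())      # 24 2.0
# Shifting the input by 20 cycles shifts the output by 20 positions, nothing else.
print(np.allclose(y_late[20:], y_early[:8]), np.allclose(y_late[:20], 0.0))  # True True
```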
Multi-Channel: 17 Sensors at Once
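Real sensor data has $C_{\text{in}} = 17$ channels (the informative C-MAPSS sensors). The 1D convolution generalises directly: each output channel $o$ uses a separate kernel that spans all input channels, then sums the contributions:
$$y^{(o)}_t = \sum_{c=0}^{C_{\text{in}}-1} \sum_{k=0}^{K-1} w_{o,c,k} \, x^{(c)}_{t+k} + b_o$$
Each output channel is a weighted sum of all input channels across the kernel's temporal window. With $C_{\text{in}} = 17$, $C_{\text{out}} = 64$ and $K = 3$, the layer holds $64 \times 17 \times 3 = 3{,}264$ weights plus 64 biases: 3,328 learnable parameters in one layer.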
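The sum over input channels and kernel taps can be written out as a plain loop. This minimal NumPy sketch makes several assumptions: stride 1, a single window laid out as (C_in, T), and random synthetic weights in place of trained ones. `conv1d_multichannel` is a hypothetical name. A vectorised version built on `sliding_window_view` and `einsum` serves as a cross-check.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv1d_multichannel(x, w, b, padding=0):
    """Multi-channel 1D cross-correlation with stride 1.

    x: (C_in, T) input, w: (C_out, C_in, K) kernels, b: (C_out,) biases.
    Returns (C_out, T + 2*padding - K + 1).
    """
    c_out, c_in, k = w.shape
    assert x.shape[0] == c_in, "input channels must match the kernel's C_in"
    x = np.pad(x, ((0, 0), (padding, padding)))      # zero-pad the time axis only
    t_out = x.shape[1] - k + 1
    y = np.empty((c_out, t_out))
    for o in range(c_out):                            # one kernel per output channel
        for t in range(t_out):
            # sum over ALL input channels and ALL kernel taps, then add the bias
            y[o, t] = np.sum(w[o] * x[:, t:t + k]) + b[o]
    return y

rng = np.random.default_rng(0)
x = rng.normal(size=(17, 30))          # 17 sensors x 30 cycles (synthetic)
w = rng.normal(size=(64, 17, 3))
b = rng.normal(size=64)

y = conv1d_multichannel(x, w, b, padding=1)
print(y.shape)                          # (64, 30)
print(w.size + b.size)                  # 3328 learnable parameters

# Vectorised cross-check: windows has shape (17, 30, 3); sum over c and k.
windows = sliding_window_view(np.pad(x, ((0, 0), (1, 1))), 3, axis=1)
y_fast = np.einsum("ock,ctk->ot", w, windows) + b[:, None]
print(np.allclose(y, y_fast))           # True
```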
| Layer | Input shape (B, T, C) | Output shape | Params |
|---|---|---|---|
| Conv1D #1 | (B, 30, 17) | (B, 30, 64) | 64*17*3 + 64 = 3,328 |
| Conv1D #2 | (B, 30, 64) | (B, 30, 128) | 128*64*3 + 128 = 24,704 |
| Conv1D #3 | (B, 30, 128) | (B, 30, 64) | 64*128*3 + 64 = 24,640 |
| Total | | | 52,672 |
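The parameter counts in the table can be checked in a few lines of PyTorch. The stack below is a sketch that assumes K = 3 with padding=1 for all three layers and places a ReLU after each convolution. ReLU has no parameters, so the count is unaffected either way. It is not necessarily the exact module used later in the book.

```python
import torch
import torch.nn as nn

# Sketch of the three-layer stack from the table; ReLU placement is an assumption.
conv_stack = nn.Sequential(
    nn.Conv1d(17, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(128, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)

# Weights (C_out * C_in * K) plus biases (C_out) for each Conv1d layer.
per_layer = [sum(p.numel() for p in m.parameters())
             for m in conv_stack if isinstance(m, nn.Conv1d)]
print(per_layer)        # [3328, 24704, 24640]
print(sum(per_layer))   # 52672

x = torch.randn(8, 30, 17)                          # (B, T, C) as the dataset emits it
y = conv_stack(x.transpose(1, 2)).transpose(1, 2)   # Conv1d itself works on (B, C, T)
print(tuple(y.shape))   # (8, 30, 64)
```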
Interactive: Multi-Channel in Detail
The next visualization steps through a stacked 8 → 16 → 8 architecture. Click any output cell at any layer and the diagram shows you which input cells contributed to it — you can literally see the receptive field grow as you move up the stack.
Multi-Channel 1D Convolution + ReLU
How Conv1d turns multiple input channels into multiple output channels, followed by a ReLU activation
Two-Layer CNN Architecture: 8 → 16 → 8 channels (with ReLU)
Input: 8 Sensors × 6 Timesteps
After Conv1+ReLU: 16 Features × 6 Timesteps
After Conv2+ReLU: 8 Features × 6 Timesteps
Key Insight: Multi-Channel Convolution
Each output channel is computed by summing contributions from all input channels and all kernel taps, then adding that channel's bias.
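The single-cell computation from the widget can be reproduced in PyTorch. The sketch below builds the two-layer 8 → 16 → 8 stack on a synthetic 8 × 6 input. It then recomputes one output value of the first layer by hand, summing over all 8 input channels and all 3 kernel taps. The layer names and the random input are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 6)                     # (B=1, C=8 sensors, T=6 timesteps), synthetic

conv1 = nn.Conv1d(8, 16, kernel_size=3, padding=1)
conv2 = nn.Conv1d(16, 8, kernel_size=3, padding=1)

with torch.no_grad():
    h = torch.relu(conv1(x))                 # (1, 16, 6): each feature mixes all 8 sensors
    y = torch.relu(conv2(h))                 # (1, 8, 6): each feature mixes all 16 features

    # Recompute output channel 0 of conv1 at timestep 2 by hand.
    xp = F.pad(x, (1, 1))                    # the same zero padding as padding=1
    manual = (conv1.weight[0] * xp[0, :, 2:5]).sum() + conv1.bias[0]
    ok = torch.allclose(manual, conv1(x)[0, 0, 2], atol=1e-5)

print(tuple(h.shape), tuple(y.shape))        # (1, 16, 6) (1, 8, 6)
print(ok)                                    # True
```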
Python: 1D Convolution From Scratch
Twenty-five lines of NumPy and the operation is fully transparent. We define conv1d_naive, run it on the 5-sample toy from above to verify the hand-computed numbers, then apply two well-known hand-crafted kernels — an edge detector and a three-tap smoother — to the same signal.
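A minimal sketch of such a `conv1d_naive` follows. The five toy values and the centre-weighted kernel [0.25, 0.5, 0.25] are assumptions chosen for illustration and may differ from the section's worked example. `np.correlate` in "valid" mode serves as an independent check, since it computes the same unflipped cross-correlation.

```python
import numpy as np

def conv1d_naive(x, w, b=0.0, padding=0):
    """y[t] = sum_k w[k] * x[t + k] + b over a zero-padded signal (stride 1)."""
    x = np.pad(np.asarray(x, dtype=float), padding)   # `padding` zeros on each side
    K = len(w)
    T_out = len(x) - K + 1
    y = np.zeros(T_out)
    for t in range(T_out):
        for k in range(K):
            y[t] += w[k] * x[t + k]
    return y + b

# Hypothetical 5-sample toy signal and centre-weighted smoothing kernel.
x = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
w_avg = np.array([0.25, 0.5, 0.25])

print(conv1d_naive(x, w_avg))              # valid: 3 outputs
print(conv1d_naive(x, w_avg, padding=1))   # same: 5 outputs

# Two classic hand-crafted kernels on the same signal.
w_edge = np.array([-1.0, 0.0, 1.0])        # rising values give a positive output
w_smooth = np.ones(3) / 3                  # three-tap moving average
print(conv1d_naive(x, w_edge, padding=1))
print(conv1d_naive(x, w_smooth, padding=1))

# Cross-check against NumPy's own cross-correlation.
print(np.allclose(conv1d_naive(x, w_edge), np.correlate(x, w_edge, mode="valid")))  # True
```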
Verifying the hand-computation
The valid output matches the three-line worked example earlier in the section to the digit. The same kernel with padding emits five values instead of three — the input-length-preserving choice that lets us stack many conv layers without losing time-axis cycles.
PyTorch: nn.Conv1d (and the Axis Trap)
nn.Conv1d expects input of shape (B, C, T): channels SECOND, time LAST. Our CMAPSSDataset emits (B, T, C). If you forget the .transpose(1, 2) bridge, PyTorch usually raises a channel-mismatch RuntimeError, which is the lucky case. The dangerous case is a layer whose in_channels happens to equal the window length. Then the layer silently treats your time axis as channels and your sensors as a tiny temporal window. The loss will go down, the accuracy will not, and you will spend a week debugging. Always transpose.
With that out of the way, the entire idiomatic PyTorch implementation is one nn.Conv1d instantiation plus the two transposes.
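Here is a minimal sketch of that pattern, assuming 17 sensors, 64 filters, K = 3 and same padding. `ConvFrontEnd` is a hypothetical wrapper name. The last lines show what happens when the transpose is forgotten and the channel count does not match: PyTorch raises a RuntimeError rather than failing silently.

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    """Hypothetical wrapper: takes (B, T, C) windows, returns (B, T, C_out)."""

    def __init__(self, n_sensors: int = 17, n_filters: int = 64, kernel_size: int = 3):
        super().__init__()
        # padding = K // 2 keeps T unchanged for odd K and stride 1
        self.conv = nn.Conv1d(n_sensors, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.transpose(1, 2)      # (B, T, C) -> (B, C, T): channels second, time last
        x = self.conv(x)           # (B, C_out, T)
        return x.transpose(1, 2)   # back to (B, T, C_out), e.g. for a batch_first LSTM

model = ConvFrontEnd()
x = torch.randn(32, 30, 17)        # 32 windows, 30 cycles, 17 sensors (synthetic)
print(tuple(model(x).shape))       # (32, 30, 64)

# Forgetting the transpose: Conv1d reads dim 1 (30 cycles) as channels and rejects it.
caught = False
try:
    model.conv(x)
except RuntimeError as err:
    caught = True
    print("RuntimeError:", str(err).splitlines()[0])
```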
What Kernels Actually Learn
Hand-coding kernels (Sobel, smoothing, Gaussian) is the classical approach. Modern deep learning learns kernels from data through back-propagation. Once trained, the learned weights tend to look like recognisable pattern detectors:
| Pattern | Kernel approximation | What it fires on |
|---|---|---|
| Edge / spike | [-1, +2, -1] | Centre cycle higher than its neighbours |
| Gradient / trend | [-1, 0, +1] | Increasing values from left to right |
| Smoothing | [1/3, 1/3, 1/3] | Local mean; attenuates high-frequency noise |
| Difference | [ 0, +1, -1] | Cycle-to-cycle delta |
| Wide trend | [-1, -1, 0, +1, +1] | Slow upward drift over 5 cycles |
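A quick way to see what each row of the table responds to is to correlate each kernel with two synthetic signals: a steady ramp and an isolated spike. This is an illustrative NumPy sketch with made-up signals. It uses `np.correlate` because, like nn.Conv1d, it does not flip the kernel.

```python
import numpy as np

# Kernel approximations from the table above.
kernels = {
    "spike": np.array([-1.0, 2.0, -1.0]),
    "trend": np.array([-1.0, 0.0, 1.0]),
    "smoothing": np.ones(3) / 3,
    "difference": np.array([0.0, 1.0, -1.0]),
    "wide trend": np.array([-1.0, -1.0, 0.0, 1.0, 1.0]),
}

# Two synthetic 12-cycle signals.
ramp = np.arange(12, dtype=float)      # steady upward drift of +1 per cycle
spike = np.zeros(12)
spike[6] = 1.0                          # a single one-cycle spike

for name, w in kernels.items():
    on_ramp = np.correlate(ramp, w, mode="valid")    # no kernel flip, like nn.Conv1d
    on_spike = np.correlate(spike, w, mode="valid")
    print(f"{name:>10}: mean on ramp {on_ramp.mean():+.2f}, peak on spike {on_spike.max():+.2f}")
```

On the ramp, the spike kernel outputs 0 at every position but peaks at 2 on the isolated spike. The trend kernel outputs a constant 2 on the ramp. That is the separation the table describes.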
Three layers of these stacked become a hierarchy — layer 1 detects edges and gradients, layer 2 combines them into compound features (“spike followed by decay”), layer 3 produces high-level degradation signatures. Section 8 visualises exactly this hierarchy on a trained C-MAPSS model.
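How far back each output can see grows with depth. Each stride-1 layer with kernel size $K$ widens the receptive field by $K - 1$ cycles. In general, $r = 1 + \sum_i (K_i - 1) \prod_{j<i} S_j$. The helper below is a hypothetical sketch of that formula.

```python
def receptive_field(kernel_sizes, strides=None):
    """Input cycles seen by one output position after a stack of Conv1d layers.

    r = 1 + sum_i (K_i - 1) * prod_{j < i} S_j
    """
    strides = strides or [1] * len(kernel_sizes)
    r, jump = 1, 1          # jump = input-cycle spacing between adjacent outputs
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r

for depth in range(1, 4):
    print(depth, receptive_field([3] * depth))   # 1 3, then 2 5, then 3 7
```

With three K = 3, stride-1 layers, each output position sees 7 of the 30 cycles in a window. Longer-range structure is left to the layers that follow the conv stack.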
1D Convolution Beyond RUL
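The same nine lines of code show up wherever a model needs to detect local patterns in a 1-D signal. Every row in the table below uses a 1-D convolutional front end built from this primitive. The architectures around it differ (WaveNet, for example, uses dilated causal convolutions), but the core operation is the one above.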
| Domain | Signal | What conv1d detects | Famous architecture |
|---|---|---|---|
| RUL (this book) | 17 engine sensors | Local degradation patterns | CNN-BiLSTM-Attention |
| Audio recognition | Mel-spectrogram | Phoneme onsets, formants | WaveNet / DeepSpeech |
| Music generation | Raw audio waveform | Pitched events, percussion | WaveNet / SampleRNN |
| Genomics | DNA bases (one-hot) | Motifs, regulatory elements | DeepBind / DeepSEA |
| ECG analysis | 12-lead voltage trace | QRS complexes, arrhythmia | ResNet-1D for cardiology |
| Gravitational waves | LIGO strain data | Compact-binary inspiral chirps | Matched filter / 1D ResNet |
| Network traffic | Bytes-per-second | Anomalous spikes, DDoS onsets | 1D CNN intrusion detection |
| Industrial vibration | Accelerometer trace | Bearing fault frequencies | 1D CNN + envelope spectrum |
The mathematical machinery in this book transfers to the rows of that table largely by changing the loader and the input dimensions. The attention mechanism in Section 3.4, the loss function in Section 14 and the gradient balancer in Section 18 all apply wherever a 1D conv front end is the right entry point.
The Three Pitfalls
- Always print(x.shape) immediately before any nn.Conv1d.
- Never confuse in_channels with batch size or kernel size. It is the feature dimension, the number of sensors per timestep. For multi-sensor data it is never 1.
The pattern. A 1D convolution slides a learned kernel along the time axis and asks the same local question at every cycle. Stacking layers builds a hierarchy of questions. Coupling it with the BiLSTM in Section 3.3 gives the model both local pattern detection and long-range temporal dynamics, the architecture every model in this book uses.
Takeaway
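- 1D convolution is one equation. $y_t = \sum_{k=0}^{K-1} w_k \, x_{t+k} + b$, a weighted sum over a sliding window.
- Output length is mechanical. $T_{\text{out}} = \lfloor (T_{\text{in}} + 2P - K)/S \rfloor + 1$. Use $P = (K-1)/2$ (odd $K$, $S = 1$) for same padding.
- Multi-channel is the same with one more sum. Each output channel sums contributions from all input channels and all kernel positions: $C_{\text{out}} \times C_{\text{in}} \times K$ weights per layer, plus $C_{\text{out}}$ biases.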
- PyTorch wants (B, C, T). Bridge from (B, T, C) with x.transpose(1, 2) before nn.Conv1d; transpose back after.
- Translation equivariance is free. The same kernel applies at every cycle, so a pattern learned at one position is detected at every position. The BiLSTM in Section 3.3 will add position-aware temporal modelling on top.