Learning Objectives
By the end of this section, you will:
- Implement a CNN block with Conv1D, BatchNorm, ReLU, and Dropout
- Build the complete feature extractor with three stacked blocks
- Trace the forward pass through all layers
- Verify parameter counts match theoretical expectations
- Prepare the output for integration with the BiLSTM
Why This Matters: This section translates all the concepts from this chapter into working PyTorch code. Understanding the implementation details is essential for debugging, modifying, and extending the model.
CNN Block Implementation
Each CNN block consists of four operations in sequence: convolution, batch normalization, activation, and dropout.
Block Structure
```python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    """
    A single CNN block: Conv1D -> BatchNorm -> ReLU -> Dropout

    Applies 1D convolution with 'same' padding to preserve sequence length,
    followed by batch normalization for training stability,
    ReLU activation for non-linearity,
    and dropout for regularization.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, dropout=0.2):
        super().__init__()

        # Calculate padding for 'same' output length
        # For kernel_size=3, padding=1 preserves length
        padding = kernel_size // 2

        self.conv = nn.Conv1d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            bias=False  # BatchNorm provides its own bias (beta)
        )

        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        """
        Forward pass through the block.

        Args:
            x: Input tensor of shape (batch, time, channels)

        Returns:
            Output tensor of shape (batch, time, out_channels)
        """
        # Conv1d expects (batch, channels, time)
        x = x.transpose(1, 2)

        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Return to (batch, time, channels) format
        x = x.transpose(1, 2)
        return x
```

Key Implementation Details
| Aspect | Choice | Rationale |
|---|---|---|
| bias=False in Conv1d | No bias term | BatchNorm provides an equivalent shift via β |
| padding=kernel_size//2 | Same padding | Preserves sequence length T |
| inplace=True in ReLU | Memory efficient | Overwrites input tensor |
| Transpose operations | Shape adaptation | Conv1d expects (B, C, T) format |
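The bias=False choice is easy to check empirically: a Conv1d bias adds exactly one parameter per output channel, the same shift that BatchNorm's β already supplies. A minimal sketch (the channel sizes match the first block, but any values would do):

```python
import torch.nn as nn

# Parameter counts with and without the convolution bias
with_bias = sum(p.numel() for p in nn.Conv1d(17, 64, kernel_size=3).parameters())
without_bias = sum(p.numel() for p in nn.Conv1d(17, 64, kernel_size=3, bias=False).parameters())

print(with_bias - without_bias)  # 64 -- one bias per output channel
```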
PyTorch Shape Convention
PyTorch's Conv1d expects input shape (batch, channels, length). Our data is (batch, time, features). We transpose before convolution and after to maintain consistency with the rest of the model.
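This convention can be illustrated with random data in the same shapes the model uses; the transpose-convolve-transpose round trip preserves the sequence length:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 30, 17)  # (batch, time, features)

conv = nn.Conv1d(in_channels=17, out_channels=64, kernel_size=3, padding=1)

# To (batch, channels, time) for Conv1d, then back to (batch, time, channels)
y = conv(x.transpose(1, 2)).transpose(1, 2)

print(y.shape)  # torch.Size([32, 30, 64])
```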
Complete Feature Extractor
The feature extractor stacks three CNN blocks with the channel progression 17 → 64 → 128 → 64.
Full Implementation
```python
class CNNFeatureExtractor(nn.Module):
    """
    Three-layer CNN feature extractor for time series.

    Transforms raw sensor features into rich representations
    suitable for sequence modeling with BiLSTM.

    Architecture:
        Block 1: 17 -> 64 channels (expand)
        Block 2: 64 -> 128 channels (expand further)
        Block 3: 128 -> 64 channels (compress)

    Input:  (batch, 30, 17) - 30 timesteps, 17 sensor features
    Output: (batch, 30, 64) - 30 timesteps, 64 learned features
    """

    def __init__(self, input_dim=17, hidden_dim=64, kernel_size=3, dropout=0.2):
        super().__init__()

        self.input_dim = input_dim
        self.hidden_dim = hidden_dim

        # Block 1: Expand from input_dim to hidden_dim
        self.block1 = CNNBlock(
            in_channels=input_dim,    # 17
            out_channels=hidden_dim,  # 64
            kernel_size=kernel_size,
            dropout=dropout
        )

        # Block 2: Expand further to 2x hidden_dim
        self.block2 = CNNBlock(
            in_channels=hidden_dim,       # 64
            out_channels=hidden_dim * 2,  # 128
            kernel_size=kernel_size,
            dropout=dropout
        )

        # Block 3: Compress back to hidden_dim
        self.block3 = CNNBlock(
            in_channels=hidden_dim * 2,  # 128
            out_channels=hidden_dim,     # 64
            kernel_size=kernel_size,
            dropout=dropout
        )

    def forward(self, x):
        """
        Extract features from sensor time series.

        Args:
            x: Input tensor of shape (batch, seq_len, input_dim)
               Example: (32, 30, 17)

        Returns:
            Features tensor of shape (batch, seq_len, hidden_dim)
            Example: (32, 30, 64)
        """
        x = self.block1(x)  # (B, 30, 17) -> (B, 30, 64)
        x = self.block2(x)  # (B, 30, 64) -> (B, 30, 128)
        x = self.block3(x)  # (B, 30, 128) -> (B, 30, 64)

        return x
```

Optional: Residual Connections
For deeper networks or improved gradient flow, residual connections can be added:
```python
class ResidualCNNBlock(nn.Module):
    """CNN block with optional residual connection."""

    def __init__(self, in_channels, out_channels, kernel_size=3, dropout=0.2):
        super().__init__()

        self.block = CNNBlock(in_channels, out_channels, kernel_size, dropout)

        # 1x1 projection for dimension mismatch
        self.use_projection = (in_channels != out_channels)
        if self.use_projection:
            self.projection = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        identity = x
        out = self.block(x)

        if self.use_projection:
            # Project identity to match output dimensions
            identity = identity.transpose(1, 2)
            identity = self.projection(identity)
            identity = identity.transpose(1, 2)

        return out + identity
```

Forward Pass Analysis
Let us trace a batch of data through the CNN feature extractor to understand the transformations.
Step-by-Step Trace
```python
# Create model
cnn = CNNFeatureExtractor(input_dim=17, hidden_dim=64)

# Input batch: 32 samples, 30 timesteps, 17 features
x = torch.randn(32, 30, 17)
print(f"Input shape: {x.shape}")        # (32, 30, 17)

# Block 1
x1 = cnn.block1(x)
print(f"After Block 1: {x1.shape}")     # (32, 30, 64)

# Block 2
x2 = cnn.block2(x1)
print(f"After Block 2: {x2.shape}")     # (32, 30, 128)

# Block 3
x3 = cnn.block3(x2)
print(f"After Block 3: {x3.shape}")     # (32, 30, 64)

# Final output
output = cnn(x)
print(f"Final output: {output.shape}")  # (32, 30, 64)
```

Dimension Flow Table
| Stage | Shape | Description |
|---|---|---|
| Input | (32, 30, 17) | Raw sensor windows |
| Block 1 input (transposed) | (32, 17, 30) | For Conv1d |
| After Conv1d | (32, 64, 30) | 64 feature maps |
| After BatchNorm | (32, 64, 30) | Normalized |
| After ReLU | (32, 64, 30) | Non-linearity applied |
| After Dropout | (32, 64, 30) | Regularized |
| Block 1 output (transposed) | (32, 30, 64) | Back to (B, T, C) |
| Block 2 output | (32, 30, 128) | Expanded channels |
| Block 3 output | (32, 30, 64) | Compressed for LSTM |
Parameter Count Verification
Let us verify our parameter count calculations match the implementation.
Theoretical Count
| Component | Parameters | Calculation |
|---|---|---|
| Block 1 Conv | 3,264 | 17 × 64 × 3 = 3,264 (no bias) |
| Block 1 BN | 128 | 64 × 2 (γ and β) |
| Block 2 Conv | 24,576 | 64 × 128 × 3 = 24,576 |
| Block 2 BN | 256 | 128 × 2 |
| Block 3 Conv | 24,576 | 128 × 64 × 3 = 24,576 |
| Block 3 BN | 128 | 64 × 2 |
| Total | 52,928 | Sum of all learnable parameters |
PyTorch Verification
```python
def count_parameters(model):
    """Count total trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

cnn = CNNFeatureExtractor(input_dim=17, hidden_dim=64)
total_params = count_parameters(cnn)
print(f"Total parameters: {total_params:,}")  # 52,928

# Breakdown by block
for name, module in cnn.named_children():
    params = count_parameters(module)
    print(f"{name}: {params:,} parameters")

# Output:
# block1: 3,392 parameters (3,264 conv + 128 bn)
# block2: 24,832 parameters (24,576 conv + 256 bn)
# block3: 24,704 parameters (24,576 conv + 128 bn)
```

Running Statistics
BatchNorm also maintains running_mean and running_var buffers, but these are not learnable parameters; they are updated via an exponential moving average during training and used during inference.
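The distinction is easy to inspect in PyTorch: the running statistics appear under `named_buffers()`, not `named_parameters()`, so they are never touched by the optimizer. A small sketch for one BatchNorm layer from Block 1:

```python
import torch.nn as nn

bn = nn.BatchNorm1d(64)

# Learnable parameters: gamma (weight) and beta (bias)
params = {name: p.numel() for name, p in bn.named_parameters()}
print(params)  # {'weight': 64, 'bias': 64}

# Non-learnable buffers: running statistics tracked during training
buffers = [name for name, _ in bn.named_buffers()]
print(buffers)  # includes 'running_mean' and 'running_var'
```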
Integration with BiLSTM
The CNN feature extractor's output feeds directly into the BiLSTM layer.
Interface
```python
class AMNLModel(nn.Module):
    """
    Complete AMNL architecture.
    CNN -> BiLSTM -> Attention -> Prediction heads
    """

    def __init__(self, config):
        super().__init__()

        # CNN Feature Extractor
        self.cnn = CNNFeatureExtractor(
            input_dim=config.input_dim,      # 17
            hidden_dim=config.cnn_hidden,    # 64
            kernel_size=config.kernel_size,  # 3
            dropout=config.cnn_dropout       # 0.2
        )

        # BiLSTM for temporal modeling
        self.lstm = nn.LSTM(
            input_size=config.cnn_hidden,    # 64 (CNN output)
            hidden_size=config.lstm_hidden,  # 128
            num_layers=config.lstm_layers,   # 2
            batch_first=True,
            bidirectional=True,
            dropout=config.lstm_dropout      # 0.3
        )

        # ... attention and prediction heads

    def forward(self, x):
        # Extract local features
        cnn_features = self.cnn(x)  # (B, 30, 64)

        # Model temporal dependencies
        lstm_out, _ = self.lstm(cnn_features)  # (B, 30, 256)

        # ... attention and prediction
        return predictions
```

Data Flow
```
Input: (batch, 30, 17)
        ↓
  CNN Feature Extractor
        ↓
CNN Output: (batch, 30, 64)
        ↓
      BiLSTM
        ↓
LSTM Output: (batch, 30, 256)   ← 128 × 2 (bidirectional)
        ↓
   Attention Layer
        ↓
Context: (batch, 256)
        ↓
  Prediction Heads
```

Why This Interface Works
- Preserved sequence length: CNN outputs 30 timesteps, same as input
- Compact feature dimension: 64 learned features, compressed back from the 128-channel middle block, carry more information per channel than the 17 raw sensor readings
- Local patterns extracted: CNN features encode degradation signatures
- Ready for temporal modeling: LSTM processes the feature sequence
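The interface can be checked with a stand-in tensor in place of the real CNN output; the hyperparameters below mirror the config values used above:

```python
import torch
import torch.nn as nn

# Stand-in for the CNN output: (batch, seq_len, 64)
cnn_features = torch.randn(32, 30, 64)

lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=2,
               batch_first=True, bidirectional=True, dropout=0.3)

lstm_out, (h_n, c_n) = lstm(cnn_features)
print(lstm_out.shape)  # torch.Size([32, 30, 256]) -- 128 per direction, concatenated
```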
Summary
In this section, we implemented the complete CNN feature extractor:
- CNNBlock: Conv1D → BatchNorm → ReLU → Dropout
- CNNFeatureExtractor: Three blocks with 17 → 64 → 128 → 64 channels
- Forward pass: Preserves sequence length, transforms features
- Parameters: ~53K trainable parameters
- Integration: Output shape (B, 30, 64) feeds into BiLSTM
| Property | Value |
|---|---|
| Input shape | (batch, 30, 17) |
| Output shape | (batch, 30, 64) |
| Number of blocks | 3 |
| Total parameters | ~53,000 |
| Kernel size | 3 |
| Dropout rate | 0.2 |
Chapter Summary: We have built a complete CNN feature extractor that transforms raw sensor readings into rich feature representations. The architecture uses three convolutional blocks with batch normalization for training stability and dropout for regularization. The output is ready for temporal modeling with the BiLSTM, which we will implement in the next chapter.
With the CNN feature extractor complete, we now move to Chapter 6 where we implement the BiLSTM layer for temporal sequence modeling.