Chapter 5

PyTorch Implementation

CNN Feature Extractor

Learning Objectives

By the end of this section, you will:

  1. Implement a CNN block with Conv1D, BatchNorm, ReLU, and Dropout
  2. Build the complete feature extractor with three stacked blocks
  3. Trace the forward pass through all layers
  4. Verify parameter counts match theoretical expectations
  5. Prepare the output for integration with the BiLSTM
Why This Matters: This section translates all the concepts from this chapter into working PyTorch code. Understanding the implementation details is essential for debugging, modifying, and extending the model.

CNN Block Implementation

Each CNN block consists of four operations in sequence: convolution, batch normalization, activation, and dropout.

Block Structure

🐍python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    """
    A single CNN block: Conv1D -> BatchNorm -> ReLU -> Dropout

    Applies 1D convolution with same padding to preserve sequence length,
    followed by batch normalization for training stability,
    ReLU activation for non-linearity,
    and dropout for regularization.
    """

    def __init__(self, in_channels, out_channels, kernel_size=3, dropout=0.2):
        super().__init__()

        # Calculate padding for 'same' output length (requires an odd kernel_size)
        # For kernel_size=3, padding=1 preserves length
        padding = kernel_size // 2

        self.conv = nn.Conv1d(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            bias=False  # BatchNorm has its own bias
        )

        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, x):
        """
        Forward pass through the block.

        Args:
            x: Input tensor of shape (batch, time, channels)

        Returns:
            Output tensor of shape (batch, time, out_channels)
        """
        # Conv1d expects (batch, channels, time)
        x = x.transpose(1, 2)

        x = self.conv(x)
        x = self.bn(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Return to (batch, time, channels) format
        x = x.transpose(1, 2)
        return x

Key Implementation Details

| Aspect | Choice | Rationale |
| --- | --- | --- |
| bias=False in Conv1d | No bias term | BatchNorm provides an equivalent shift via β |
| padding=kernel_size//2 | Same padding | Preserves sequence length T (for odd kernel sizes) |
| inplace=True in ReLU | Memory efficient | Overwrites its input tensor instead of allocating a new one |
| Transpose operations | Shape adaptation | Conv1d expects (B, C, T) format |
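
The first row of the table can be checked numerically. The sketch below is an illustration rather than part of the model: it assumes training mode (so BatchNorm normalizes with batch statistics) and uses a deliberately large, randomly drawn bias to show that any per-channel constant added before BatchNorm is removed by the mean subtraction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two convolutions sharing the same weights; only one has a bias term.
conv_no_bias = nn.Conv1d(17, 64, kernel_size=3, padding=1, bias=False)
conv_with_bias = nn.Conv1d(17, 64, kernel_size=3, padding=1, bias=True)
with torch.no_grad():
    conv_with_bias.weight.copy_(conv_no_bias.weight)
    conv_with_bias.bias.uniform_(-5.0, 5.0)  # deliberately large bias

bn = nn.BatchNorm1d(64)
bn.train()  # training mode: normalize with the current batch statistics

x = torch.randn(32, 17, 30)  # (batch, channels, time)
with torch.no_grad():
    out_no_bias = bn(conv_no_bias(x))
    out_with_bias = bn(conv_with_bias(x))

# The two outputs agree up to floating-point rounding.
max_diff = (out_no_bias - out_with_bias).abs().max().item()
print(f"Max difference after BatchNorm: {max_diff:.2e}")
```

Because the bias has no effect on the normalized output, dropping it saves one parameter per output channel at no cost.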

PyTorch Shape Convention

PyTorch's Conv1d expects input shape (batch, channels, length). Our data is (batch, time, features). We transpose before convolution and after to maintain consistency with the rest of the model.
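
To see the convention in isolation, here is a small sketch using a standalone `nn.Conv1d` layer with the same sizes as Block 1. It shows the transposed call succeeding and, as an illustration of the failure mode, the untransposed call being rejected.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=17, out_channels=64, kernel_size=3, padding=1)

x = torch.randn(32, 30, 17)          # (batch, time, features), as in our data
y = conv(x.transpose(1, 2))          # Conv1d needs (batch, channels, time)
print(y.shape)                       # torch.Size([32, 64, 30])
print(y.transpose(1, 2).shape)       # torch.Size([32, 30, 64])

# Without the transpose, Conv1d reads the 30 timesteps as channels and fails,
# because the layer was built for 17 input channels.
try:
    conv(x)
    layout_error = None
except RuntimeError as err:
    layout_error = err
print("Error without transpose:", layout_error is not None)
```

Note that `transpose` returns a view, so the two extra calls per block do not copy the data.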


Complete Feature Extractor

The feature extractor stacks three CNN blocks with the channel progression 17 → 64 → 128 → 64.
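
One consequence of stacking is worth making concrete: each additional kernel-size-3 block widens the window of input timesteps that a single output timestep can see. The helper below, `receptive_field`, is a hypothetical name introduced for this illustration, and it assumes stride 1 and no dilation, as in the blocks above.

```python
def receptive_field(kernel_sizes):
    """Receptive field (in timesteps) of stacked stride-1, dilation-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1  # each layer extends the window by (kernel_size - 1)
    return field

for depth in (1, 2, 3):
    print(depth, "block(s):", receptive_field([3] * depth), "timesteps")
# 1 block(s): 3 timesteps
# 2 block(s): 5 timesteps
# 3 block(s): 7 timesteps
```

Under these assumptions, each output feature of the three-block extractor summarizes 7 of the 30 input timesteps.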

Full Implementation

🐍python
class CNNFeatureExtractor(nn.Module):
    """
    Three-layer CNN feature extractor for time series.

    Transforms raw sensor features into rich representations
    suitable for sequence modeling with BiLSTM.

    Architecture:
        Block 1: 17 -> 64 channels (expand)
        Block 2: 64 -> 128 channels (expand further)
        Block 3: 128 -> 64 channels (compress)

    Input: (batch, 30, 17) - 30 timesteps, 17 sensor features
    Output: (batch, 30, 64) - 30 timesteps, 64 learned features
    """

    def __init__(self, input_dim=17, hidden_dim=64, kernel_size=3, dropout=0.2):
        super().__init__()

        self.input_dim = input_dim
        self.hidden_dim = hidden_dim

        # Block 1: Expand from input_dim to hidden_dim
        self.block1 = CNNBlock(
            in_channels=input_dim,     # 17
            out_channels=hidden_dim,    # 64
            kernel_size=kernel_size,
            dropout=dropout
        )

        # Block 2: Expand further to 2x hidden_dim
        self.block2 = CNNBlock(
            in_channels=hidden_dim,         # 64
            out_channels=hidden_dim * 2,    # 128
            kernel_size=kernel_size,
            dropout=dropout
        )

        # Block 3: Compress back to hidden_dim
        self.block3 = CNNBlock(
            in_channels=hidden_dim * 2,  # 128
            out_channels=hidden_dim,      # 64
            kernel_size=kernel_size,
            dropout=dropout
        )

    def forward(self, x):
        """
        Extract features from sensor time series.

        Args:
            x: Input tensor of shape (batch, seq_len, input_dim)
               Example: (32, 30, 17)

        Returns:
            Features tensor of shape (batch, seq_len, hidden_dim)
               Example: (32, 30, 64)
        """
        x = self.block1(x)  # (B, 30, 17) -> (B, 30, 64)
        x = self.block2(x)  # (B, 30, 64) -> (B, 30, 128)
        x = self.block3(x)  # (B, 30, 128) -> (B, 30, 64)

        return x

Optional: Residual Connections

For deeper networks or improved gradient flow, residual connections can be added:

🐍python
class ResidualCNNBlock(nn.Module):
    """CNN block with optional residual connection."""

    def __init__(self, in_channels, out_channels, kernel_size=3, dropout=0.2):
        super().__init__()

        self.block = CNNBlock(in_channels, out_channels, kernel_size, dropout)

        # Projection for dimension mismatch
        self.use_projection = (in_channels != out_channels)
        if self.use_projection:
            self.projection = nn.Conv1d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        identity = x
        out = self.block(x)

        if self.use_projection:
            # Project identity to match output dimensions
            identity = identity.transpose(1, 2)
            identity = self.projection(identity)
            identity = identity.transpose(1, 2)

        return out + identity
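
The projection in the residual block uses `kernel_size=1`, which mixes channels independently at each timestep. As a hedged illustration (the `nn.Linear` comparison is ours, not part of the model), the sketch below copies the projection weights into a linear layer and checks that both produce the same result on a (batch, time, channels) tensor.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

projection = nn.Conv1d(17, 64, kernel_size=1)   # same form as in the residual block
linear = nn.Linear(17, 64)
with torch.no_grad():
    # Conv1d weight is (out, in, 1); Linear weight is (out, in).
    linear.weight.copy_(projection.weight.squeeze(-1))
    linear.bias.copy_(projection.bias)

x = torch.randn(32, 30, 17)  # (batch, time, channels)
with torch.no_grad():
    via_conv = projection(x.transpose(1, 2)).transpose(1, 2)
    via_linear = linear(x)

print(via_conv.shape)  # torch.Size([32, 30, 64])
matches = torch.allclose(via_conv, via_linear, atol=1e-5)
print("1x1 convolution matches per-timestep linear map:", matches)
```

Either form would work for the shortcut path; the convolutional form keeps the projection consistent with the (B, C, T) layout already used inside the block.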

Forward Pass Analysis

Let us trace a batch of data through the CNN feature extractor to understand the transformations.

Step-by-Step Trace

🐍python
# Create model
cnn = CNNFeatureExtractor(input_dim=17, hidden_dim=64)

# Input batch: 32 samples, 30 timesteps, 17 features
x = torch.randn(32, 30, 17)
print(f"Input shape: {x.shape}")  # (32, 30, 17)

# Block 1
x1 = cnn.block1(x)
print(f"After Block 1: {x1.shape}")  # (32, 30, 64)

# Block 2
x2 = cnn.block2(x1)
print(f"After Block 2: {x2.shape}")  # (32, 30, 128)

# Block 3
x3 = cnn.block3(x2)
print(f"After Block 3: {x3.shape}")  # (32, 30, 64)

# Final output
output = cnn(x)
print(f"Final output: {output.shape}")  # (32, 30, 64)
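
A caveat when tracing: a freshly constructed module is in training mode, so dropout is active and two passes over the same input generally produce the same shapes but different values. The sketch below uses a stand-in `nn.Sequential` block (an assumption made to keep the example self-contained) to show the difference between `train()` and `eval()`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for one block, operating directly on (batch, channels, time).
block = nn.Sequential(
    nn.Conv1d(17, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
)
x = torch.randn(32, 17, 30)

block.train()  # dropout samples a new random mask on every call
with torch.no_grad():
    same_in_train = torch.equal(block(x), block(x))

block.eval()   # dropout is disabled; BatchNorm uses its running statistics
with torch.no_grad():
    same_in_eval = torch.equal(block(x), block(x))

print("Identical outputs in train mode:", same_in_train)
print("Identical outputs in eval mode:", same_in_eval)
```

When comparing intermediate values rather than shapes, calling `eval()` first makes the trace reproducible.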

Dimension Flow Table

| Stage | Shape | Description |
| --- | --- | --- |
| Input | (32, 30, 17) | Raw sensor windows |
| Block 1 input (transposed) | (32, 17, 30) | For Conv1d |
| After Conv1d | (32, 64, 30) | 64 feature maps |
| After BatchNorm | (32, 64, 30) | Normalized |
| After ReLU | (32, 64, 30) | Non-linearity applied |
| After Dropout | (32, 64, 30) | Regularized |
| Block 1 output (transposed) | (32, 30, 64) | Back to (B, T, C) |
| Block 2 output | (32, 30, 128) | Expanded channels |
| Block 3 output | (32, 30, 64) | Compressed for LSTM |
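
The per-layer rows of the table can be reproduced with forward hooks. This is a minimal sketch, assuming a stand-in `nn.Sequential` with the same four layers as Block 1; `record_shape` and `shapes` are names introduced here for illustration.

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv1d(17, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
)

shapes = []  # (layer name, output shape) pairs, filled in by the hooks

def record_shape(module, inputs, output):
    shapes.append((module.__class__.__name__, tuple(output.shape)))

handles = [layer.register_forward_hook(record_shape) for layer in block]

x = torch.randn(32, 30, 17)            # (batch, time, features)
out = block(x.transpose(1, 2)).transpose(1, 2)

for handle in handles:
    handle.remove()                    # detach hooks once the trace is done

for name, shape in shapes:
    print(f"{name:<12} {shape}")
print(f"{'Output':<12} {tuple(out.shape)}")
```

Hooks leave the module code untouched, which makes them convenient for one-off shape debugging.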

Parameter Count Verification

Let us verify our parameter count calculations match the implementation.

Theoretical Count

| Component | Parameters | Calculation |
| --- | --- | --- |
| Block 1 Conv | 3,264 | 17 × 64 × 3 = 3,264 (no bias) |
| Block 1 BN | 128 | 64 × 2 (γ and β) |
| Block 2 Conv | 24,576 | 64 × 128 × 3 = 24,576 |
| Block 2 BN | 256 | 128 × 2 |
| Block 3 Conv | 24,576 | 128 × 64 × 3 = 24,576 |
| Block 3 BN | 128 | 64 × 2 |
| Total | 52,928 | Sum of all learnable parameters |
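
The arithmetic in the table can be reproduced in a few lines of plain Python. The helper names `conv1d_params` and `batchnorm_params` are introduced here for illustration only.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=False):
    """Weights of a Conv1d layer, plus one bias per output channel if enabled."""
    return in_channels * out_channels * kernel_size + (out_channels if bias else 0)

def batchnorm_params(channels):
    """One scale (gamma) and one shift (beta) per channel."""
    return 2 * channels

channels = [17, 64, 128, 64]  # channel progression through the three blocks
total = 0
for i, (c_in, c_out) in enumerate(zip(channels[:-1], channels[1:]), start=1):
    block_total = conv1d_params(c_in, c_out, kernel_size=3) + batchnorm_params(c_out)
    total += block_total
    print(f"Block {i}: {block_total:,}")
print(f"Total: {total:,}")
# Block 1: 3,392
# Block 2: 24,832
# Block 3: 24,704
# Total: 52,928
```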

PyTorch Verification

🐍python
def count_parameters(model):
    """Count total trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

cnn = CNNFeatureExtractor(input_dim=17, hidden_dim=64)
total_params = count_parameters(cnn)
print(f"Total parameters: {total_params:,}")  # 52,928

# Breakdown by block
for name, module in cnn.named_children():
    params = count_parameters(module)
    print(f"{name}: {params:,} parameters")

# Output:
# block1: 3,392 parameters (3,264 conv + 128 bn)
# block2: 24,832 parameters (24,576 conv + 256 bn)
# block3: 24,704 parameters (24,576 conv + 128 bn)

Running Statistics

BatchNorm also maintains running_mean and running_var buffers, but these are not learnable parameters. They are updated by an exponential moving average during training and used in place of batch statistics during inference.
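
The distinction between parameters and buffers can be inspected directly on a single `nn.BatchNorm1d` layer. The sketch below assumes PyTorch's default settings (momentum 0.1, running statistics tracked) and an artificial input shifted to have a mean near 3.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(64)

param_names = [name for name, _ in bn.named_parameters()]
buffer_names = [name for name, _ in bn.named_buffers()]
print("Parameters:", param_names)   # ['weight', 'bias']
print("Buffers:   ", buffer_names)  # ['running_mean', 'running_var', 'num_batches_tracked']

x = torch.randn(32, 64, 30) + 3.0   # batch whose per-channel mean is near 3

bn.train()
bn(x)                                # training-mode pass updates the buffers
mean_after_train = bn.running_mean.clone()

bn.eval()
bn(x)                                # eval-mode pass leaves them untouched
mean_after_eval = bn.running_mean.clone()

print(mean_after_train[:3])          # moved from 0 toward the batch mean
print(torch.equal(mean_after_train, mean_after_eval))  # True
```

Since buffers are not returned by `parameters()`, they do not appear in the 52,928 count, but they are saved in the model's `state_dict`.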


Integration with BiLSTM

The CNN feature extractor's output feeds directly into the BiLSTM layer.

Interface

🐍python
class AMNLModel(nn.Module):
    """
    Complete AMNL architecture.
    CNN -> BiLSTM -> Attention -> Prediction heads
    """

    def __init__(self, config):
        super().__init__()

        # CNN Feature Extractor
        self.cnn = CNNFeatureExtractor(
            input_dim=config.input_dim,      # 17
            hidden_dim=config.cnn_hidden,    # 64
            kernel_size=config.kernel_size,  # 3
            dropout=config.cnn_dropout       # 0.2
        )

        # BiLSTM for temporal modeling
        self.lstm = nn.LSTM(
            input_size=config.cnn_hidden,    # 64 (CNN output)
            hidden_size=config.lstm_hidden,  # 128
            num_layers=config.lstm_layers,   # 2
            batch_first=True,
            bidirectional=True,
            dropout=config.lstm_dropout      # 0.3
        )

        # ... attention and prediction heads

    def forward(self, x):
        # Extract local features
        cnn_features = self.cnn(x)  # (B, 30, 64)

        # Model temporal dependencies
        lstm_out, _ = self.lstm(cnn_features)  # (B, 30, 256)

        # ... attention and prediction
        # (predictions is produced by the elided attention and prediction heads)
        return predictions

Data Flow

πŸ“text
1Input: (batch, 30, 17)
2         ↓
3   CNN Feature Extractor
4         ↓
5CNN Output: (batch, 30, 64)
6         ↓
7      BiLSTM
8         ↓
9LSTM Output: (batch, 30, 256)  ← 128 Γ— 2 (bidirectional)
10         ↓
11   Attention Layer
12         ↓
13Context: (batch, 256)
14         ↓
15   Prediction Heads
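
The CNN-to-BiLSTM step of this flow can be checked with a random tensor standing in for the CNN output. The sizes below are taken from the config comments in the interface code (64, 128, 2 layers, dropout 0.3); the random input is an assumption made to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(
    input_size=64,      # CNN output features
    hidden_size=128,
    num_layers=2,
    batch_first=True,   # accept (batch, time, features) directly
    bidirectional=True,
    dropout=0.3,        # applied between the two stacked layers
)

cnn_features = torch.randn(32, 30, 64)  # stand-in for the CNN output
lstm_out, (h_n, c_n) = lstm(cnn_features)

print(lstm_out.shape)  # torch.Size([32, 30, 256])
print(h_n.shape)       # torch.Size([4, 32, 128])
```

The last dimension of `lstm_out` is 256 because the forward and backward hidden states (128 each) are concatenated. The first dimension of `h_n` is 4 (2 layers × 2 directions), and `batch_first=True` does not reorder it.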

Why This Interface Works

  • Preserved sequence length: CNN outputs 30 timesteps, same as input
  • Richer feature dimension: 64 learned features per timestep versus 17 raw sensor readings, a wider but more informative representation
  • Local patterns extracted: CNN features encode degradation signatures
  • Ready for temporal modeling: LSTM processes the feature sequence

Summary

In this section, we implemented the complete CNN feature extractor:

  1. CNNBlock: Conv1D → BatchNorm → ReLU → Dropout
  2. CNNFeatureExtractor: Three blocks with 17 → 64 → 128 → 64 channels
  3. Forward pass: Preserves sequence length, transforms features
  4. Parameters: ~53K trainable parameters
  5. Integration: Output shape (B, 30, 64) feeds into BiLSTM

| Property | Value |
| --- | --- |
| Input shape | (batch, 30, 17) |
| Output shape | (batch, 30, 64) |
| Number of blocks | 3 |
| Total parameters | ~53,000 |
| Kernel size | 3 |
| Dropout rate | 0.2 |

Chapter Summary: We have built a complete CNN feature extractor that transforms raw sensor readings into rich feature representations. The architecture uses three convolutional blocks with batch normalization for training stability and dropout for regularization. The output is ready for temporal modeling with the BiLSTM, which we will implement in the next chapter.

With the CNN feature extractor complete, we now move to Chapter 6 where we implement the BiLSTM layer for temporal sequence modeling.