Learning Objectives
By the end of this section, you will:
- Understand the DataLoader's role in connecting Dataset to training
- Choose optimal batch size balancing memory and gradient quality
- Implement proper shuffling for training vs evaluation
- Configure parallel data loading for GPU utilization
- Set up train/validation/test loaders with appropriate settings
Why This Matters: A poorly configured DataLoader can bottleneck training, leaving your GPU idle while it waits for data, and improper shuffling can introduce subtle bugs. This section ensures your data pipeline serves data efficiently to the model.
DataLoader Role in Training
The DataLoader wraps a Dataset and provides iteration with:
| Feature | Purpose | Configured By |
|---|---|---|
| Batching | Group samples into batches | batch_size |
| Shuffling | Randomize sample order | shuffle |
| Parallel loading | Load data in background | num_workers |
| Memory pinning | Faster CPU→GPU transfer | pin_memory |
| Drop last | Handle incomplete batches | drop_last |
Basic Usage
```python
from torch.utils.data import DataLoader

train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    pin_memory=True
)

for batch_idx, (windows, ruls, healths) in enumerate(train_loader):
    # windows: (32, 30, 17)
    # ruls: (32,)
    # healths: (32,)
    outputs = model(windows)
    loss = compute_loss(outputs, ruls, healths)
```
Batch Size Selection
Batch size affects training dynamics, memory usage, and convergence.
Trade-offs
| Batch Size | Pros | Cons |
|---|---|---|
| Small (8-16) | Low memory; gradient noise acts as regularization | Slow epochs, less stable updates |
| Medium (32-64) | Balanced memory and speed; good default choice | Few in practice |
| Large (128-256) | Faster epochs, stable gradients | High memory, may generalize worse |
| Very large (512+) | Fastest per epoch | Often requires learning rate tuning |
Memory Calculation
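A back-of-the-envelope estimate for our window shape (30 timesteps × 17 features, stored as float32) shows the input batches themselves are tiny; model activations and gradients, not inputs, dominate GPU memory:

```python
# Back-of-the-envelope input memory for one batch of C-MAPSS windows
batch_size = 32
timesteps, features = 30, 17   # window shape from our pipeline
bytes_per_float32 = 4

batch_bytes = batch_size * timesteps * features * bytes_per_float32
print(f"{batch_bytes / 1024:.2f} KiB per batch")  # 63.75 KiB per batch
```

At well under a megabyte per batch, the input pipeline leaves essentially all GPU memory for the model.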
Our Choice: batch_size = 32
Rationale:
- Fits comfortably in GPU memory with room for larger models
- 32 samples provide reasonably stable gradient estimates
- Standard choice, enabling fair comparison with prior work
- Good balance of training speed and generalization
Shuffling Strategy
Shuffling randomizes the order of samples each epoch. The strategy differs between training and evaluation.
Training: Always Shuffle
```python
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True  # Random order each epoch
)
```
Benefits of shuffling:
- Prevents ordering bias: Model doesn't learn spurious patterns from data order
- Better gradient diversity: Each batch has varied samples
- Implicit regularization: Different batch compositions each epoch
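If you also want the shuffled order to be reproducible across runs, you can pass a seeded `torch.Generator` to the DataLoader. This goes beyond the configuration used in this chapter; a minimal sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: ten scalar samples
ds = TensorDataset(torch.arange(10).float())

def epoch_order(seed):
    # A fresh, seeded generator makes the shuffle deterministic
    g = torch.Generator().manual_seed(seed)
    loader = DataLoader(ds, batch_size=2, shuffle=True, generator=g)
    return [x.tolist() for (x,) in loader]

# Same seed -> identical shuffled order across runs
print(epoch_order(0) == epoch_order(0))  # True
```

This keeps the regularizing effect of shuffling while making individual runs repeatable for debugging.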
Validation/Test: No Shuffle
```python
val_loader = DataLoader(
    val_dataset,
    batch_size=32,
    shuffle=False  # Consistent order
)

test_loader = DataLoader(
    test_dataset,
    batch_size=32,
    shuffle=False  # Consistent order
)
```
Why no shuffling for evaluation:
- Reproducibility: Same predictions each run
- Debugging: Easier to trace specific sample results
- No effect on metrics: Evaluation metrics are order-independent
Validation Shuffle Bug
A common mistake is shuffling validation data. With shuffle=True, which samples land in each batch (including any smaller final batch) changes between runs, so running-average metrics logged mid-epoch can vary even with an identical model, masking true improvements.
Parallel Data Loading
The num_workers parameter enables parallel data loading in background processes.
How It Works
```
Main Process: [Train Step 1] [Train Step 2] [Train Step 3] ...
                    ↑              ↑              ↑
Worker 1:       [Load B2]      [Load B5]      [Load B8]
Worker 2:       [Load B3]      [Load B6]      [Load B9]
Worker 3:       [Load B4]      [Load B7]      [Load B10]

Batch B1 loaded during initialization
Workers prefetch future batches during training
```
While the GPU trains on batch N, workers load batches N+1, N+2, etc.
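How far ahead each worker loads is controlled by the DataLoader's prefetch_factor parameter (a knob beyond the configuration used in this chapter; its default of 2 is usually fine). A hedged sketch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in sized like our windows: (N, 30, 17)
ds = TensorDataset(torch.randn(64, 30, 17))

# Each worker keeps prefetch_factor batches ready ahead of the
# training loop; only valid when num_workers > 0
loader = DataLoader(ds, batch_size=8, num_workers=2, prefetch_factor=2)
```

With 2 workers and prefetch_factor=2, up to 4 batches can be queued while the GPU is busy.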
Choosing num_workers
| num_workers | Behavior | Use Case |
|---|---|---|
| 0 | Main process loads data | Debugging; the framework default |
| 1-2 | Light parallelism | Small datasets, limited CPU |
| 4 | Good default | Most cases |
| 8+ | Heavy parallelism | Large datasets, many CPU cores |
Our choice: num_workers = 4
For C-MAPSS, data loading is fast (small dataset, no heavy transforms). With 4 workers, data is always ready when the GPU needs it.
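The right worker count is machine- and dataset-dependent, so it is worth measuring. A rough timing sketch (the synthetic dataset and helper below are illustrative, not from the chapter's pipeline):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in roughly sized like the C-MAPSS training windows
ds = TensorDataset(torch.randn(512, 30, 17))

def time_epoch(num_workers):
    loader = DataLoader(ds, batch_size=32, num_workers=num_workers)
    start = time.perf_counter()
    for (batch,) in loader:
        pass  # the forward/backward pass would run here
    return time.perf_counter() - start

# Compare against 2, 4, 8 on your machine; with a small in-memory
# dataset like this, worker startup cost can outweigh the benefit
print(f"num_workers=0: {time_epoch(0):.3f}s")
```

For heavy per-sample transforms or disk-backed datasets, the gap between worker counts grows and this measurement becomes more informative.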
pin_memory for GPU Training
```python
train_loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True  # Faster CPU→GPU transfer
)
```
pin_memory=True allocates batches in page-locked (pinned) memory, enabling faster transfer to the GPU. Always use it when training on GPU.
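Pinned memory pays off most when combined with non_blocking=True on the device transfer, which lets the copy overlap with computation. A sketch of the pattern (the synthetic dataset is illustrative; the device guard makes it safe on CPU-only machines):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic (windows, ruls) stand-in shaped like our pipeline output
ds = TensorDataset(torch.randn(64, 30, 17), torch.rand(64))

# Pinning only applies when a GPU is present
loader = DataLoader(ds, batch_size=32, pin_memory=torch.cuda.is_available())

for windows, ruls in loader:
    # non_blocking=True lets the host-to-device copy overlap with compute;
    # it only has an effect when the source batch is in pinned memory
    windows = windows.to(device, non_blocking=True)
    ruls = ruls.to(device, non_blocking=True)
```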
Worker Initialization
Each worker initializes its own copy of the dataset. For large datasets stored in memory, this can multiply memory usage. Use memory mapping or shared memory for very large datasets.
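Relatedly, each worker process has its own random state. If your dataset uses NumPy-based transforms, a common companion to num_workers > 0 is a worker_init_fn that seeds NumPy per worker (a practice beyond this chapter's minimal setup, sketched here):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Each worker gets its own torch seed; fold it into NumPy so any
    # np.random-based transforms also differ across workers
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)

ds = TensorDataset(torch.randn(64, 30, 17))
loader = DataLoader(ds, batch_size=32, num_workers=2,
                    worker_init_fn=seed_worker)
```

Without this, workers that copy the parent's NumPy state can produce identical "random" augmentations.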
Complete Configuration
Putting it all together, here is our complete DataLoader setup:
Training Loader
```python
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=32,
    shuffle=True,      # Randomize order each epoch
    num_workers=4,     # Parallel loading
    pin_memory=True,   # Faster GPU transfer
    drop_last=True     # Consistent batch size
)
```
Validation Loader
```python
val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=32,
    shuffle=False,     # Consistent order
    num_workers=4,
    pin_memory=True,
    drop_last=False    # Evaluate all samples
)
```
Test Loader
```python
test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=1,      # One engine at a time
    shuffle=False,
    num_workers=0,     # Simple for inference
    pin_memory=True
)
```
Factory Function
```python
def create_dataloaders(train_data, val_data, test_data,
                       batch_size=32, num_workers=4):
    """
    Create train, validation, and test DataLoaders.

    Args:
        train_data: Tuple of (windows, rul_labels, health_labels)
        val_data: Same format
        test_data: Same format
        batch_size: Samples per batch
        num_workers: Parallel loading workers

    Returns:
        train_loader, val_loader, test_loader
    """
    train_dataset = CMAPSSDataset(*train_data)
    val_dataset = CMAPSSDataset(*val_data)
    test_dataset = CMAPSSDataset(*test_data)

    train_loader = DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
        drop_last=True
    )

    val_loader = DataLoader(
        val_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True
    )

    test_loader = DataLoader(
        test_dataset,
        batch_size=1,
        shuffle=False,
        num_workers=0,
        pin_memory=True
    )

    return train_loader, val_loader, test_loader
```
Configuration Summary
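To sanity-check the pipeline end to end, the training loader can be exercised on synthetic arrays shaped like the C-MAPSS windows. The CMAPSSDataset stand-in and random data below are illustrative assumptions, not the chapter's real dataset class:

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class CMAPSSDataset(Dataset):
    """Minimal stand-in with the assumed (windows, ruls, healths) interface."""
    def __init__(self, windows, ruls, healths):
        self.windows = torch.as_tensor(windows, dtype=torch.float32)
        self.ruls = torch.as_tensor(ruls, dtype=torch.float32)
        self.healths = torch.as_tensor(healths, dtype=torch.float32)

    def __len__(self):
        return len(self.windows)

    def __getitem__(self, idx):
        return self.windows[idx], self.ruls[idx], self.healths[idx]

# Synthetic data shaped like our windows: (N, 30, 17)
rng = np.random.default_rng(0)
train_data = (rng.standard_normal((100, 30, 17)),
              rng.random(100) * 125,   # RUL-like targets
              rng.random(100))         # health-like targets

train_dataset = CMAPSSDataset(*train_data)
train_loader = DataLoader(train_dataset, batch_size=32,
                          shuffle=True, drop_last=True)

windows, ruls, healths = next(iter(train_loader))
print(windows.shape, ruls.shape, healths.shape)
# torch.Size([32, 30, 17]) torch.Size([32]) torch.Size([32])
```

The batch shapes match those annotated in the Basic Usage loop, confirming the loader delivers what the model expects.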
| Parameter | Train | Validation | Test |
|---|---|---|---|
| batch_size | 32 | 32 | 1 |
| shuffle | True | False | False |
| num_workers | 4 | 4 | 0 |
| pin_memory | True | True | True |
| drop_last | True | False | False |
Summary
In this section, we configured efficient DataLoaders for training:
- DataLoader role: Batching, shuffling, parallel loading
- Batch size = 32: Balanced memory and gradient quality
- Shuffling: True for training, False for evaluation
- Parallel loading: num_workers = 4, pin_memory = True
- drop_last: True for training (consistent batch size)
| Setting | Value | Rationale |
|---|---|---|
| batch_size | 32 | Memory fits, stable gradients |
| num_workers | 4 | Data always ready for GPU |
| pin_memory | True | Faster CPU→GPU transfer |
| shuffle (train) | True | Prevent ordering bias |
Chapter Summary: We have now built a complete, production-quality data preprocessing pipeline: per-condition normalization removes regime effects while preserving degradation signals, leakage prevention ensures valid evaluation, sliding windows create fixed-size inputs, and the PyTorch Dataset/DataLoader infrastructure efficiently serves data to our model. In Chapter 5, we begin building the model itself, starting with the CNN feature extractor.
With the data pipeline complete, we are ready to implement the neural network architecture.