Two Stability Knobs
Stacking three Conv1D layers without regularisation is a recipe for unstable training: activations explode or vanish, and the model overfits to noise patterns in individual channels. Two techniques applied at every conv block do most of the work.
| Technique | What it fixes | Cost |
|---|---|---|
| BatchNorm1d | Activation magnitude drift; slow training | +2 params per channel; tiny FLOPs |
| Dropout | Co-adaptation; brittle channels | Zero extra params; ~5% throughput hit |
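You can check the parameter cost in the table directly. This is a minimal PyTorch sketch that assumes 64 channels, an illustrative number rather than one taken from this chapter's model. `nn.BatchNorm1d` holds two learnable tensors per channel (`weight` = γ, `bias` = β). It also holds running-statistic buffers that are tracked but not trained. `nn.Dropout` holds nothing.

```python
import torch.nn as nn

channels = 64  # illustrative channel count (assumption)

bn = nn.BatchNorm1d(channels)
drop = nn.Dropout(p=0.15)

# Learnable parameters: gamma (weight) and beta (bias), one of each per channel.
bn_params = sum(p.numel() for p in bn.parameters())
# Buffers: running_mean and running_var (per channel) plus a scalar num_batches_tracked.
bn_buffers = {name: tuple(b.shape) for name, b in bn.named_buffers()}
# Dropout has no parameters at all.
drop_params = sum(p.numel() for p in drop.parameters())

print(bn_params)    # 128
print(bn_buffers)   # {'running_mean': (64,), 'running_var': (64,), 'num_batches_tracked': ()}
print(drop_params)  # 0
```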
BatchNorm: Per-Channel Whitening
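BatchNorm1d normalises each channel's activations to mean 0 and unit variance, using statistics computed over the batch and length axes. It then applies a learnable per-channel scale $\gamma_c$ and shift $\beta_c$:

$$\hat{x}_{n,c,t} = \frac{x_{n,c,t} - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}, \qquad y_{n,c,t} = \gamma_c\,\hat{x}_{n,c,t} + \beta_c$$

where $\mu_c$ and $\sigma_c^2$ are the mean and variance of channel $c$, and $\epsilon$ is a small constant for numerical stability.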
Two regimes:
| Mode | Statistics used | Updates |
|---|---|---|
| Training | Current batch's mean / var | Updates running averages used at eval |
| Evaluation | Running mean / var | No updates; deterministic per input |
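The "updates running averages" entry is an exponential moving average. The sketch below checks it against PyTorch's `nn.BatchNorm1d`. Its documented default `momentum` is 0.1, and its running variance is updated with the unbiased batch variance. The tensor shapes and data are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)                 # 4 channels; default momentum=0.1
x = torch.randn(8, 4, 30) * 3.0 + 5.0  # (batch, channels, length); illustrative data

bn.train()
y = bn(x)                              # training mode: normalise with batch stats

# Per-channel batch statistics over the batch and length axes.
batch_mean = x.mean(dim=(0, 2))
batch_var_unbiased = x.var(dim=(0, 2), unbiased=True)

# Running stats start at mean 0 / var 1 and move 10% of the way toward the batch stats.
expected_mean = 0.9 * torch.zeros(4) + 0.1 * batch_mean
expected_var = 0.9 * torch.ones(4) + 0.1 * batch_var_unbiased

print(torch.allclose(bn.running_mean, expected_mean, atol=1e-4))  # True
print(torch.allclose(bn.running_var, expected_var, atol=1e-4))    # True
```

In eval mode these running values replace the batch statistics. That is why an evaluation-mode output depends only on its own input.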
Dropout: Randomly Forgetting
Dropout zeroes each activation with probability $p$ during training, then scales the survivors by $1/(1-p)$ so the expected output magnitude is unchanged. We use $p = 0.15$ in the conv blocks. It is kept small because the input is only a 30-cycle window. Heavier dropout (a larger $p$) lives in the FC stack at the end of the backbone (Chapter 11).
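The scaling follows from a one-line expectation. With drop probability $p$, write each activation $x$ as multiplied by a Bernoulli keep mask $m$:

$$\tilde{x} = \frac{m}{1-p}\,x,\quad m \sim \mathrm{Bernoulli}(1-p) \qquad\Rightarrow\qquad \mathbb{E}[\tilde{x}] = (1-p)\cdot\frac{x}{1-p} + p\cdot 0 = x$$

With $p = 0.15$, each surviving activation is multiplied by $1/0.85 \approx 1.176$. Because the rescaling already happens during training, dropout at evaluation can pass its input through unchanged.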
Python: BN and Dropout in 10 Lines
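Here is a from-scratch NumPy sketch of both layers. The function names and signatures are our own and hypothetical, not a library API. The defaults mirror PyTorch's (momentum 0.1, eps 1e-5), and the input is assumed to be laid out as (batch, channels, length).

```python
import numpy as np

def batchnorm1d(x, gamma, beta, running_mean, running_var,
                training, momentum=0.1, eps=1e-5):
    """BatchNorm over x of shape (batch, channels, length).

    Training: normalise with the batch's per-channel statistics and update
    running_mean / running_var in place. Eval: use the running stats, frozen.
    """
    if training:
        mean = x.mean(axis=(0, 2))             # per-channel mean over batch and length
        var = x.var(axis=(0, 2))               # biased variance, used for normalising
        n = x.shape[0] * x.shape[2]
        running_mean[:] = (1 - momentum) * running_mean + momentum * mean
        running_var[:] = (1 - momentum) * running_var + momentum * var * n / (n - 1)
    else:
        mean, var = running_mean, running_var  # frozen statistics: deterministic output
    x_hat = (x - mean[None, :, None]) / np.sqrt(var[None, :, None] + eps)
    return gamma[None, :, None] * x_hat + beta[None, :, None]


def dropout(x, p, training, rng):
    """Inverted dropout: zero with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x                               # identity at eval time
    keep = rng.random(x.shape) >= p            # True with probability 1 - p
    return x * keep / (1.0 - p)


rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(8, 4, 30))      # illustrative (batch, channels, length)
gamma, beta = np.ones(4), np.zeros(4)          # learnable in a real model
running_mean, running_var = np.zeros(4), np.ones(4)

y = batchnorm1d(x, gamma, beta, running_mean, running_var, training=True)
z = dropout(y, p=0.15, training=True, rng=rng)
y_eval = batchnorm1d(x, gamma, beta, running_mean, running_var, training=False)
```

The biased batch variance normalises the current batch, and the unbiased estimate feeds the running average. PyTorch's `nn.BatchNorm1d` makes the same split.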
PyTorch: nn.BatchNorm1d and nn.Dropout
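A hedged PyTorch sketch of one conv block follows. The chapter fixes only that BatchNorm1d and Dropout with $p = 0.15$ appear in every conv block. Everything else here is an illustrative assumption: the class name `ConvBlock`, the Conv → BN → ReLU → Dropout ordering, kernel size 3, the channel widths, and the 14 input channels.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Hypothetical conv block: Conv1d -> BatchNorm1d -> ReLU -> Dropout."""

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, p: float = 0.15):
        super().__init__()
        # bias=False: BN's beta already provides a per-channel shift.
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.drop(self.act(self.bn(self.conv(x))))


# Three stacked blocks; input is (batch, channels, time). Shapes are assumptions.
model = nn.Sequential(ConvBlock(14, 32), ConvBlock(32, 64), ConvBlock(64, 64))
x = torch.randn(16, 14, 30)   # 16 windows, 14 sensor channels, 30 cycles
out = model(x)
print(out.shape)  # torch.Size([16, 64, 30])
```

Padding of `kernel_size // 2` keeps the 30-step length unchanged for odd kernel sizes. Dropping the convolution bias is a common choice when BN follows, since it would be cancelled by the mean subtraction anyway.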
The .train() / .eval() Distinction
BatchNorm and Dropout are the two layers in the book where this distinction matters. Forgetting to call model.eval() before validation is among the most common bugs in PyTorch code.
| State | BatchNorm uses | Dropout |
|---|---|---|
| .train() | Current batch stats | Active (drops + scales) |
| .eval() | Running stats | Identity (passes through) |
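A short check of the table in PyTorch, with illustrative shapes. In `.train()`, two forward passes over the same input differ because dropout draws a fresh mask each time. In `.eval()`, the two passes match exactly. The last lines show that `torch.no_grad()` leaves the module in whatever mode it was already in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
block = nn.Sequential(nn.BatchNorm1d(4), nn.Dropout(p=0.15))
x = torch.randn(8, 4, 30)

block.train()
a = block(x)
b = block(x)                 # new dropout mask; running stats updated again
print(torch.equal(a, b))     # False: training mode is stochastic

block.eval()                 # switches every submodule to eval mode
with torch.no_grad():
    c = block(x)
    d = block(x)
print(torch.equal(c, d))     # True: running stats + identity dropout

block.train()
with torch.no_grad():
    mode_in_no_grad = block.training
print(mode_in_no_grad)       # True: no_grad disables autograd, not training mode
```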
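Wrap validation in a with torch.no_grad(): block AND set model.eval() before scoring.
Three BN/Dropout Pitfalls
- Forgetting model.eval() before validation: BN normalises with batch statistics and keeps updating its running stats with validation data, while dropout keeps zeroing activations. Validation metrics become noisy and batch-dependent.
- Assuming torch.no_grad() switches modes: it only disables gradient tracking. BN and dropout still behave as in training.
- Losing BN buffers: running mean and variance are buffers, not parameters. state_dict handles them; manual parameter iteration loses them, and eval mode breaks.
The point. BN keeps activations on a well-conditioned scale; dropout keeps the network from over-relying on any single channel. Both are mode-dependent.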
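The state_dict pitfall is easy to reproduce. In this PyTorch sketch, the "manual" route is a hypothetical loop that copies only `parameters()`. That copies γ and β but leaves the destination's running statistics at their defaults. `load_state_dict` carries the buffers across.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
src = nn.BatchNorm1d(4)
src.train()
src(torch.randn(8, 4, 30) + 3.0)   # one training pass moves running_mean away from 0

# Manual route: copy only learnable parameters (gamma, beta); buffers stay at defaults.
dst_params_only = nn.BatchNorm1d(4)
with torch.no_grad():
    for p_dst, p_src in zip(dst_params_only.parameters(), src.parameters()):
        p_dst.copy_(p_src)

# state_dict route: includes running_mean, running_var and num_batches_tracked.
dst_full = nn.BatchNorm1d(4)
dst_full.load_state_dict(src.state_dict())

print(sorted(src.state_dict().keys()))
# ['bias', 'num_batches_tracked', 'running_mean', 'running_var', 'weight']
print(torch.equal(dst_params_only.running_mean, src.running_mean))  # False
print(torch.equal(dst_full.running_mean, src.running_mean))         # True
```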
Takeaway
- BN: per-channel whitening with learnable gamma / beta. Train uses batch stats; eval uses running stats.
- Dropout: zero out 15% of conv-block activations during training (p = 0.15). The inverted variant scales survivors by 1/(1-p), so eval is the identity.
- Always toggle .train() / .eval() correctly. Otherwise BN drifts and dropout corrupts validation.
- BN buffers travel with state_dict. Manual serialisation can lose them.