Learning Objectives
By the end of this section, you will:
- Understand why normalization is essential for neural network training
- Identify the failure mode of global normalization on multi-condition datasets
- Derive the per-condition normalization strategy that preserves degradation signals
- Implement condition-aware preprocessing for FD002 and FD004
- Validate that normalization preserves information while removing regime effects
Why This Matters: Normalization is often treated as a routine preprocessing step. But for C-MAPSS datasets with multiple operating conditions, the wrong normalization strategy destroys degradation signals. Per-condition normalization is essential for state-of-the-art performance on FD002 and FD004.
Why Normalization Matters
Neural networks are sensitive to the scale of input features. Without normalization, training can fail or converge slowly.
The Scale Problem
Consider the raw sensor ranges in C-MAPSS:
| Sensor | Typical Range | Magnitude |
|---|---|---|
| T30 (Temperature) | 1550 - 1620 °R | ~1600 |
| P30 (Pressure) | 545 - 560 psia | ~550 |
| Nc (Core speed) | 9000 - 9200 rpm | ~9000 |
| BPR (Bypass ratio) | 8.3 - 8.5 | ~8 |
| phi (Fuel ratio) | 518 - 525 | ~520 |
Features with large values (Nc ≈ 9000) would dominate gradients over features with small values (BPR ≈ 8).
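To make this concrete, here is a minimal sketch (with illustrative synthetic values at the scales from the table above) showing how a linear model's MSE gradient is dominated by the large-scale feature; the residual `r = 1` is an arbitrary choice so the gradient reduces to the feature means:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Two features at the raw scales from the table above (illustrative)
nc = rng.uniform(9000, 9200, n)    # core speed, ~9000 rpm
bpr = rng.uniform(8.3, 8.5, n)     # bypass ratio, ~8
X = np.column_stack([nc, bpr])

# For a linear model y_hat = X @ w with MSE loss, the gradient at w = 0
# for a residual r is grad = -2/n * X.T @ r.  With a uniform residual
# r = 1, each gradient component is just -2 * (feature mean):
r = np.ones(n)
grad = -2.0 / n * X.T @ r

print(grad)                          # roughly [-18200, -16.8]
print(abs(grad[0]) / abs(grad[1]))   # ~1000x: Nc dominates purely by scale
```

The weight update for Nc would be roughly a thousand times larger than for BPR, regardless of which feature actually carries degradation information.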
What Normalization Achieves
Standard normalization (z-scoring) transforms each feature to zero mean and unit variance:

    x_norm = (x - μ) / σ

where μ and σ are the feature's mean and standard deviation over the training data.
This achieves:
- Equal feature contribution: All features have comparable scale
- Faster convergence: Gradient descent operates efficiently
- Numerical stability: Avoids overflow/underflow in activations
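The transform above can be sketched in a few lines of NumPy on illustrative data at the raw scales from the table:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative raw features at very different scales
X = np.column_stack([
    rng.uniform(9000, 9200, 500),   # Nc-like, ~9000 rpm
    rng.uniform(8.3, 8.5, 500),     # BPR-like, ~8
])

# z-score normalization: x_norm = (x - mu) / sigma, per feature
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_norm = (X - mu) / sigma

# Both features now have zero mean and unit variance
print(X_norm.mean(axis=0))  # ~[0, 0]
print(X_norm.std(axis=0))   # ~[1, 1]
```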
Global Normalization Problem
The naive approach computes a single mean and standard deviation over all data, pooling every operating condition:

    μ = mean over all samples,   σ = std over all samples,   x_norm = (x - μ) / σ
For FD001 and FD003 (single operating condition), this works fine. But for FD002 and FD004, it fails catastrophically.
The Multi-Modal Distribution Problem
Visualizing the Problem
Global normalization on multi-modal data produces a strange distribution:
```
Before normalization:         After global normalization:

Condition 1: [518-520]        Condition 1: [+0.9 to +1.0]
Condition 2: [490-492]        Condition 2: [+0.0 to +0.1]
Condition 3: [470-472]  -->   Condition 3: [-0.6 to -0.5]
Condition 4: [480-482]        Condition 4: [-0.3 to -0.2]
Condition 5: [448-450]        Condition 5: [-1.4 to -1.3]
Condition 6: [440-442]        Condition 6: [-1.6 to -1.5]

Degradation signal (~2°R change within condition) is invisible!
```

Silent Failure
The dangerous aspect: the model will still train and produce predictions! But it learns to distinguish conditions, not degradation. Performance on FD002/FD004 will be significantly worse than possible.
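A numeric sketch makes the failure measurable. Using synthetic trajectories with the condition baselines from the diagram above and a small ~2-unit degradation drift, we can compare how much of the globally normalized signal's variance comes from condition offsets versus degradation:

```python
import numpy as np

# Synthetic phi-like sensor: six condition baselines (from the diagram)
baselines = np.array([519.0, 491.0, 471.0, 481.0, 449.0, 441.0])
cycles = np.linspace(0, 1, 200)      # normalized life fraction
drift = 2.0 * cycles                 # ~2-unit degradation signal

# One trajectory per condition: baseline offset plus the same drift
data = baselines[:, None] + drift[None, :]
x = data.ravel()

# Global normalization over all conditions pooled together
x_glob = (x - x.mean()) / x.std()

# Variance explained by condition offsets vs. by degradation drift
between = x_glob.reshape(6, -1).mean(axis=1).var()
within = x_glob.reshape(6, -1).var(axis=1).mean()
print(between / within)  # condition effect is orders of magnitude larger
```

The between-condition variance dwarfs the within-condition (degradation) variance by a factor in the thousands, so a model trained on this representation mostly learns to recognize conditions.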
Per-Condition Normalization
The solution is simple: compute normalization statistics separately for each operating condition.
Mathematical Formulation
Let c = 1, ..., C denote the operating conditions (C = 6 for FD002/FD004). For each condition c and each feature j, compute statistics using only the samples belonging to that condition:

    μ_{c,j} = mean of feature j over samples in condition c
    σ_{c,j} = std of feature j over samples in condition c

Then normalize each sample i using the statistics of its own condition c(i):

    x_norm_{i,j} = (x_{i,j} - μ_{c(i),j}) / σ_{c(i),j}
What This Achieves
Within each condition, every feature becomes zero-mean and unit-variance: the large between-condition offsets disappear, while the small within-condition degradation trend is preserved and brought to a comparable scale across conditions.
Condition Identification
To apply per-condition normalization, we must identify which condition each sample belongs to. In C-MAPSS, conditions are determined by the three operating settings (columns 3-5):
| Condition | Altitude | Mach | TRA |
|---|---|---|---|
| 1 | 0 | 0.0 | 100 |
| 2 | 10000 | 0.25 | 100 |
| 3 | 20000 | 0.70 | 100 |
| 4 | 25000 | 0.62 | 60 |
| 5 | 35000 | 0.84 | 100 |
| 6 | 42000 | 0.84 | 100 |
We cluster samples by their operating settings to assign condition labels.
Implementation Details
Step 1: Identify Operating Conditions
```python
import numpy as np


def identify_conditions(data, op_cols=(0, 1, 2)):
    """
    Cluster samples by operating condition.

    Args:
        data: Raw data array with operating settings
        op_cols: Column indices for operating settings

    Returns:
        Condition labels for each sample
    """
    # Round operating settings to handle numerical precision
    op_settings = np.round(data[:, list(op_cols)], decimals=2)

    # Find unique (altitude, Mach, TRA) combinations
    unique_conditions = np.unique(op_settings, axis=0)

    # Assign an integer condition label per unique combination
    conditions = np.zeros(len(data), dtype=int)
    for i, cond in enumerate(unique_conditions):
        mask = np.all(op_settings == cond, axis=1)
        conditions[mask] = i

    return conditions
```

Step 2: Compute Per-Condition Statistics
```python
def compute_condition_stats(data, conditions, feature_cols):
    """
    Compute mean and std for each feature within each condition.

    Returns:
        means: Dict mapping condition -> array of per-feature means
        stds: Dict mapping condition -> array of per-feature stds
    """
    means = {}
    stds = {}

    for cond in np.unique(conditions):
        mask = conditions == cond
        cond_data = data[mask][:, feature_cols]

        means[cond] = np.mean(cond_data, axis=0)
        stds[cond] = np.std(cond_data, axis=0)

        # Prevent division by zero for constant features
        stds[cond] = np.maximum(stds[cond], 1e-8)

    return means, stds
```

Step 3: Apply Normalization
```python
def normalize_per_condition(data, conditions, means, stds, feature_cols):
    """
    Apply per-condition normalization.

    Each sample is normalized using statistics from its own condition.
    """
    normalized = data.copy()

    for cond in np.unique(conditions):
        mask = conditions == cond
        # np.ix_ builds the row/column index grid; plain
        # data[mask, feature_cols] would mis-broadcast the two index arrays
        idx = np.ix_(mask, feature_cols)
        normalized[idx] = (data[idx] - means[cond]) / stds[cond]

    return normalized
```

Handling FD001 and FD003
For single-condition datasets (FD001, FD003), per-condition normalization reduces to global normalization, since there is only one condition.
Our implementation handles both cases uniformly.
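The three steps can be exercised end to end on synthetic data. The sketch below inlines the same logic as `identify_conditions`, `compute_condition_stats`, and `normalize_per_condition` above; the two-condition setup, baselines, and noise level are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: columns 0-2 are operating settings, column 3 a sensor.
# Two conditions with different baselines.
settings = np.array([[0.0, 0.0, 100.0], [10000.0, 0.25, 100.0]])
cond_idx = np.tile([0, 1], 200)
sensor = np.where(cond_idx == 0, 519.0, 491.0) + rng.normal(0, 0.5, 400)
data = np.column_stack([settings[cond_idx], sensor])

# Step 1: identify conditions from rounded operating settings
op = np.round(data[:, :3], 2)
uniq = np.unique(op, axis=0)
conditions = np.array(
    [np.flatnonzero(np.all(uniq == row, axis=1))[0] for row in op]
)

# Steps 2-3: per-condition statistics and normalization
normalized = data.copy()
for c in np.unique(conditions):
    m = conditions == c
    mu = data[m, 3].mean()
    sd = max(data[m, 3].std(), 1e-8)
    normalized[m, 3] = (data[m, 3] - mu) / sd

# Each condition's sensor values are now zero-mean and unit-variance
for c in (0, 1):
    print(round(normalized[conditions == c, 3].mean(), 6))  # ~0.0
```

With a single condition in the data, the loop simply runs once, which is exactly the FD001/FD003 case.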
Validating the Approach
How do we verify that per-condition normalization works correctly?
Check 1: Zero Mean Within Conditions
After normalization, each feature should have a mean of approximately zero within each condition.
Check 2: Unit Variance Within Conditions
Similarly, each feature should have a standard deviation of approximately one within each condition.
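Checks 1 and 2 can be expressed as a small assertion helper. This is a sketch: the helper name and the synthetic two-condition smoke test are our own, not part of the dataset tooling:

```python
import numpy as np


def validate_normalization(normalized, conditions, feature_cols, atol=1e-6):
    """Check zero mean and unit variance per feature within each condition."""
    for cond in np.unique(conditions):
        block = normalized[conditions == cond][:, feature_cols]
        assert np.allclose(block.mean(axis=0), 0.0, atol=atol), \
            f"mean check failed for condition {cond}"
        assert np.allclose(block.std(axis=0), 1.0, atol=atol), \
            f"std check failed for condition {cond}"


# Minimal smoke test on synthetic two-condition data
rng = np.random.default_rng(3)
conds = np.repeat([0, 1], 100)
raw = np.where(conds == 0, 500.0, 450.0)[:, None] + rng.normal(0, 1, (200, 1))
norm = raw.copy()
for c in (0, 1):
    m = conds == c
    norm[m] = (raw[m] - raw[m].mean(axis=0)) / raw[m].std(axis=0)

validate_normalization(norm, conds, [0])
print("validation passed")
```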
Check 3: Condition-Independent Degradation
The key validation: degradation trends should look similar across conditions after normalization.
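Check 3 can be demonstrated on an idealized example: the same linear degradation drift placed on two different condition baselines. After per-condition normalization the two trends coincide (the baseline values and drift slope below are illustrative):

```python
import numpy as np

cycles = np.arange(200, dtype=float)

# Same physical degradation drift under two different condition baselines
drift = 0.01 * cycles
cond0 = 519.0 + drift
cond1 = 449.0 + drift


def per_cond_norm(x):
    # Normalize using this condition's own statistics
    return (x - x.mean()) / x.std()


n0, n1 = per_cond_norm(cond0), per_cond_norm(cond1)

# The baselines cancel out; only the shared degradation trend remains
print(np.allclose(n0, n1))            # True
print(np.corrcoef(n0, cycles)[0, 1])  # ~1.0: trend preserved
```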
Quantitative Validation
| Normalization | FD002 RMSE | FD004 RMSE | Improvement |
|---|---|---|---|
| None (raw) | 45+ | 55+ | Baseline |
| Global | 28-32 | 35-40 | Moderate |
| Per-condition | 18-22 | 20-25 | Best |
Published Results
Many papers reporting on FD002/FD004 use global normalization. Our per-condition approach is one reason AMNL achieves state-of-the-art results.
Summary
In this section, we developed the per-condition normalization strategy:
- Normalization importance: Equal feature scales, faster convergence, numerical stability
- Global normalization problem: Multi-modal distributions encode condition, not degradation
- Per-condition solution: Compute μ_c and σ_c separately for each operating condition c
- Normalization formula: x_norm = (x - μ_c) / σ_c, using each sample's own condition statistics
- Validation: Zero mean, unit variance within conditions; aligned degradation trends
| Aspect | Global | Per-Condition |
|---|---|---|
| Statistics | Single μ, σ | One μ, σ per condition |
| Condition effect | Preserved | Removed |
| Degradation signal | Obscured | Preserved |
| FD002/FD004 performance | Poor | Excellent |
Looking Ahead: Normalization is just one way to accidentally corrupt data. A more subtle danger is data leakage—when test information inadvertently influences training. In the next section, we will identify leakage sources and implement safeguards.
With per-condition normalization understood, we are ready to address the critical issue of data leakage prevention.