Chapter 4

Per-Condition Normalization Strategy

Data Preprocessing Pipeline

Learning Objectives

By the end of this section, you will:

  1. Understand why normalization is essential for neural network training
  2. Identify the failure mode of global normalization on multi-condition datasets
  3. Derive the per-condition normalization strategy that preserves degradation signals
  4. Implement condition-aware preprocessing for FD002 and FD004
  5. Validate that normalization preserves information while removing regime effects

Why This Matters: Normalization is often treated as a routine preprocessing step. But for C-MAPSS datasets with multiple operating conditions, the wrong normalization strategy destroys degradation signals. Per-condition normalization is essential for state-of-the-art performance on FD002 and FD004.

Why Normalization Matters

Neural networks are sensitive to the scale of input features. Without normalization, training can fail or converge slowly.

The Scale Problem

Consider the raw sensor ranges in C-MAPSS:

| Sensor | Typical Range | Magnitude |
|---|---|---|
| T30 (Temperature) | 1550 - 1620 °R | ~1600 |
| P30 (Pressure) | 545 - 560 psia | ~550 |
| Nc (Core speed) | 9000 - 9200 rpm | ~9000 |
| BPR (Bypass ratio) | 8.3 - 8.5 | ~8 |
| phi (Fuel ratio) | 518 - 525 | ~520 |

Features with large values (Nc ≈ 9000) would dominate gradients over features with small values (BPR ≈ 8).

What Normalization Achieves

Standard normalization transforms each feature to zero mean and unit variance:

$$x_{\text{norm}} = \frac{x - \mu}{\sigma}$$

This achieves:

  • Equal feature contribution: All features have comparable scale
  • Faster convergence: Gradient descent operates efficiently
  • Numerical stability: Avoids overflow/underflow in activations
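To make the scale problem concrete, here is a small self-contained sketch of z-score normalization. The values are synthetic, loosely modeled on the Nc and BPR ranges in the table above, not real C-MAPSS data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic features on very different scales, loosely modeled on
# Nc (~9000 rpm) and BPR (~8); illustrative values, not real C-MAPSS data
nc = rng.normal(9100.0, 50.0, size=1000)
bpr = rng.normal(8.4, 0.05, size=1000)
X = np.column_stack([nc, bpr])

# Standard (z-score) normalization: subtract the mean, divide by the std
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma

print(sigma)               # raw scales differ by ~3 orders of magnitude
print(X_norm.std(axis=0))  # both features now have std ~1.0
```

After the transform, both features contribute on the same scale, so neither dominates the gradients.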

Global Normalization Problem

The naive approach computes mean and standard deviation over all data:

$$\mu_{\text{global}} = \frac{1}{N} \sum_{i=1}^{N} x_i, \quad \sigma_{\text{global}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_{\text{global}})^2}$$

For FD001 and FD003 (single operating condition), this works fine. But for FD002 and FD004, it fails catastrophically.

The Multi-Modal Distribution Problem

Visualizing the Problem

Global normalization on multi-modal data produces a strange distribution:

```text
Before normalization:              After global normalization:

Condition 1: [518-520]            Condition 1: [+0.9 to +1.0]
Condition 2: [490-492]            Condition 2: [+0.0 to +0.1]
Condition 3: [470-472]     -->    Condition 3: [-0.6 to -0.5]
Condition 4: [480-482]            Condition 4: [-0.3 to -0.2]
Condition 5: [448-450]            Condition 5: [-1.4 to -1.3]
Condition 6: [440-442]            Condition 6: [-1.6 to -1.5]

Degradation signal (~2 °R change within a condition) is invisible!
```

Silent Failure

The dangerous aspect: the model will still train and produce predictions! But it learns to distinguish conditions, not degradation. Performance on FD002/FD004 will be significantly worse than possible.
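This failure is easy to reproduce on synthetic data. The sketch below (illustrative numbers, not real C-MAPSS readings) builds one sensor with two condition baselines roughly 70 units apart plus a ~2-unit degradation drift, then measures how little of the globally normalized range the drift occupies:

```python
import numpy as np

rng = np.random.default_rng(1)

# One synthetic sensor over 200 cycles: two operating-condition baselines
# ~70 units apart, plus a ~2-unit degradation drift (illustrative numbers)
cycles = np.arange(200)
degradation = 0.01 * cycles
baselines = np.array([520.0, 450.0])
conditions = rng.integers(0, 2, size=200)
sensor = baselines[conditions] + degradation + rng.normal(0.0, 0.1, 200)

# Global normalization: sigma is dominated by the gap between conditions
sigma_global = sensor.std()
global_norm = (sensor - sensor.mean()) / sigma_global

# The entire degradation trend occupies a sliver of the normalized range
trend_span = 2.0 / sigma_global
print(f"global std (driven by condition gap): {sigma_global:.1f}")
print(f"degradation span after global normalization: {trend_span:.3f}")
```

With the condition gap inflating the global standard deviation to roughly 35, the full degradation trend spans only a few hundredths of a normalized unit, effectively noise from the model's perspective.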


Per-Condition Normalization

The solution is simple: compute normalization statistics separately for each operating condition.

Mathematical Formulation

Let $c \in \{1, 2, \ldots, C\}$ denote operating conditions ($C = 6$ for FD002/FD004). For each condition and each feature:

$$\mu_j^{(c)} = \frac{1}{N_c} \sum_{i:\, \text{cond}_i = c} x_{i,j}, \quad \sigma_j^{(c)} = \sqrt{\frac{1}{N_c} \sum_{i:\, \text{cond}_i = c} (x_{i,j} - \mu_j^{(c)})^2}$$

Then normalize each sample using statistics from its condition:

$$x_{i,j}^{\text{norm}} = \frac{x_{i,j} - \mu_j^{(\text{cond}_i)}}{\sigma_j^{(\text{cond}_i)}}$$

What This Achieves

After per-condition normalization, every feature has approximately zero mean and unit variance within each condition. The condition-dependent offsets vanish, so the variation that remains reflects degradation (plus noise) rather than the operating regime.

Condition Identification

To apply per-condition normalization, we must identify which condition each sample belongs to. In C-MAPSS, conditions are determined by the three operating settings (columns 3-5):

| Condition | Altitude (ft) | Mach | TRA (%) |
|---|---|---|---|
| 1 | 0 | 0.00 | 100 |
| 2 | 10000 | 0.25 | 100 |
| 3 | 20000 | 0.70 | 100 |
| 4 | 25000 | 0.62 | 60 |
| 5 | 35000 | 0.84 | 100 |
| 6 | 42000 | 0.84 | 100 |

We cluster samples by their operating settings to assign condition labels.


Implementation Details

Step 1: Identify Operating Conditions

```python
import numpy as np

def identify_conditions(data, op_cols=(0, 1, 2)):
    """
    Cluster samples by operating condition.

    Args:
        data: Raw data array with operating settings
        op_cols: Column indices for operating settings

    Returns:
        Condition labels for each sample
    """
    # Round operating settings to handle numerical precision
    op_settings = np.round(data[:, list(op_cols)], decimals=2)

    # Find unique conditions
    unique_conditions = np.unique(op_settings, axis=0)

    # Assign condition labels
    conditions = np.zeros(len(data), dtype=int)
    for i, cond in enumerate(unique_conditions):
        mask = np.all(op_settings == cond, axis=1)
        conditions[mask] = i

    return conditions
```

Step 2: Compute Per-Condition Statistics

```python
def compute_condition_stats(data, conditions, feature_cols):
    """
    Compute mean and std for each feature within each condition.

    Returns:
        means: Dict mapping condition -> array of means
        stds: Dict mapping condition -> array of stds
    """
    means = {}
    stds = {}

    for cond in np.unique(conditions):
        mask = conditions == cond
        cond_data = data[mask][:, feature_cols]

        means[cond] = np.mean(cond_data, axis=0)
        stds[cond] = np.std(cond_data, axis=0)

        # Prevent division by zero for constant (flat) sensors
        stds[cond] = np.maximum(stds[cond], 1e-8)

    return means, stds
```

Step 3: Apply Normalization

```python
def normalize_per_condition(data, conditions, means, stds, feature_cols):
    """
    Apply per-condition normalization.

    Each sample is normalized using statistics from its condition.
    """
    normalized = data.copy()

    for cond in np.unique(conditions):
        # np.ix_ builds an open-mesh index so the row mask and the feature
        # columns select a 2-D block (mixing a boolean mask with a column
        # list directly would mis-broadcast)
        idx = np.ix_(conditions == cond, feature_cols)
        normalized[idx] = (data[idx] - means[cond]) / stds[cond]

    return normalized
```

Handling FD001 and FD003

For single-condition datasets (FD001, FD003), per-condition normalization reduces to global normalization—there is only one condition:

$$C = 1 \Rightarrow \mu^{(1)} = \mu_{\text{global}}, \quad \sigma^{(1)} = \sigma_{\text{global}}$$

Our implementation handles both cases uniformly.
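Tying the three steps together, here is a compact self-contained sketch of the full pipeline with the helpers inlined, run on a toy two-condition array. The column layout (settings in columns 0-2, sensors in columns 3-4) and all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy layout: columns 0-2 are operating settings, columns 3-4 are sensors
n = 400
true_cond = rng.integers(0, 2, size=n)
settings = np.array([[0.0, 0.0, 100.0], [25.0, 0.62, 60.0]])[true_cond]
sensors = np.array([[520.0, 9100.0], [450.0, 8900.0]])[true_cond]
sensors = sensors + rng.normal(0.0, 0.5, size=(n, 2))
data = np.hstack([settings, sensors])

op_cols, feature_cols = [0, 1, 2], [3, 4]

# Step 1: label conditions by unique (rounded) operating settings
op = np.round(data[:, op_cols], 2)
uniq = np.unique(op, axis=0)
conditions = np.array(
    [np.flatnonzero(np.all(uniq == row, axis=1))[0] for row in op]
)

# Steps 2-3: normalize each condition with its own statistics
normalized = data.copy()
for c in np.unique(conditions):
    idx = np.ix_(conditions == c, feature_cols)
    mu = data[idx].mean(axis=0)
    sigma = np.maximum(data[idx].std(axis=0), 1e-8)
    normalized[idx] = (data[idx] - mu) / sigma

# Within each condition, every sensor is now ~N(0, 1)
for c in np.unique(conditions):
    block = normalized[conditions == c][:, feature_cols]
    print(c, block.mean(axis=0).round(6), block.std(axis=0).round(6))
```

Note that the same code handles the single-condition case unchanged: with one unique settings row, the loop runs once and reproduces global normalization.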


Validating the Approach

How do we verify that per-condition normalization works correctly?

Check 1: Zero Mean Within Conditions

After normalization, each feature should have approximately zero mean within each condition:

$$\mathbb{E}[x_{j}^{\text{norm}} \mid \text{cond} = c] \approx 0 \quad \forall\, c, j$$

Check 2: Unit Variance Within Conditions

$$\text{Var}[x_{j}^{\text{norm}} \mid \text{cond} = c] \approx 1 \quad \forall\, c, j$$
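Checks 1 and 2 are easy to automate. Below is a minimal sketch, assuming `normalized`, `conditions`, and `feature_cols` as produced by the earlier steps; the helper name and the tolerance are our own choices:

```python
import numpy as np

def validate_normalization(normalized, conditions, feature_cols, atol=0.05):
    """Assert ~zero mean and ~unit variance per feature within each condition."""
    for c in np.unique(conditions):
        block = normalized[conditions == c][:, feature_cols]
        assert np.allclose(block.mean(axis=0), 0.0, atol=atol), f"mean off in condition {c}"
        assert np.allclose(block.std(axis=0), 1.0, atol=atol), f"std off in condition {c}"
    return True

# Quick self-check on toy data that has been normalized per condition
rng = np.random.default_rng(3)
conds = np.repeat([0, 1], 500)
X = rng.normal([5.0, -3.0], [2.0, 0.5], size=(1000, 2))
for c in (0, 1):
    m = conds == c
    X[m] = (X[m] - X[m].mean(axis=0)) / X[m].std(axis=0)

print(validate_normalization(X, conds, feature_cols=[0, 1]))  # True
```

Running this after preprocessing catches the silent-failure mode early: if someone swaps in global normalization by mistake, the within-condition assertions fail immediately.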

Check 3: Condition-Independent Degradation

The key validation: degradation trends should look similar across conditions after normalization.

Quantitative Validation

| Normalization | FD002 RMSE | FD004 RMSE | Improvement |
|---|---|---|---|
| None (raw) | 45+ | 55+ | Baseline |
| Global | 28-32 | 35-40 | Moderate |
| Per-condition | 18-22 | 20-25 | Best |

Published Results

Many papers reporting on FD002/FD004 use global normalization. Our per-condition approach is one reason AMNL achieves state-of-the-art results.


Summary

In this section, we developed the per-condition normalization strategy:

  1. Normalization importance: Equal feature scales, faster convergence, numerical stability
  2. Global normalization problem: Multi-modal distributions encode condition, not degradation
  3. Per-condition solution: Compute $\mu_j^{(c)}, \sigma_j^{(c)}$ for each condition
  4. Normalization formula: $x^{\text{norm}} = (x - \mu^{(c)}) / \sigma^{(c)}$
  5. Validation: Zero mean, unit variance within conditions; aligned degradation trends
| Aspect | Global | Per-Condition |
|---|---|---|
| Statistics | Single μ, σ | One μ, σ per condition |
| Condition effect | Preserved | Removed |
| Degradation signal | Obscured | Preserved |
| FD002/FD004 performance | Poor | Excellent |

Looking Ahead: Normalization is just one way to accidentally corrupt data. A more subtle danger is data leakage—when test information inadvertently influences training. In the next section, we will identify leakage sources and implement safeguards.

With per-condition normalization understood, we are ready to address the critical issue of data leakage prevention.