Chapter 5

Fault Modes (HPC, Fan, and Combinations)

NASA Datasets Deep Dive

Two Failure Modes, Plus Their Combination

A turbofan can wear out in many ways — bearing fatigue, blade erosion, combustor thermal damage, fuel-system contamination — but C-MAPSS simulates only the two dominant gas-path modes: HPC efficiency degradation and FAN efficiency degradation. FD001 and FD002 inject only HPC degradation; FD003 and FD004 mix HPC and FAN. The modelling implication is that FD003/FD004 engines come from a more diverse failure population, which is one of the reasons their RMSE is harder to drive down.

The matrix again. 1 vs 6 operating conditions x 1 vs 2 fault modes = 4 sub-datasets. FD004 is the worst-case combo of both axes - and where the gradient-aware methods win the largest margins.

What Each Failure Looks Like in Sensors

The physics is straightforward: degraded HPC efficiency means a higher exit temperature for the same compression ratio (the compressor wastes energy as heat). Degraded FAN efficiency means a lower bypass mass-flow ratio (the fan moves less air around the core). The downstream sensors mirror those changes.
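
The HPC half of that argument follows from the textbook definition of compressor isentropic efficiency. The sketch below is not C-MAPSS code: the inlet temperature, pressure ratio, and efficiency values are illustrative assumptions, and `compressor_exit_temperature` is a hypothetical helper. It only shows that, at a fixed pressure ratio, a lower efficiency implies a hotter exit.

```python
GAMMA = 1.4  # ratio of specific heats for air (ideal-gas assumption)


def compressor_exit_temperature(t_in, pressure_ratio, efficiency, gamma=GAMMA):
    """Exit total temperature from the isentropic-efficiency definition.

    efficiency = (T_out_ideal - T_in) / (T_out_actual - T_in)
    """
    ideal_rise = t_in * (pressure_ratio ** ((gamma - 1.0) / gamma) - 1.0)
    return t_in + ideal_rise / efficiency


# Illustrative numbers only (not taken from C-MAPSS).
T_IN = 300.0           # inlet total temperature, arbitrary absolute units
PRESSURE_RATIO = 10.0  # held fixed for both cases

t_healthy = compressor_exit_temperature(T_IN, PRESSURE_RATIO, efficiency=0.88)
t_degraded = compressor_exit_temperature(T_IN, PRESSURE_RATIO, efficiency=0.85)

print(f"healthy  exit temperature: {t_healthy:.1f}")
print(f"degraded exit temperature: {t_degraded:.1f}")
print(f"extra temperature rise   : {t_degraded - t_healthy:.1f}")
```

The same compression now costs more temperature rise, which is the direction of drift the downstream temperature sensors pick up.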

| Failure mode | Sensor signature (drift direction) | Strongest indicator |
| --- | --- | --- |
| HPC efficiency drop | T30 UP, T50 UP, P30 slightly UP, fuel flow UP | T50 (sensor_4): rises ~25-45 °R over engine life |
| FAN efficiency drop | BPR DOWN, W31/W32 DOWN, fuel flow UP | BPR (sensor_15): falls ~0.5 |
| HPC + FAN (FD003/4) | Both signatures superimposed | T50 + BPR jointly |

Notice that fuel flow (phi) rises in BOTH modes: a degraded engine has to burn more fuel to deliver the same thrust, regardless of which subsystem is degrading. A classifier that watches fuel flow alone therefore cannot tell the two modes apart; the model has to read MULTIPLE sensors (at minimum T50 and BPR) to disambiguate.
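
To make the multi-sensor point concrete, here is a minimal rule-based sketch that labels an engine from two life-long drifts. The function name `label_fault_mode`, both thresholds, and the three example engines are hypothetical; real cut-offs would have to be tuned on data. Fuel-flow drift is carried along only to show that it is positive in every case and so carries no discriminating information.

```python
def label_fault_mode(t50_drift, bpr_drift, t50_threshold=20.0, bpr_threshold=-0.2):
    """Label an engine from its T50 and BPR drifts (thresholds are assumptions)."""
    hpc = t50_drift >= t50_threshold   # T50 rose strongly: HPC signature
    fan = bpr_drift <= bpr_threshold   # BPR fell strongly: FAN signature
    if hpc and fan:
        return "HPC+FAN"
    if hpc:
        return "HPC"
    if fan:
        return "FAN"
    return "undetermined"


# Made-up drifts for three engines; fuel flow rises in all of them.
engines = {
    "A": {"t50": 30.0, "bpr": 0.0, "fuel": 1.0},
    "B": {"t50": 5.0, "bpr": -0.5, "fuel": 1.0},
    "C": {"t50": 30.0, "bpr": -0.5, "fuel": 1.0},
}

labels = {
    name: label_fault_mode(d["t50"], d["bpr"]) for name, d in engines.items()
}
print(labels)  # {'A': 'HPC', 'B': 'FAN', 'C': 'HPC+FAN'}
```

All three engines share the same fuel-flow drift, yet the two-sensor rule assigns them three different labels.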

FD Matrix Revisited: Conditions x Faults

| | 1 fault mode (HPC) | 2 fault modes (HPC + FAN) |
| --- | --- | --- |
| 1 condition (sea level) | FD001: 'easy' subset, baseline | FD003: same regime, two failure modes |
| 6 conditions (envelope) | FD002: multi-condition challenge | FD004: hardest, 6 conds x 2 faults |

The paper's biggest gradient-aware-training wins (Section 26) come on FD002 and FD004 specifically because the multi-condition regime amplifies the gradient imbalance and gives GABA / GRACE more room to help. On single-condition single-fault FD001 the dual-task framework still helps but by a smaller margin.
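
The conditions-by-faults matrix is small enough to encode directly, which can be handy when looping experiments over subsets. The sketch below is one possible encoding; the `Subset` dataclass and `find_subsets` helper are hypothetical names, and only the subset facts themselves come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    name: str
    n_conditions: int
    fault_modes: tuple


# The four C-MAPSS sub-datasets as described in this section.
SUBSETS = (
    Subset("FD001", 1, ("HPC",)),
    Subset("FD002", 6, ("HPC",)),
    Subset("FD003", 1, ("HPC", "FAN")),
    Subset("FD004", 6, ("HPC", "FAN")),
)


def find_subsets(n_conditions=None, n_fault_modes=None):
    """Return subset names matching the given axes (None means 'any')."""
    return [
        s.name
        for s in SUBSETS
        if (n_conditions is None or s.n_conditions == n_conditions)
        and (n_fault_modes is None or len(s.fault_modes) == n_fault_modes)
    ]


print(find_subsets(n_conditions=6))      # ['FD002', 'FD004']
print(find_subsets(n_fault_modes=2))     # ['FD003', 'FD004']
print(find_subsets(6, 2))                # ['FD004']
```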

Python: Distinguish Fault Modes by Signature

On FD003 the engine population is a mixture of HPC-failure and FAN-failure engines. We can separate them by looking at the drift of two diagnostic sensors over each engine's life.

Drift on T50 and W32 separates HPC-fail from FAN-fail engines
fault_signatures.py, line by line

  • import numpy as np
    Standard alias.
  • import pandas as pd
    DataFrame loader.
  • COLUMNS = ...
    26-column layout.
  • df = pd.read_csv(...)
    Load FD003, the 'single condition, two fault modes' subset.
  • df['RUL'] = ...
    Per-engine RUL.
  • def end_of_life_drift(df, sensor):
    Helper that returns the (end - start) drift for one sensor on every engine. Input: sensor is a column name, e.g. 'sensor_4' or 'sensor_21'. Returns: a list of (engine_id, drift_value) pairs.
  • drifts = []
    Per-engine accumulator.
  • for eid, sub in df.groupby('engine_id'):
    Iterate engines.
  • first = sub.iloc[0][sensor]
    Sensor value at cycle 1 of this engine.
  • last = sub.iloc[-1][sensor]
    Sensor value at the engine's last cycle (RUL = 0).
  • drifts.append((eid, last - first))
    Signed drift over the engine's life. Positive = sensor grew; negative = shrank.
  • return drifts
    Return the per-engine list.
  • hpc_drifts = end_of_life_drift(df, 'sensor_4')
    T50 (LPT outlet temperature) drift per engine. Rising T50 across an engine's life is a hallmark of HPC degradation.
  • fan_drifts = end_of_life_drift(df, 'sensor_21')
    W32 (LPT coolant bleed) drift per engine. Falling W32 is a hallmark of FAN degradation.
  • print("sensor_4 drift summary (T50 - HPC indicator):")
    Header for the HPC-indicator stats.
  • print(f" median: {np.median([d for _, d in hpc_drifts]):+.2f}")
    Median drift across all engines. Almost all engines drift UP because most engines on FD003 develop SOME HPC degradation, even those that primarily fail in the fan.
    Output: median: +24.40
  • print(f" range : [{min(...):+.2f}, {max(...):+.2f}]")
    Min and max of the per-engine drifts. Wide range because some engines develop heavy HPC failure.
    Output: range : [+12.10, +44.20]
  • print("sensor_21 drift summary (W32 - FAN indicator):")
    Header for FAN.
  • print(f" median: {np.median([d for _, d in fan_drifts]):+.2f}")
    Median W32 drift. Slightly negative because FAN failures pull W32 down on some engines.
    Output: median: -0.85
  • print(f" range : [{min(...):+.2f}, {max(...):+.2f}]")
    Range of W32 drifts. Negative end indicates the FAN-failure engines.
    Output: range : [-2.10, +0.20]

Interpretation: engines with the most negative W32 drift are the FAN-failure cohort. Combined with high T50, you get HPC+FAN combined failure.

```python
import numpy as np
import pandas as pd

# ----- Load FD003 (1 condition, 2 fault modes) -----
COLUMNS = (
    ["engine_id", "cycle"]
    + [f"op_set_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)
df = pd.read_csv("data/raw/train_FD003.txt", sep=r"\s+", header=None, names=COLUMNS)
df["RUL"] = df.groupby("engine_id")["cycle"].transform("max") - df["cycle"]


# ----- For each engine, compute the END-OF-LIFE drift on key sensors -----
# Sensor 4 (T50) tends to drift up under HPC failure
# Sensor 21 (W32) tends to drift down under FAN failure
def end_of_life_drift(df, sensor: str):
    """Return a list of (engine_id, last - first) pairs for one sensor."""
    drifts = []
    for eid, sub in df.groupby("engine_id"):
        first = sub.iloc[0][sensor]
        last  = sub.iloc[-1][sensor]
        drifts.append((eid, last - first))
    return drifts


hpc_drifts = end_of_life_drift(df, "sensor_4")    # T50
fan_drifts = end_of_life_drift(df, "sensor_21")   # W32

# Engine population is mixed - some HPC-fail, some FAN-fail
print("sensor_4  drift summary (T50 - HPC indicator):")
print(f"  median: {np.median([d for _, d in hpc_drifts]):+.2f}")
print(f"  range : [{min(d for _, d in hpc_drifts):+.2f}, {max(d for _, d in hpc_drifts):+.2f}]")
print()
print("sensor_21 drift summary (W32 - FAN indicator):")
print(f"  median: {np.median([d for _, d in fan_drifts]):+.2f}")
print(f"  range : [{min(d for _, d in fan_drifts):+.2f}, {max(d for _, d in fan_drifts):+.2f}]")

# Engines that drift UP heavily on T50 are likely HPC failures.
# Engines that drift DOWN heavily on W32 are likely FAN failures.

# sensor_4  drift summary (T50 - HPC indicator):
#   median: +24.40
#   range : [+12.10, +44.20]
# sensor_21 drift summary (W32 - FAN indicator):
#   median: -0.85
#   range : [-2.10, +0.20]
```

FD003 contains a mix: some engines fail predominantly in HPC, some predominantly in FAN. The model in Chapter 11 does NOT explicitly classify fault mode (the auxiliary head only sees a 3-class HEALTH STATE). Whether to add a third, fault-mode head is an empirical question explored in Section 27.
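
A single first or last reading can be noisy, so one possible refinement of the drift idea is to compare the mean of the first k cycles with the mean of the last k cycles and then cut the population at a threshold. The sketch below runs on synthetic data so it does not need the FD003 file; `windowed_drift`, the window size, the toy slopes, and `FAN_THRESHOLD` are all assumptions for illustration, not values from the paper.

```python
import numpy as np
import pandas as pd


def windowed_drift(df, sensor, k=10):
    """Mean of the last k cycles minus mean of the first k cycles, per engine."""
    ordered = df.sort_values(["engine_id", "cycle"])
    grouped = ordered.groupby("engine_id")[sensor]
    first = grouped.apply(lambda s: s.head(k).mean())
    last = grouped.apply(lambda s: s.tail(k).mean())
    return last - first  # Series indexed by engine_id


# Synthetic stand-in for sensor_21: engines 1 and 3 trend down, engine 2 is flat.
rng = np.random.default_rng(0)
frames = []
for engine_id, slope in [(1, -0.010), (2, 0.0), (3, -0.012)]:
    cycles = np.arange(1, 201)
    values = 23.3 + slope * cycles + rng.normal(0.0, 0.05, size=cycles.size)
    frames.append(
        pd.DataFrame({"engine_id": engine_id, "cycle": cycles, "sensor_21": values})
    )
toy = pd.concat(frames, ignore_index=True)

drift = windowed_drift(toy, "sensor_21", k=10)

FAN_THRESHOLD = -1.0  # hypothetical cut-off; would need tuning on real data
fan_cohort = drift[drift <= FAN_THRESHOLD].index.tolist()

print(drift.round(2))
print("FAN-like engines:", fan_cohort)  # FAN-like engines: [1, 3]
```

On real FD003 data the same function could be called with the DataFrame loaded in the listing above; whether a clean threshold exists there is an empirical matter.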

PyTorch: Fault Mode as an Auxiliary Label?

Since you have already seen the dual-task model from Section 4, adding a third head is one extra nn.Linear. The question is whether the extra signal helps or hurts.

A speculative third head for fault-mode classification
triple_task_mlp.py, line by line

  • import torch
    Top-level PyTorch.
  • import torch.nn as nn
    Layers.
  • import torch.nn.functional as F
    (Loss functions when needed.)
  • class TripleTaskMLP(nn.Module):
    Speculative extension: shared backbone with THREE heads instead of two. The third head predicts the fault mode. Could either help (more auxiliary signal) or hurt (more weight imbalance). The paper's ablation explores this in Section 27.
    Design: regression + health-state classification + fault-mode classification.
  • def __init__(self, in_dim=14, hidden=32):
    Constructor.
  • super().__init__()
    Initialise nn.Module.
  • self.shared = ...
    Shared backbone identical to §4.1.
  • self.head_rul = nn.Linear(hidden, 1)
    Regression head.
  • self.head_health = nn.Linear(hidden, 3)
    3-class health head.
  • self.head_fault = nn.Linear(hidden, 3)
    NEW 3-class fault-mode head. Would receive its own loss term in the training loop.
    Extra parameters: 32 * 3 + 3 = 99, a tiny addition.
  • def forward(self, x):
    Forward returns a 3-tuple instead of 2.
  • h = self.shared(x)
    Shared trunk forward.
  • return (..., self.head_health(h), self.head_fault(h))
    Three task outputs from a single forward pass.
  • m = TripleTaskMLP()
    Instantiate.
  • x = torch.randn(2, 14)
    Fake batch with the 14-channel filtered input.
  • rul, health, fault = m(x)
    Three outputs returned from one forward.
  • print("rul.shape :", tuple(rul.shape))
    Verify regression shape.
    Output: rul.shape : (2,)
  • print("health.shape :", tuple(health.shape))
    3-class logits.
    Output: health.shape : (2, 3)
  • print("fault.shape :", tuple(fault.shape))
    3-class fault-mode logits. Same shape as health head.
    Output: fault.shape : (2, 3)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# ----- Optional: a 3-class fault-mode head -----
# Class 0: HPC-only failure       (FD001, some FD003 engines)
# Class 1: FAN-only failure       (some FD003 engines)
# Class 2: HPC + FAN failure      (most FD004 engines)
#
# Adding this head would give the model an extra auxiliary signal alongside
# the health state classification we already use. Whether to add it is an
# empirical question (Section 27 ablation).

class TripleTaskMLP(nn.Module):
    """Hypothetical extension: regression + 3-class health + 3-class fault mode."""

    def __init__(self, in_dim=14, hidden=32):
        super().__init__()
        self.shared       = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_rul     = nn.Linear(hidden, 1)
        self.head_health  = nn.Linear(hidden, 3)
        self.head_fault   = nn.Linear(hidden, 3)    # NEW: fault-mode head

    def forward(self, x):
        h = self.shared(x)
        return (
            self.head_rul(h).squeeze(-1),   # (batch,) regression output
            self.head_health(h),            # (batch, 3) health-state logits
            self.head_fault(h),             # (batch, 3) fault-mode logits
        )


m = TripleTaskMLP()
x = torch.randn(2, 14)
rul, health, fault = m(x)
print("rul.shape    :", tuple(rul.shape))     # (2,)
print("health.shape :", tuple(health.shape))  # (2, 3)
print("fault.shape  :", tuple(fault.shape))   # (2, 3)
```

Empirical answer (preview). The paper found that adding a fault-mode head does not significantly improve RMSE or the NASA score on FD003/FD004 once the dual-task model is properly trained with GABA. Section 27's ablation argues that the health-state head already carries most of the fault-mode information implicitly.
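
For readers who still want to run the experiment, the third head needs its own loss term. The sketch below shows the simplest possible combination, a static weighted sum. It is not the paper's training loop and does not implement GABA or GRACE: `TinyTripleHead` is a stand-in model, and `triple_task_loss`, the weights, and the random targets are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyTripleHead(nn.Module):
    """Stand-in three-head model: shared trunk plus three linear heads."""

    def __init__(self, in_dim=14, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_rul = nn.Linear(hidden, 1)
        self.head_health = nn.Linear(hidden, 3)
        self.head_fault = nn.Linear(hidden, 3)

    def forward(self, x):
        h = self.shared(x)
        return self.head_rul(h).squeeze(-1), self.head_health(h), self.head_fault(h)


def triple_task_loss(outputs, targets, weights=(1.0, 0.5, 0.5)):
    """Static weighted sum of one regression and two classification losses."""
    rul_pred, health_logits, fault_logits = outputs
    rul_true, health_true, fault_true = targets
    loss_rul = F.mse_loss(rul_pred, rul_true)
    loss_health = F.cross_entropy(health_logits, health_true)
    loss_fault = F.cross_entropy(fault_logits, fault_true)
    w_rul, w_health, w_fault = weights  # hypothetical static weights
    total = w_rul * loss_rul + w_health * loss_health + w_fault * loss_fault
    return total, (loss_rul, loss_health, loss_fault)


torch.manual_seed(0)
model = TinyTripleHead()
x = torch.randn(8, 14)
targets = (
    torch.rand(8) * 100.0,          # fake RUL targets, arbitrary scale
    torch.randint(0, 3, (8,)),      # fake health-state labels
    torch.randint(0, 3, (8,)),      # fake fault-mode labels
)

total, parts = triple_task_loss(model(x), targets)
total.backward()  # gradients from all three tasks flow into the shared trunk

fault_params = sum(p.numel() for p in model.head_fault.parameters())
print("total loss:", float(total))
print("fault-head parameters:", fault_params)  # 32 * 3 + 3 = 99
```

Because the regression loss is on a much larger scale than the two cross-entropy terms, a static sum like this one is dominated by the RUL task, which is the kind of imbalance that motivates gradient-aware weighting.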

Failure-Mode Modelling Beyond Turbofans

| Equipment | Common failure modes | Discriminating signal |
| --- | --- | --- |
| Turbofan (this book) | HPC efficiency, FAN efficiency | T50 vs BPR drift |
| Lithium-ion battery | SEI growth, lithium plating, electrolyte loss | Charge curve shape, EIS |
| Rolling-element bearing | Inner-race, outer-race, ball-spin defects | Vibration spectrum harmonics |
| Wind-turbine gearbox | Bearing fatigue, gear-tooth wear | Torque ripple, oil debris |
| Power transformer | Insulation aging, partial discharge | Dissolved-gas-in-oil ratios |
| Hard-disk drive | Head crash, motor failure, electronics failure | SMART attribute combinations |

Three Fault-Mode Pitfalls

Pitfall 1: Assuming a single mode. A model trained on FD001 (HPC-only) does not transfer cleanly to FD004 (HPC + FAN). Always train on the dataset that matches your deployment's failure population.
Pitfall 2: Single-sensor diagnostics. Both fault modes raise fuel flow. A single-sensor classifier on phi cannot tell HPC from FAN failure - you need a feature combination.
Pitfall 3: Adding tasks for free. A fault-mode head adds another loss term to balance. With GABA the cost is manageable, but careless static weighting plus a third loss can push the model off the Pareto frontier entirely.
The point. Two underlying physical failure modes; four sub-datasets that combine them with operating regimes. The framework in this book treats the failure population as a single regression-plus-health-classification problem - the dual-task approach implicitly absorbs the fault-mode information.

Takeaway

  • C-MAPSS simulates two fault modes. HPC efficiency drop and FAN efficiency drop. FD003/FD004 mix them.
  • Sensor signatures differ. HPC failures push T50 and T30 UP; FAN failures pull BPR and bleed flows DOWN.
  • The dual-task health head implicitly captures fault mode. Adding a third head is possible but the paper finds it does not help once GABA is doing its job.
  • FD004 is the hardest subset. 6 conditions x 2 fault modes = the regime where gradient-aware methods win biggest.