Learning Objectives
By the end of this section, you will:
- Understand the motivation for discretizing RUL into health states
- Define the 5 health state categories and their RUL boundaries
- Apply the discretization formula to convert RUL to class labels
- Analyze class distribution and address imbalance concerns
- Connect discretization to multi-task learning as described in Chapter 2
Why This Matters: Our AMNL model performs both RUL regression and health state classification. The classification task provides categorical structure that regularizes the regression, leading to better overall performance. This section defines how we create the classification labels.
Why Discretize RUL?
We already have continuous RUL labels. Why create a parallel discrete representation?
Motivation 1: Practical Decision Making
In real maintenance operations, decisions are categorical, not continuous:
- Healthy: Continue normal operation
- Degrading: Schedule inspection at next opportunity
- Warning: Plan maintenance within days
- Critical: Ground aircraft, inspect immediately
A health state classification directly supports these decisions without requiring engineers to interpret continuous RUL values.
Motivation 2: Regularization
Classification provides categorical structure that constrains the learned representation:
- The model learns features that distinguish broad degradation stages
- Classification loss provides gradients even when regression is noisy
- Shared features benefit from both signal types
Motivation 3: Robustness
Exact RUL prediction is inherently uncertain. Health states provide a more robust target:
| Scenario | RUL Prediction | Health State |
|---|---|---|
| True RUL = 50, Pred = 40 | Error = 10 cycles | Same state (correct) |
| True RUL = 50, Pred = 45 | Error = 5 cycles | Same state (correct) |
| True RUL = 50, Pred = 20 | Error = 30 cycles | Wrong state (error) |
Small RUL errors within a state are acceptable; large errors that cross state boundaries are not.
Defining Health States
We define 5 health states based on RUL ranges. This follows the approach used in several prior works on C-MAPSS.
The 5 Health States
| State | Label | RUL Range | Description | Action |
|---|---|---|---|---|
| 0 | Healthy | RUL > 100 | Normal operation, no degradation | Continue operation |
| 1 | Minor Degradation | 75 < RUL ≤ 100 | Early signs of wear | Monitor closely |
| 2 | Moderate Degradation | 50 < RUL ≤ 75 | Clear degradation trends | Schedule inspection |
| 3 | Significant Degradation | 25 < RUL ≤ 50 | Advanced degradation | Plan maintenance |
| 4 | Critical | RUL ≤ 25 | Failure imminent | Immediate action |
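The table above can be read as a threshold lookup. The sketch below is one minimal way to encode it with the standard library only. The names `BOUNDARIES`, `STATE_INFO`, and `lookup_health_state` are illustrative and not part of the AMNL code.

```python
from bisect import bisect_left

# Inclusive upper bounds of States 4, 3, 2 and 1, taken from the table above.
BOUNDARIES = [25, 50, 75, 100]

# (label, action) per state, copied from the table; the dict name is hypothetical.
STATE_INFO = {
    0: ("Healthy", "Continue operation"),
    1: ("Minor Degradation", "Monitor closely"),
    2: ("Moderate Degradation", "Schedule inspection"),
    3: ("Significant Degradation", "Plan maintenance"),
    4: ("Critical", "Immediate action"),
}

def lookup_health_state(rul):
    """Return the health state (0-4) for a single RUL value."""
    # bisect_left counts the boundaries strictly below rul, so a value that
    # sits exactly on a boundary (e.g. 75) falls into the more degraded state.
    boundaries_below = bisect_left(BOUNDARIES, rul)
    return len(BOUNDARIES) - boundaries_below

for rul in (120, 100, 75, 26, 25, 0):
    state = lookup_health_state(rul)
    label, action = STATE_INFO[state]
    print(f"RUL={rul:>3} -> state {state} ({label}): {action}")
```

The boundary cases (100, 75, 25) are the ones worth checking by hand, since the table uses a strict inequality on the lower end of each range and an inclusive one on the upper end.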
Boundary Visualization
```
RUL
   ^
   |
125| ───────────────────── State 0: Healthy
   |
100| · · · · · · · · · · · Boundary
   |   State 1: Minor Degradation
 75| · · · · · · · · · · · Boundary
   |   State 2: Moderate Degradation
 50| · · · · · · · · · · · Boundary
   |   State 3: Significant Degradation
 25| · · · · · · · · · · · Boundary
   |   State 4: Critical
  0| ───────────────────── Failure
   +------------------------> Cycle
```
Why These Boundaries?
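The boundaries (100, 75, 50, 25) create equal-width intervals of 25 cycles each. State 0 also spans 25 cycles of piecewise RUL (100 to 125), but it absorbs every cycle whose raw RUL exceeds the 125-cycle cap. Three considerations motivate this choice: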
- Equal intervals: Balanced class sizes within the degrading range
- Meaningful thresholds: 25-cycle intervals align with typical maintenance planning windows
- Prior work: These boundaries are established in C-MAPSS literature, enabling comparison
Discretization Formula
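Given a (piecewise linear) RUL value, we compute the health state $h$ from the boundaries in the table above:

$$
h(\text{RUL}) =
\begin{cases}
0 & \text{if } \text{RUL} > 100 \\
1 & \text{if } 75 < \text{RUL} \le 100 \\
2 & \text{if } 50 < \text{RUL} \le 75 \\
3 & \text{if } 25 < \text{RUL} \le 50 \\
4 & \text{if } \text{RUL} \le 25
\end{cases}
$$

This can be computed efficiently using floor division:

$$
h(\text{RUL}) = \min\!\left( \left\lfloor \frac{\text{RUL}_{\max} - \text{RUL}}{\Delta} \right\rfloor,\; K - 1 \right), \qquad \Delta = \frac{\text{RUL}_{\max}}{K}
$$

with $\text{RUL}_{\max} = 125$ and $K = 5$ states, so $\Delta = 25$.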
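As a quick check of the floor-division rule, here is a worked example. It assumes the 125-cycle cap and the 25-cycle interval used throughout this section, and writes the health state as $h$.

```latex
\begin{align*}
\text{RUL} = 85: \quad & h = \min\!\left(\left\lfloor \tfrac{125 - 85}{25} \right\rfloor,\ 4\right) = \min(\lfloor 1.6 \rfloor,\ 4) = 1 \\
\text{RUL} = 75: \quad & h = \min\!\left(\left\lfloor \tfrac{125 - 75}{25} \right\rfloor,\ 4\right) = \min(\lfloor 2.0 \rfloor,\ 4) = 2 \\
\text{RUL} = 0: \quad & h = \min\!\left(\left\lfloor \tfrac{125 - 0}{25} \right\rfloor,\ 4\right) = \min(\lfloor 5.0 \rfloor,\ 4) = 4
\end{align*}
```

RUL = 75 sits exactly on a boundary and lands in State 2, matching the range 50 < RUL ≤ 75 in the table. RUL = 0 shows why the outer minimum is needed: without it the result would be 5, which is not a valid label.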
Implementation
```python
import numpy as np

def rul_to_health_state(rul, num_states=5, rul_max=125):
    """
    Convert RUL to health state label.

    Args:
        rul: Remaining Useful Life (scalar or array)
        num_states: Number of discrete states (default 5)
        rul_max: Maximum RUL value (default 125)

    Returns:
        Integer health state label(s) in range [0, num_states - 1]
    """
    interval = rul_max // num_states  # 25 for 5 states
    state = (rul_max - np.asarray(rul)) // interval
    # np.clip works element-wise (the built-in min() fails on arrays) and
    # also maps any raw RUL above rul_max to State 0.
    return np.clip(state, 0, num_states - 1).astype(int)

# Example usage:
# rul_to_health_state(85)           -> 1
# rul_to_health_state(30)           -> 3
# rul_to_health_state(5)            -> 4
# rul_to_health_state([85, 30, 5])  -> array([1, 3, 4])
```
Class Distribution Analysis
Understanding the class distribution helps anticipate training challenges.
Theoretical Distribution
For an engine with a lifetime of T cycles:
| State | RUL Range | Cycles in State |
|---|---|---|
| 0 | > 100 | T - 125 + 25 = T - 100 |
| 1 | 75-100 | 25 |
| 2 | 50-75 | 25 |
| 3 | 25-50 | 25 |
| 4 | 0-25 | 25 |
States 1-4 each contain exactly 25 cycles, but State 0 contains all remaining cycles. This creates inherent class imbalance.
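The per-engine counts above can be checked by enumerating the cycles of an engine. The sketch below does that for a few hypothetical lifetimes (not taken from FD001). It assumes the RUL at 1-indexed cycle t is `lifetime - t`, so the last recorded cycle has RUL 0; the helper name `state_counts_for_engine` is illustrative.

```python
from collections import Counter

RUL_MAX = 125   # piecewise RUL cap used in this section
INTERVAL = 25   # width of each health state

def state_counts_for_engine(lifetime):
    """Count cycles per health state for one run-to-failure engine.

    Assumes RUL at cycle t (1-indexed) is lifetime - t, so the final
    cycle has RUL 0. This convention is an assumption of the sketch.
    """
    counts = Counter()
    for t in range(1, lifetime + 1):
        rul = min(lifetime - t, RUL_MAX)             # piecewise (capped) RUL
        state = min((RUL_MAX - rul) // INTERVAL, 4)  # floor-division rule
        counts[state] += 1
    return counts

# Hypothetical lifetimes, NOT taken from FD001.
lifetimes = [150, 200, 250, 300]

total = Counter()
for lifetime in lifetimes:
    total.update(state_counts_for_engine(lifetime))

n_cycles = sum(total.values())
for state in range(5):
    share = 100 * total[state] / n_cycles
    print(f"State {state}: {total[state]:4d} cycles ({share:.1f}%)")
```

Under this RUL convention State 4 holds 26 cycles (RUL 0 through 25) and State 0 holds T - 101, so the counts differ from the table by one cycle at each end. The imbalance itself is unchanged: States 1 to 3 stay fixed at 25 cycles per engine while State 0 grows with the lifetime.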
FD001 Class Distribution
Addressing Class Imbalance
Several strategies can address the imbalance:
| Strategy | Mechanism | Trade-off |
|---|---|---|
| Class weights | Weight loss by inverse frequency | May overfit to rare classes |
| Oversampling | Duplicate minority samples | Increases training time |
| Undersampling | Remove majority samples | Loses information |
| Focal loss | Down-weight easy examples | Complex tuning |
| Accept imbalance | Let model learn natural distribution | May underperform on rare classes |
In our AMNL model, we rely on the AMNL loss normalization, which automatically balances the contribution of the classification task regardless of class imbalance.
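For comparison, the "Class weights" row of the table can be sketched as follows. This is not the AMNL normalization; it is a generic inverse-frequency weighting, and the function name `inverse_frequency_weights` and the toy labels are made up for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes=5):
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are scaled as n_samples / (num_classes * count), so a perfectly
    balanced dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            weights.append(0.0)  # class absent from the data: weight is unused
        else:
            weights.append(n_samples / (num_classes * counts[c]))
    return weights

# Toy label set (made up): State 0 is four times as common as each other state.
labels = [0] * 400 + [1] * 100 + [2] * 100 + [3] * 100 + [4] * 100
weights = inverse_frequency_weights(labels)
print(weights)
```

Weights of this kind are typically passed to a weighted cross-entropy, for example through the `weight` argument of `torch.nn.CrossEntropyLoss`. The majority class receives a weight below 1 and the minority classes a weight above 1, which is where the overfitting risk noted in the table comes from.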
Connection to Multi-Task Learning
The health state labels enable the multi-task learning framework we introduced in Chapter 2.
Two-Head Architecture
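Our model produces two outputs from the shared representation:

$$
\hat{y}_{\text{RUL}} \in \mathbb{R}, \qquad \hat{p} \in \Delta^{5}
$$

where $\hat{y}_{\text{RUL}}$ is the regression output, $\hat{p}$ is the vector of class probabilities, and $\Delta^{5}$ is the 5-dimensional probability simplex (softmax outputs).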
Loss Components
| Task | Target | Loss | Purpose |
|---|---|---|---|
| RUL Regression | Piecewise RUL | MSE | Precise cycle prediction |
| Health Classification | Health state (0-4) | Cross-Entropy | Categorical structure |
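To make the two loss terms concrete, here is a minimal pure-Python sketch of MSE plus cross-entropy on a toy batch. The plain weighted sum and the `cls_weight` parameter are placeholders chosen for illustration; they are not the AMNL normalization, and all numbers are made up.

```python
import math

def mse_loss(rul_pred, rul_true):
    """Mean squared error over a batch of RUL predictions."""
    return sum((p - t) ** 2 for p, t in zip(rul_pred, rul_true)) / len(rul_true)

def cross_entropy_loss(probs, states):
    """Mean negative log-likelihood of the true health states.

    probs: list of 5-element probability vectors (softmax outputs)
    states: list of integer labels in 0..4
    """
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p[s] + eps) for p, s in zip(probs, states)) / len(states)

def combined_loss(rul_pred, rul_true, probs, states, cls_weight=1.0):
    """Plain weighted sum of both task losses (placeholder weighting)."""
    return mse_loss(rul_pred, rul_true) + cls_weight * cross_entropy_loss(probs, states)

# Toy batch of two samples (values made up for illustration).
rul_true = [85.0, 30.0]
rul_pred = [80.0, 36.0]
states = [1, 3]
probs = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.60, 0.20],
]

print("MSE:", mse_loss(rul_pred, rul_true))
print("CE: ", cross_entropy_loss(probs, states))
print("Sum:", combined_loss(rul_pred, rul_true, probs, states))
```

On this toy batch the MSE term is much larger than the cross-entropy term, because RUL is measured in cycles while cross-entropy is measured in nats. A naive sum would therefore be dominated by regression, which is the kind of scale mismatch a loss-normalization scheme is meant to handle.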
Synergy Between Tasks
The tasks reinforce each other:
- Classification → Regression: Categorical boundaries discourage the regression head from making errors that cross state boundaries
- Regression → Classification: Fine-grained RUL signal helps classification near boundaries
- Shared features: Both tasks train the shared backbone, leading to richer representations
The Key Insight: A model that predicts RUL = 48 when the true RUL is 52 makes a small regression error (4 cycles) but crosses the State 2/3 boundary, which counts as a classification error. Multi-task learning penalizes both, encouraging predictions that respect categorical structure while maintaining regression precision.
Label Consistency
Always derive health states from the piecewise RUL, not raw RUL. This ensures State 0 corresponds to RUL = 100-125, not RUL = 100-∞. Consistency between regression and classification targets is essential.
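One way to keep the two targets consistent is to build them together, capping first and discretizing second. The sketch below assumes raw RUL at 1-indexed cycle t is `lifetime - t`; the helper `make_targets` is hypothetical and not part of the AMNL code.

```python
RUL_MAX = 125
NUM_STATES = 5

def make_targets(lifetime):
    """Build (piecewise RUL, health state) targets for one engine.

    Assumes raw RUL at 1-indexed cycle t is lifetime - t (an assumption of
    this sketch; adapt it to the RUL convention your pipeline uses).
    """
    interval = RUL_MAX // NUM_STATES
    targets = []
    for t in range(1, lifetime + 1):
        raw_rul = lifetime - t
        piecewise_rul = min(raw_rul, RUL_MAX)  # cap first ...
        # ... then discretize the capped value, never the raw one
        state = min((RUL_MAX - piecewise_rul) // interval, NUM_STATES - 1)
        targets.append((piecewise_rul, state))
    return targets

targets = make_targets(lifetime=200)
print(targets[0])   # first cycle: raw RUL 199 is capped to 125
print(targets[-1])  # last cycle: RUL 0
```

Discretizing the raw value instead would break the floor-division rule: for raw RUL 199, `(125 - 199) // 25` evaluates to -3 in Python, which is not a valid class label.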
Summary
In this section, we defined the health state discretization scheme:
- Motivation: Practical decision making, regularization, robustness
- 5 health states: Healthy (0), Minor (1), Moderate (2), Significant (3), Critical (4)
- Boundaries: 100, 75, 50, 25 cycles create equal intervals
- Formula: $\text{state} = \min\left(\lfloor (125 - \text{RUL}) / 25 \rfloor,\ 4\right)$
- Class imbalance: State 0 dominates (~50%), addressed through AMNL loss
- Multi-task connection: Classification provides categorical structure for regression
| State | RUL Range | Meaning | Typical Action |
|---|---|---|---|
| 0 | > 100 | Healthy | Normal operation |
| 1 | 75-100 | Minor degradation | Monitor |
| 2 | 50-75 | Moderate degradation | Schedule inspection |
| 3 | 25-50 | Significant degradation | Plan maintenance |
| 4 | ≤ 25 | Critical | Immediate action |
Chapter Summary: We have now explored the NASA C-MAPSS dataset in depth: its structure, operating conditions, fault modes, sensor selections, and target formulations. With this understanding, we are ready to build the data preprocessing pipeline in Chapter 4, which covers normalization, windowing, and creating PyTorch datasets that transform the raw files into model-ready tensors.