Learning Objectives
By the end of this section, you will:
- Define the three health states and their RUL boundaries
- Design the classification head architecture
- Understand softmax output as probability distribution
- Address class imbalance in health state labels
- Connect classification to cross-entropy loss
Why This Matters: Health classification provides discrete checkpoints along the degradation trajectory. By predicting whether an engine is Healthy, Degrading, or Critical, the model learns to recognize qualitative state transitions, knowledge that transfers to improved RUL prediction.
Health State Definition
We discretize the continuous RUL into three health states based on remaining life.
State Definitions
| Class | State | RUL Range | Interpretation |
|---|---|---|---|
| 0 | Healthy | RUL > 125 | Normal operation, no immediate concern |
| 1 | Degrading | 50 < RUL ≤ 125 | Degradation detected, plan maintenance |
| 2 | Critical | RUL ≤ 50 | Imminent failure, urgent maintenance |
Visual Representation
```
RUL Timeline:
  ──────────────────────────────────────────→ 0
  │           │             │            │
  │  Healthy  │  Degrading  │  Critical  │ FAILURE
  │ (Class 0) │  (Class 1)  │  (Class 2) │
  │           │             │            │
             125            50           0

Health State Transition:
  Healthy → Degrading → Critical → Failure
     │          │           │
   Long       Medium      Short
  horizon     horizon     horizon
```
Why These Boundaries?
- 125 cycles: Aligns with the piecewise linear RUL cap; beyond this, degradation is minimal
- 50 cycles: Approximately one standard deviation of typical failure prediction error, a reasonable "danger zone"
- Three classes: Sufficient granularity without over-complicating the auxiliary task
Label Generation
Health state labels are derived from RUL labels automatically. No additional annotation is needed:
```python
def get_health_label(rul: float) -> int:
    if rul > 125:
        return 0  # Healthy
    elif rul > 50:
        return 1  # Degrading
    else:
        return 2  # Critical
```
Classification Head Architecture
The health classification head transforms the shared representation into class logits.
Architecture Overview
```
Input: z ∈ ℝ²⁵⁶ (encoder output)
                │
┌─────────────────────────────────┐
│  Linear(256, 64)                │
│  256 × 64 + 64 = 16,448 params  │
└─────────────────────────────────┘
                │
┌─────────────────────────────────┐
│  ReLU                           │
│  0 parameters                   │
└─────────────────────────────────┘
                │
┌─────────────────────────────────┐
│  Dropout(p=0.3)                 │
│  0 parameters                   │
└─────────────────────────────────┘
                │
┌─────────────────────────────────┐
│  Linear(64, 3)                  │
│  64 × 3 + 3 = 195 params        │
└─────────────────────────────────┘
                │
Output: logits ∈ ℝ³ (class scores)
```
PyTorch Implementation
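A minimal sketch of the head as an `nn.Module`, matching the layer sizes above (the class name `HealthClassificationHead` is illustrative, not fixed by the source):

```python
import torch
import torch.nn as nn

class HealthClassificationHead(nn.Module):
    """Maps the 256-d shared representation to 3 health-class logits."""

    def __init__(self, in_dim: int = 256, hidden_dim: int = 64,
                 num_classes: int = 3, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),       # 256*64 + 64 = 16,448 params
            nn.ReLU(),
            nn.Dropout(p=dropout),
            nn.Linear(hidden_dim, num_classes),  # 64*3 + 3 = 195 params
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Raw logits; softmax is applied inside the loss function
        return self.net(z)

head = HealthClassificationHead()
logits = head(torch.randn(8, 256))                    # batch of 8 encoder outputs
n_params = sum(p.numel() for p in head.parameters())
print(logits.shape, n_params)                         # torch.Size([8, 3]) 16643
```

The parameter count (16,448 + 195 = 16,643) matches the summary table at the end of this section.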
Design Rationale
| Choice | Rationale |
|---|---|
| Hidden dim 64 | Simpler task needs smaller head than RUL |
| Output dim 3 | Three health classes |
| No output activation | Softmax in CrossEntropyLoss |
| Smaller than RUL head | Classification is auxiliary, needs less capacity |
Softmax and Probability Output
The head outputs raw logits; softmax converts these to probabilities.
Logits to Probabilities
$$p_i = \frac{e^{z_i}}{\sum_{j=1}^{3} e^{z_j}}$$

Where:
- $z_i$: Logit (raw score) for class $i$
- $p_i$: Probability of class $i$
- $\sum_{i=1}^{3} p_i = 1$: Probabilities sum to 1
Example Computation
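A worked example with illustrative logit values (the numbers are assumptions, not taken from the model):

```python
import math

logits = [2.0, 1.0, 0.5]                 # example raw scores for the 3 classes
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

print([round(p, 3) for p in probs])      # [0.629, 0.231, 0.14]
# probabilities sum to 1 (up to float rounding)
```

The largest logit (class 0, Healthy) receives the largest probability, and the exponential amplifies the gap: a logit lead of 1.0 translates into a probability ratio of e ≈ 2.72.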
Why Not Apply Softmax in the Head?
PyTorch's CrossEntropyLoss expects raw logits, not probabilities:
- Numerical stability: CrossEntropyLoss uses log-sum-exp trick internally
- Efficiency: Avoids computing softmax twice (once in head, once in loss)
- Convention: Standard practice in PyTorch classification
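The equivalence can be checked directly: `CrossEntropyLoss` on raw logits gives the same value as explicitly composing `log_softmax` and NLL (a quick sketch with random inputs):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)              # batch of 4, 3 classes (random example)
labels = torch.tensor([0, 2, 1, 2])

# CrossEntropyLoss consumes raw logits...
ce = F.cross_entropy(logits, labels)

# ...and equals log_softmax + NLL, computed stably in a single fused step
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(ce, manual))       # True
```

This is why the head's final layer has no activation: adding a softmax there would only be undone (and duplicated) inside the loss.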
Handling Class Imbalance
The three health classes are not equally represented in the data.
Class Distribution
In a typical C-MAPSS training set, the distribution is approximately:
| Class | State | Approx. Fraction | Challenge |
|---|---|---|---|
| 0 | Healthy | ~45% | Most common (early life) |
| 1 | Degrading | ~30% | Moderate representation |
| 2 | Critical | ~25% | Less common but most important |
Imbalance Mitigation Strategies
We use focal loss (Chapter 11) to address class imbalance:
$$\mathrm{FL}(p_t) = -\alpha_t \, (1 - p_t)^{\gamma} \log(p_t)$$

Where:
- $\alpha_t$: Class weight for class $t$
- $(1 - p_t)^{\gamma}$: Focusing factor (down-weights easy examples)
- $\gamma = 2$: Typical focusing parameter
Focal Loss Intuition
Standard cross-entropy treats all misclassifications equally. Focal loss reduces the loss contribution from easy, well-classified examples (high $p_t$), focusing training on hard examples that are misclassified or borderline.
Class Weights
Alternatively, class weights can balance the loss:
```python
import torch
import torch.nn.functional as F

# Inverse frequency weighting
class_counts = [4500, 3000, 2500]  # Example counts
total = sum(class_counts)
class_weights = torch.tensor([total / c for c in class_counts])
class_weights = class_weights / class_weights.sum() * 3  # Normalize to mean 1

# Use in loss
loss = F.cross_entropy(logits, labels, weight=class_weights)
```
Critical Class Importance
The Critical class (Class 2) is the most important for maintenance decisions; missing a critical state has severe consequences. Our loss design ensures the model does not ignore this minority class despite its lower frequency.
Summary
In this section, we designed the health classification head:
- Three health states: Healthy (RUL > 125), Degrading (50-125), Critical (β€50)
- Architecture: Two-layer MLP (256 β 64 β 3)
- Output: Raw logits (softmax in loss function)
- Class imbalance: Addressed via focal loss or class weights
- Parameters: ~17K
| Property | Value |
|---|---|
| Input dimension | 256 |
| Hidden dimension | 64 |
| Output dimension | 3 (classes) |
| Class 0 (Healthy) | RUL > 125 |
| Class 1 (Degrading) | 50 < RUL ≤ 125 |
| Class 2 (Critical) | RUL ≤ 50 |
| Total parameters | 16,643 |
Looking Ahead: We have designed both prediction heads. The next section assembles the complete modelβconnecting encoder, heads, and showing the full forward pass from raw sensors to dual predictions.
With both heads designed, we now assemble the complete AMNL model.