Learning Objectives
By the end of this section, you will:
- Understand the health classification task as an auxiliary objective
- Derive the cross-entropy loss for 3-class classification
- Explain the regularization role of health classification
- Handle class imbalance appropriately
- Implement the health loss in PyTorch
Why This Matters: The health classification task is not merely an additional output—it is the regularizer that enables AMNL to achieve state-of-the-art RUL prediction. By forcing the encoder to learn discrete degradation stages, it prevents overfitting to noisy RUL labels.
The Health Classification Task
Health classification discretizes the degradation trajectory into three states.
Class Definitions
| Class | Name | RUL Range | Interpretation |
|---|---|---|---|
| 0 | Healthy | RUL > 125 | Normal operation |
| 1 | Degrading | 50 < RUL ≤ 125 | Degradation detected |
| 2 | Critical | RUL ≤ 50 | Imminent failure |
Label Generation
Health labels are derived automatically from RUL:
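The mapping implied by the class table can be sketched as follows (the function name `health_labels` is illustrative, not from the original implementation):

```python
import torch

def health_labels(rul: torch.Tensor) -> torch.Tensor:
    """Map RUL values to discrete health classes.

    0 = Healthy   (RUL > 125)
    1 = Degrading (50 < RUL <= 125)
    2 = Critical  (RUL <= 50)
    """
    labels = torch.zeros_like(rul, dtype=torch.long)  # default: Healthy
    labels[rul <= 125] = 1                            # Degrading
    labels[rul <= 50] = 2                             # overrides to Critical
    return labels

rul = torch.tensor([200.0, 125.0, 100.0, 50.0, 10.0])
print(health_labels(rul).tolist())  # [0, 1, 1, 2, 2]
```

Because the later assignment overwrites the earlier one, boundary values fall into the more degraded class, consistent with the ≤ conditions in the table.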
Why This Discretization?
- Boundary at 125: Aligns with piecewise linear RUL cap
- Boundary at 50: Defines "critical" zone requiring action
- Three classes: Sufficient granularity without over-complication
Cross-Entropy Loss
We use standard cross-entropy for health classification.
Mathematical Formulation

$$
\mathcal{L}_{\text{health}} = -\frac{1}{N} \sum_{i=1}^{N} \log p_{i, y_i}
$$

Where:
- $y_i$: True health class for sample $i$
- $p_{i, y_i}$: Predicted probability for the correct class
- $N$: Batch size
Expanded Form
With softmax probabilities over the raw logits $z_{i,c}$:

$$
p_{i,c} = \frac{\exp(z_{i,c})}{\sum_{k=0}^{2} \exp(z_{i,k})}
$$

The loss becomes:

$$
\mathcal{L}_{\text{health}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ z_{i, y_i} - \log \sum_{k=0}^{2} \exp(z_{i,k}) \right]
$$
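A quick numerical check (the logits and labels below are made-up values) confirms that this expanded form is exactly what PyTorch's built-in cross-entropy computes:

```python
import torch
import torch.nn.functional as F

# Raw logits z for a batch of N=2 samples, 3 health classes, with true labels y.
z = torch.tensor([[2.0, 0.5, -1.0],
                  [0.1, 0.3, 1.2]])
y = torch.tensor([0, 2])

# Expanded form: L = -(1/N) * sum_i [ z_{i,y_i} - log sum_k exp(z_{i,k}) ]
manual = -(z[torch.arange(2), y] - torch.logsumexp(z, dim=1)).mean()

# F.cross_entropy fuses log-softmax and NLL internally.
builtin = F.cross_entropy(z, y)
assert torch.allclose(manual, builtin)
```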
Regularization Role
Health classification acts as a powerful regularizer for RUL prediction.
How Regularization Works
```
Without health task:
  Encoder ─→ RUL Head ─→ RUL Prediction
     └── May overfit to exact cycle numbers

With health task:
  Encoder ─┬→ RUL Head ─→ RUL Prediction
           └→ Health Head ─→ Health Classification
              └── Must learn features useful for BOTH tasks
```
Why This Helps RUL
- Coarse supervision: Health classes provide "checkpoints" along degradation
- Noise reduction: Discrete classes are less noisy than exact RUL
- Feature regularization: Encoder must learn generalizable features
- Gradient diversity: Different loss gradients stabilize training
Empirical Evidence
Removing health classification dramatically hurts RUL performance:
| Configuration | FD002 Score | Degradation |
|---|---|---|
| AMNL (dual-task) | 1,102 | — |
| RUL only (single-task) | 4,453 | +304% |
Critical Finding
Without the health classification auxiliary task, the FD002 score worsens from 1,102 to 4,453 (+304%). The health task is not optional: it is essential for achieving state-of-the-art results.
Implementation
The health loss uses PyTorch's built-in cross-entropy.
Basic Implementation
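A minimal sketch of the health loss (the function name `health_loss` is illustrative):

```python
import torch
import torch.nn.functional as F

def health_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over the 3 health classes.

    logits: (batch, 3) raw scores from the health head.
    labels: (batch,) integer class indices in {0, 1, 2}.

    F.cross_entropy applies log-softmax and negative log-likelihood in one
    numerically stable call, so the health head must output raw logits,
    NOT softmax probabilities.
    """
    return F.cross_entropy(logits, labels)
```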
With Class Weights (Optional)
For handling class imbalance:
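In run-to-failure data most cycles are Healthy, so the Degrading and Critical classes can be upweighted via the `weight` argument of `F.cross_entropy`. The weights below are illustrative; in practice they are often set inversely proportional to class frequency:

```python
import torch
import torch.nn.functional as F

# Illustrative weights for [Healthy, Degrading, Critical]; rarer classes
# contribute more to the loss.
class_weights = torch.tensor([0.5, 1.0, 2.0])

def weighted_health_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Class-weighted cross-entropy; the mean is normalized by the
    summed weights of the samples in the batch."""
    return F.cross_entropy(logits, labels, weight=class_weights)

# Inverse-frequency weights can be derived from the training labels, e.g.:
#   counts = torch.bincount(train_labels, minlength=3).float()
#   class_weights = counts.sum() / (3 * counts)
```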
Typical Loss Values
| Training Stage | Typical Value | Interpretation |
|---|---|---|
| Early (epoch 1-10) | 1.0-1.5 | Near random guessing (ln 3 ≈ 1.10) |
| Mid (epoch 10-50) | 0.4-0.8 | Learning class boundaries |
| Late (epoch 50+) | 0.2-0.4 | Good classification |
| Converged | 0.1-0.3 | Reliable predictions |
Loss Monitoring
Health loss should decrease smoothly during training. Sudden increases may indicate learning rate issues or label noise.
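One way to flag such sudden increases automatically is to compare each epoch's loss against a running average of recent epochs (a minimal sketch; the window and threshold are illustrative):

```python
def loss_spike(history: list[float], window: int = 5, factor: float = 1.5) -> bool:
    """Return True if the latest loss jumped well above the recent average."""
    if len(history) <= window:
        return False  # not enough history to judge
    recent = sum(history[-window - 1:-1]) / window  # mean of the previous `window` epochs
    return history[-1] > factor * recent
```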
Summary
In this section, we examined the health classification loss:
- Task: 3-class classification (Healthy, Degrading, Critical)
- Loss: Standard cross-entropy
- Role: Regularizer for RUL prediction
- Impact: Removing it worsens the FD002 score by 304%
- Implementation: `F.cross_entropy` in PyTorch
| Property | Value |
|---|---|
| Number of classes | 3 |
| Class boundaries | 125, 50 cycles |
| Typical converged loss | 0.1-0.3 |
| Performance impact | Essential for SOTA |
Looking Ahead: We have defined both loss components. The next section explains why the 0.5/0.5 split provides superior regularization—the theoretical justification for AMNL's equal weighting strategy.