Learning Objectives
By the end of this section, you will:
- Understand the transition from CNN to LSTM in the AMNL architecture
- Explain why bidirectional processing improves temporal understanding
- Describe the BiLSTM mechanism with forward and backward passes
- Justify bidirectionality for RUL prediction despite the causal nature of time
- Appreciate the information flow through the BiLSTM encoder
Why This Matters: The LSTM processes the sequence of CNN features to capture long-range temporal dependencies. Using a bidirectional architecture doubles the context available at each timestep, enabling the model to understand how degradation patterns unfold in both directions, which is critical for accurate RUL estimation.
From CNN to LSTM
The CNN feature extractor produces a sequence of 64-dimensional feature vectors, one for each of the 30 timesteps. Now we need to model how these features evolve over time.
CNN Output Recap
```
CNN Output: (batch, 30, 64)
        ↓
Sequence of feature vectors:
  [f₁, f₂, f₃, ..., f₃₀]

Each fₜ ∈ ℝ⁶⁴ captures local patterns at timestep t
```
What CNN Cannot Capture
While the CNN excels at local pattern detection (receptive field of 7 timesteps), it has limitations:
| Limitation | Example | Why LSTM Helps |
|---|---|---|
| No long-range dependencies | Pattern at t=5 relates to t=25 | LSTM memory spans entire sequence |
| No temporal ordering | Whether degradation is accelerating | LSTM tracks state evolution |
| Fixed receptive field | Sudden changes after long stability | LSTM adapts attention dynamically |
Division of Labor
The CNN and LSTM have complementary roles:
- CNN: "What local patterns exist?" → Detects spikes, trends, oscillations within 7-timestep windows
- LSTM: "How do patterns evolve over the full sequence?" → Models the trajectory of degradation across all 30 timesteps
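To make this division of labor concrete, here is a minimal NumPy sketch (all names and sizes are illustrative, not the actual model). It contrasts a CNN-style fixed 7-timestep window with an LSTM-style running state, standing in for recurrent memory, that carries information from the entire prefix of the sequence.

```python
import numpy as np

T = 30                     # timesteps in the window
x = np.zeros(T)
x[5] = 1.0                 # a single event early in the sequence

# CNN-style local view: a uniform kernel of width 7 around each timestep
kernel = np.ones(7) / 7
local = np.convolve(x, kernel, mode="same")

# LSTM-style running state: an exponential moving average as a stand-in
# for recurrent memory (decay 0.9 per step)
state = np.zeros(T)
s = 0.0
for t in range(T):
    s = 0.9 * s + 0.1 * x[t]
    state[t] = s

# The event at t=5 is invisible to the local view at t=25 ...
print(local[25])           # 0.0 (outside the 7-step receptive field)
# ... but still present (attenuated) in the running state
print(state[25] > 0)       # True
```

The point of the toy decay factor is only that recurrent state degrades gracefully rather than cutting off: distant events remain visible, which a fixed-width convolution cannot provide.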
Why Bidirectional?
A unidirectional LSTM processes the sequence from left to right, accumulating information forward in time. A bidirectional LSTM adds a second pass from right to left.
Unidirectional Limitation
```
Unidirectional (forward only):

Input:  [x₁] → [x₂] → [x₃] → [x₄] → [x₅]
          ↓      ↓      ↓      ↓      ↓
Hidden: [h₁] → [h₂] → [h₃] → [h₄] → [h₅]

At position t, hidden state h_t only knows about x₁...x_t
h₃ has NO information about x₄ or x₅
```
At each timestep, the unidirectional LSTM only has access to past context. This is limiting because:
- Understanding the current state often requires future context
- Is this spike the beginning of failure or a transient anomaly?
- The answer depends on what happens next
Bidirectional Solution
```
Bidirectional (forward + backward):

Forward:  [x₁] → [x₂] → [x₃] → [x₄] → [x₅]
            ↓      ↓      ↓      ↓      ↓
          [h₁ᶠ]  [h₂ᶠ]  [h₃ᶠ]  [h₄ᶠ]  [h₅ᶠ]

Backward: [x₁] ← [x₂] ← [x₃] ← [x₄] ← [x₅]
            ↓      ↓      ↓      ↓      ↓
          [h₁ᵇ]  [h₂ᵇ]  [h₃ᵇ]  [h₄ᵇ]  [h₅ᵇ]

Combined: [h₁ᶠ;h₁ᵇ] [h₂ᶠ;h₂ᵇ] [h₃ᶠ;h₃ᵇ] [h₄ᶠ;h₄ᵇ] [h₅ᶠ;h₅ᵇ]

At position t, the combined hidden state knows BOTH past and future!
```
Information Flow Comparison
| Aspect | Unidirectional | Bidirectional |
|---|---|---|
| Context at timestep t | x₁...xₜ (past only) | x₁...xₜ...x_T (full) |
| Hidden dimension | H | 2H (concatenated) |
| Parameters | P | 2P (two LSTMs) |
| Computation | 1 pass | 2 parallel passes |
| Use case | Real-time streaming | Offline analysis |
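The two parallel passes and the doubled hidden dimension can be made concrete with a minimal BiLSTM forward pass in plain NumPy. This is an illustrative sketch with random weights and toy sizes, not the model's actual implementation; the gate layout and helper names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 5, 4, 3          # toy sizes: 5 timesteps, 4 features, hidden 3

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell update; pre-activations stacked as [i, f, g, o]."""
    z = W @ x + U @ h + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1/(1+np.exp(-i)), 1/(1+np.exp(-f)), 1/(1+np.exp(-o))
    g = np.tanh(g)
    c = f * c + i * g                  # cell state carries long-range memory
    return o * np.tanh(c), c

def run_lstm(xs, W, U, b):
    """Run one direction over the sequence, returning all hidden states."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        outs.append(h)
    return np.stack(outs)              # (T, H)

def make_params():
    """Fresh random weights: the two directions are separate LSTMs."""
    return (rng.normal(size=(4*H, D)),
            rng.normal(size=(4*H, H)),
            np.zeros(4*H))

xs = rng.normal(size=(T, D))
fwd = run_lstm(xs, *make_params())               # left-to-right pass
bwd = run_lstm(xs[::-1], *make_params())[::-1]   # right-to-left, re-aligned

out = np.concatenate([fwd, bwd], axis=1)         # (T, 2H) per-timestep output
print(out.shape)                                 # (5, 6)
```

Note that the backward pass reverses the input, runs the same machinery, then reverses its outputs so that position t of `out` combines the forward state (seen x₁...xₜ) with the backward state (seen xₜ...x_T).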
BiLSTM Mechanism
The BiLSTM runs two separate LSTM networks on the same sequence in opposite directions.
Forward LSTM
Processes the sequence from $t = 1$ to $t = T$:

$$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t,\ \overrightarrow{h}_{t-1})$$

At each timestep, the forward hidden state $\overrightarrow{h}_t$ captures information from $x_1, \dots, x_t$.
Backward LSTM
Processes the sequence from $t = T$ to $t = 1$:

$$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t,\ \overleftarrow{h}_{t+1})$$

At each timestep, the backward hidden state $\overleftarrow{h}_t$ captures information from $x_t, \dots, x_T$.
Concatenation
The final output at each timestep concatenates both directions:

$$h_t = [\overrightarrow{h}_t\,;\,\overleftarrow{h}_t] \in \mathbb{R}^{2H}$$

Where:
- $H$: Hidden dimension of each LSTM (128 in our model)
- $2H$: Combined dimension (256)
- $[\,\cdot\,;\,\cdot\,]$: Concatenation operator
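A quick dimension check of the concatenation step, using the model's stated sizes (H = 128 per direction). The arrays here are placeholders standing in for the real hidden states.

```python
import numpy as np

H = 128
h_fwd = np.zeros(H)            # forward hidden state,  in ℝ^H
h_bwd = np.zeros(H)            # backward hidden state, in ℝ^H

h_t = np.concatenate([h_fwd, h_bwd])   # [forward ; backward], in ℝ^{2H}
print(h_t.shape)               # (256,)
```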
Bidirectionality in RUL Prediction
A natural question arises: if time flows forward, why process the sequence backward?
The Key Insight
In RUL prediction, we are not predicting the future in real-time. We have access to a fixed window of observations (30 cycles) and must estimate the remaining life. Within this window, there is no causal constraintβwe can look at all observations.
Analogy: A doctor examining a patient's week-long vital signs doesn't read them strictly left-to-right. They look at the full picture: "The spike on Day 3 is concerning because it wasn't followed by recovery on Days 4-5."
What Bidirectionality Captures
| Pattern Type | Forward LSTM Sees | Backward LSTM Sees |
|---|---|---|
| Gradual degradation | Values increasing over time | How high values got |
| Sudden spike | Normal β spike transition | Recovery (or not) after spike |
| Oscillation | Increasing amplitude | Where oscillation ends up |
| Plateau then drop | Stability before drop | The drop that follows |
Example: Spike Interpretation
```
Sensor reading: [..., 50, 52, 95, 54, 51, ...]
                           ↑ spike

Forward LSTM at spike position:
  "Values jumped from ~50 to 95"
  Context: Only knows stable past

Backward LSTM at spike position:
  "Values returned to ~50 after reaching 95"
  Context: Knows the spike resolved

Combined interpretation:
  "Transient anomaly, not sustained damage"

Without backward:
  Could mistake transient spike for onset of failure
```
Offline vs Online Processing
Training vs Deployment Context
Bidirectionality is valid because our model operates on fixed 30-timestep windows. During training and inference, we have the complete window. For truly real-time streaming applications where future observations don't exist yet, unidirectional processing would be necessary, but that's not our use case.
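The fixed-window setting can be sketched as follows: at inference we slice a complete 30-cycle window from the recorded history, so every timestep inside that window has both past and future neighbors available to the model. The window length matches the text; the data and helper name are illustrative.

```python
import numpy as np

W, S = 30, 1                   # window length (from the text), sensor count (toy)
history = np.linspace(100, 0, 200).reshape(-1, S)  # synthetic run-to-failure trace

def latest_window(series, length=W):
    """Return the most recent complete window of shape (length, sensors)."""
    return series[-length:]

window = latest_window(history)
print(window.shape)            # (30, 1)
# Inside this fixed window the model may read in both directions; only a
# truly streaming setup would restrict it to past-only context.
```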
Summary
In this section, we motivated the use of bidirectional LSTMs:
- CNN to LSTM transition: CNN captures local patterns; LSTM models temporal evolution
- Bidirectional advantage: Each timestep has context from both past and future
- BiLSTM mechanism: Forward + backward passes, concatenated outputs
- Output dimension: 2H = 256 (128 from each direction)
- RUL justification: Fixed windows allow looking both ways
| Property | Value |
|---|---|
| Input from CNN | (B, 30, 64) |
| Hidden size (H) | 128 |
| BiLSTM output | (B, 30, 256) |
| Processing | Forward + Backward passes |
| Context per timestep | Entire 30-step window |
Looking Ahead: Understanding why we use BiLSTM is the first step. The next section dives into the LSTM cell mathematicsβthe gates, cell state, and hidden state updates that give LSTMs their remarkable ability to model long-range dependencies.
With the motivation clear, we now examine the mathematical machinery inside the LSTM cell.