Chapter 6

Bidirectional LSTM Encoder

Learning Objectives

By the end of this section, you will:

  1. Understand the transition from CNN to LSTM in the AMNL architecture
  2. Explain why bidirectional processing improves temporal understanding
  3. Describe the BiLSTM mechanism with forward and backward passes
  4. Justify bidirectionality for RUL prediction despite the causal nature of time
  5. Appreciate the information flow through the BiLSTM encoder
Why This Matters: The LSTM processes the sequence of CNN features to capture long-range temporal dependencies. Using a bidirectional architecture doubles the context available at each timestep, enabling the model to understand how degradation patterns unfold in both directions, which is critical for accurate RUL estimation.

From CNN to LSTM

The CNN feature extractor produces a sequence of 64-dimensional feature vectors, one for each of the 30 timesteps. Now we need to model how these features evolve over time.

CNN Output Recap

πŸ“text
1CNN Output: (batch, 30, 64)
2              ↓
3    Sequence of feature vectors:
4    [f₁, fβ‚‚, f₃, ..., f₃₀]
5
6    Each fβ‚œ ∈ ℝ⁢⁴ captures local patterns at timestep t

What CNN Cannot Capture

While the CNN excels at local pattern detection (receptive field of 7 timesteps), it has limitations:

| Limitation | Example | Why LSTM Helps |
|---|---|---|
| No long-range dependencies | A pattern at t=5 relates to one at t=25 | LSTM memory spans the entire sequence |
| No temporal ordering | Whether degradation is accelerating | LSTM tracks state evolution |
| Fixed receptive field | Sudden changes after long stability | LSTM gating adapts dynamically |

Division of Labor

The CNN and LSTM have complementary roles:

  • CNN: "What local patterns exist?" It detects spikes, trends, and oscillations within 7-timestep windows
  • LSTM: "How do patterns evolve over the full sequence?" It models the trajectory of degradation across all 30 timesteps

Why Bidirectional?

A unidirectional LSTM processes the sequence from left to right, accumulating information forward in time. A bidirectional LSTM adds a second pass from right to left.

Unidirectional Limitation

πŸ“text
1Unidirectional (forward only):
2
3Input:     [x₁] β†’ [xβ‚‚] β†’ [x₃] β†’ [xβ‚„] β†’ [xβ‚…]
4                                         ↓
5Hidden:    [h₁] β†’ [hβ‚‚] β†’ [h₃] β†’ [hβ‚„] β†’ [hβ‚…]
6
7At position t, hidden state h_t only knows about x₁...x_t
8h₃ has NO information about xβ‚„ or xβ‚…!

At each timestep, the unidirectional LSTM only has access to past context. This is limiting because:

  • Understanding the current state often requires future context
  • Is this spike the beginning of failure or a transient anomaly?
  • The answer depends on what happens next
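The past-only property is easy to check directly. Below is a minimal sketch using a toy scalar recurrence (a hypothetical `tanh` update with made-up weights, standing in for the LSTM's gated update): changing the future inputs leaves the hidden state at position 3 untouched.

```python
import math

# Toy forward recurrence standing in for the LSTM's forward pass.
# The weights w and u are illustrative, not learned values.
def forward_pass(xs, w=0.5, u=0.8):
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)  # h_t depends only on x_1..x_t
        hs.append(h)
    return hs

seq_a = [1.0, 2.0, 3.0, 4.0, 5.0]
seq_b = [1.0, 2.0, 3.0, -9.0, 0.0]  # same prefix, different future

ha = forward_pass(seq_a)
hb = forward_pass(seq_b)
print(ha[2] == hb[2])  # True: h_3 never sees x_4 or x_5
print(ha[4] == hb[4])  # False: h_5 does see the changed inputs
```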

Bidirectional Solution

πŸ“text
1Bidirectional (forward + backward):
2
3Forward:   [x₁] β†’ [xβ‚‚] β†’ [x₃] β†’ [xβ‚„] β†’ [xβ‚…]
4            ↓      ↓      ↓      ↓      ↓
5           [h→₁]  [hβ†’β‚‚]  [h→₃]  [hβ†’β‚„]  [hβ†’β‚…]
6
7Backward:  [x₁] ← [xβ‚‚] ← [x₃] ← [xβ‚„] ← [xβ‚…]
8            ↓      ↓      ↓      ↓      ↓
9           [h←₁]  [h←₂]  [h←₃]  [h←₄]  [h←₅]
10
11Combined:  [h→₁;h←₁] [hβ†’β‚‚;h←₂] [h→₃;h←₃] [hβ†’β‚„;h←₄] [hβ†’β‚…;h←₅]
12
13At position t, combined hidden state knows BOTH past and future!
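A minimal sketch of the two passes, again with a toy scalar recurrence and illustrative weights rather than real LSTM gates: the forward half of the combined state at position 3 ignores future inputs, while the backward half reacts to them.

```python
import math

# Toy recurrence standing in for one LSTM direction (weights illustrative).
def run(xs, w=0.5, u=0.8):
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        hs.append(h)
    return hs

def bilstm_like(xs):
    fwd = run(xs)                        # h->_t over x_1..x_T
    bwd = list(reversed(run(xs[::-1])))  # h<-_t over x_T..x_t, re-aligned
    return list(zip(fwd, bwd))           # pair (h->_t, h<-_t) per timestep

seq_a = [1.0, 2.0, 3.0, 4.0, 5.0]
seq_b = [1.0, 2.0, 3.0, 4.0, -5.0]  # only the last input differs

a, b = bilstm_like(seq_a), bilstm_like(seq_b)
print(a[2][0] == b[2][0])  # True: forward half at t=3 ignores x_5
print(a[2][1] == b[2][1])  # False: backward half at t=3 sees x_5
```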

Information Flow Comparison

| Aspect | Unidirectional | Bidirectional |
|---|---|---|
| Context at timestep t | x₁...xₜ (past only) | x₁...xₜ...x_T (full window) |
| Hidden dimension | H | 2H (concatenated) |
| Parameters | P | 2P (two LSTMs) |
| Computation | 1 pass | 2 parallel passes |
| Use case | Real-time streaming | Offline analysis |
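The 2P row can be sanity-checked with the standard single-layer LSTM parameter count: four gates, each with input weights (H×F), recurrent weights (H×H), and biases. The counting convention below assumes two bias vectors per gate set (as PyTorch does); other conventions use one.

```python
# Parameter count for one LSTM layer, PyTorch-style bias convention
# (two bias vectors, one for input and one for recurrent weights).
def lstm_params(F, H, bidirectional=False):
    per_dir = 4 * H * F + 4 * H * H + 4 * H + 4 * H
    return per_dir * (2 if bidirectional else 1)

F, H = 64, 128  # CNN feature size and hidden size from the text
uni = lstm_params(F, H)
bi = lstm_params(F, H, bidirectional=True)
print(uni)            # 99328 parameters for one direction
print(bi == 2 * uni)  # True: bidirectionality exactly doubles P
```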

BiLSTM Mechanism

The BiLSTM runs two separate LSTM networks on the same sequence in opposite directions.

Forward LSTM

Processes the sequence from $t = 1$ to $t = T$:

$$\overrightarrow{h}_t = \text{LSTM}_{\rightarrow}(x_t, \overrightarrow{h}_{t-1})$$

At each timestep, the forward hidden state $\overrightarrow{h}_t$ captures information from $x_1, x_2, \ldots, x_t$.

Backward LSTM

Processes the sequence from $t = T$ down to $t = 1$:

$$\overleftarrow{h}_t = \text{LSTM}_{\leftarrow}(x_t, \overleftarrow{h}_{t+1})$$

At each timestep, the backward hidden state $\overleftarrow{h}_t$ captures information from $x_T, x_{T-1}, \ldots, x_t$.

Concatenation

The final output at each timestep concatenates both directions:

$$h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t] \in \mathbb{R}^{2H}$$

Where:

  • $H$: hidden dimension of each LSTM (128 in our model)
  • $2H$: combined dimension (256)
  • $[\,;\,]$: concatenation operator
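A quick arithmetic check of the concatenation, with plain lists standing in for the two hidden vectors:

```python
# Each direction emits an H-dim hidden state; [h-> ; h<-] stacks them.
H = 128
h_fwd = [0.0] * H    # stand-in for the forward state h->_t
h_bwd = [0.0] * H    # stand-in for the backward state h<-_t
h_t = h_fwd + h_bwd  # concatenation [h->_t ; h<-_t]
print(len(h_t))      # 256 = 2H
```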

Bidirectionality in RUL Prediction

A natural question arises: if time flows forward, why process the sequence backward?

The Key Insight

In RUL prediction, we are not predicting the future in real-time. We have access to a fixed window of observations (30 cycles) and must estimate the remaining life. Within this window, there is no causal constraint: we can look at all observations.

Analogy: A doctor examining a patient's week-long vital signs doesn't read them strictly left-to-right. They look at the full picture: "The spike on Day 3 is concerning because it wasn't followed by recovery on Days 4-5."

What Bidirectionality Captures

| Pattern Type | Forward LSTM Sees | Backward LSTM Sees |
|---|---|---|
| Gradual degradation | Values increasing over time | How high the values got |
| Sudden spike | Normal → spike transition | Recovery (or not) after the spike |
| Oscillation | Increasing amplitude | Where the oscillation ends up |
| Plateau then drop | Stability before the drop | That a drop is coming |

Example: Spike Interpretation

πŸ“text
1Sensor reading: [..., 50, 52, 95, 54, 51, ...]
2                              ↑ spike
3
4Forward LSTM at spike position:
5  "Values jumped from ~50 to 95"
6  Context: Only knows stable past
7
8Backward LSTM at spike position:
9  "Values returned to ~50 after reaching 95"
10  Context: Knows the spike resolved
11
12Combined interpretation:
13  "Transient anomaly, not sustained damage"
14
15Without backward:
16  Could mistake transient spike for onset of failure
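The spike scenario can be reproduced with the same kind of toy scalar recurrence (illustrative weights, not real LSTM gates): the forward state at the spike is identical whether the spike is transient or sustained, while the backward state tells them apart.

```python
import math

# Toy recurrence standing in for one LSTM direction (weights illustrative).
def run(xs, w=0.02, u=0.8):
    h, hs = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        hs.append(h)
    return hs

transient = [50, 52, 95, 54, 51]  # spike resolves afterward
sustained = [50, 52, 95, 96, 97]  # spike is the onset of damage

spike = 2  # index of the spike in both windows
fwd_t, fwd_s = run(transient)[spike], run(sustained)[spike]
bwd_t = list(reversed(run(transient[::-1])))[spike]
bwd_s = list(reversed(run(sustained[::-1])))[spike]

print(fwd_t == fwd_s)  # True: forward alone cannot tell them apart
print(bwd_t == bwd_s)  # False: backward sees whether the spike resolved
```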

Offline vs Online Processing

Training vs Deployment Context

Bidirectionality is valid because our model operates on fixed 30-timestep windows. During training and inference, we have the complete window. For truly real-time streaming applications where future observations don't exist yet, a unidirectional LSTM would be necessary, but that's not our use case.


Summary

In this section, we motivated the use of bidirectional LSTMs:

  1. CNN to LSTM transition: CNN captures local patterns; LSTM models temporal evolution
  2. Bidirectional advantage: Each timestep has context from both past and future
  3. BiLSTM mechanism: Forward + backward passes, concatenated outputs
  4. Output dimension: 2H = 256 (128 from each direction)
  5. RUL justification: Fixed windows allow looking both ways

| Property | Value |
|---|---|
| Input from CNN | (B, 30, 64) |
| Hidden size (H) | 128 |
| BiLSTM output | (B, 30, 256) |
| Processing | Forward + backward passes |
| Context per timestep | Entire 30-step window |
Looking Ahead: Understanding why we use BiLSTM is the first step. The next section dives into the LSTM cell mathematics: the gates, cell state, and hidden state updates that give LSTMs their remarkable ability to model long-range dependencies.

With the motivation clear, we now examine the mathematical machinery inside the LSTM cell.