Learning Objectives
By the end of this section, you will:
- Define time series mathematically and understand their role in sequential data analysis
- Master the notation for univariate and multivariate time series used throughout this book
- Understand stationarity and why sensor degradation data is inherently non-stationary
- Understand autocorrelation and why it creates temporal dependencies that deep learning can exploit
- Formalize the sliding window approach for converting variable-length sequences to fixed-length inputs
- Connect these concepts to real sensor data from turbofan engines
Why This Matters: Every deep learning architecture for time series—from LSTMs to Transformers—is designed to capture temporal structure. Understanding what time series are, why they exhibit autocorrelation, and how we represent them mathematically provides the foundation for understanding why certain architectures work better than others.
What is a Time Series?
A time series is a sequence of observations recorded at successive points in time. Unlike independent data points, time series exhibit temporal dependencies—what happens at time $t$ depends on what happened at times $t-1, t-2, \ldots$
Historical Context
The mathematical study of time series began in earnest in the 1920s with Yule's work on sunspot cycles and was formalized by Norbert Wiener and Andrey Kolmogorov in the 1940s. The key insight was that random processes could have predictable structure when viewed sequentially.
Time Series in RUL Prediction
In predictive maintenance, our time series are sensor measurements recorded at each operational cycle of equipment. As the equipment degrades, these measurements change in systematic ways that can be learned by neural networks.
| Time Series Type | Example | Characteristic |
|---|---|---|
| Stock prices | Daily closing prices | Volatile, trends, cycles |
| Weather data | Hourly temperature | Seasonal patterns |
| ECG signals | Heart electrical activity | Periodic with anomalies |
| Sensor data (ours) | Turbofan engine readings | Degradation trends, multi-variate |
Mathematical Notation
We establish precise notation that will be used consistently throughout this book.
Univariate Time Series
A univariate time series is a sequence of scalar observations:

$$\{x_t\}_{t=1}^{T} = (x_1, x_2, \ldots, x_T)$$

Where:
- $x_t \in \mathbb{R}$ is the observation at time $t$
- $T$ is the total length of the series
- $t \in \{1, 2, \ldots, T\}$ is the time index (discrete time)
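As a minimal illustration (NumPy, with made-up values), a univariate series is just a 1-D array indexed by discrete time:

```python
import numpy as np

# A univariate series {x_t} with T = 5 (synthetic values for illustration)
x = np.array([2.1, 2.3, 2.2, 2.6, 2.8])
T = len(x)

x_3 = x[2]  # the observation at time t = 3 (Python indexing is 0-based)
print(T, x_3)
```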
Stochastic Process View
From a probabilistic perspective, each observation $x_t$ is a realization of a random variable $X_t$:

$$\{X_t\}_{t=1}^{T}$$
The observed sequence is one realization of this process. In our context, each engine provides one realization—different engines under identical conditions would produce different (but statistically similar) sequences.
Why This Matters for Deep Learning
Because each trajectory is a single realization of an underlying stochastic process, a model must be trained on many engine trajectories so that it learns the shared generating process rather than memorizing any one realization.
Multivariate Time Series
In predictive maintenance, we observe multiple sensors simultaneously. This gives us a multivariate time series.
Definition
A multivariate time series of dimension $D$ is:

$$\{\mathbf{x}_t\}_{t=1}^{T}, \quad \mathbf{x}_t = \left(x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(D)}\right)^\top \in \mathbb{R}^D$$

Where:
- $\mathbf{x}_t \in \mathbb{R}^D$ is the feature vector at time $t$
- $D$ is the number of features (sensors + settings)
- $x_t^{(d)}$ is the value of feature $d$ at time $t$
Matrix Representation
The entire multivariate sequence can be represented as a matrix:

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_T^\top \end{bmatrix} = \begin{bmatrix} x_1^{(1)} & \cdots & x_1^{(D)} \\ \vdots & \ddots & \vdots \\ x_T^{(1)} & \cdots & x_T^{(D)} \end{bmatrix} \in \mathbb{R}^{T \times D}$$

Each row is a timestep; each column is a feature (sensor).
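In code, this matrix view maps directly onto a NumPy array of shape (T, D); a sketch with synthetic data standing in for real sensor readings:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 100, 17                  # 100 cycles, 17 features (settings + sensors)
X = rng.normal(size=(T, D))     # synthetic stand-in for one engine's trajectory

row = X[9]       # one row: the full feature vector at a single timestep
col = X[:, 3]    # one column: the entire history of a single sensor

print(X.shape, row.shape, col.shape)
```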
Stationarity and Non-Stationarity
Stationarity is a fundamental concept that describes whether a time series' statistical properties change over time.
Strict Stationarity
A process is strictly stationary if its joint distribution is invariant to time shifts:

$$F_{X_{t_1}, \ldots, X_{t_k}}(x_1, \ldots, x_k) = F_{X_{t_1+\tau}, \ldots, X_{t_k+\tau}}(x_1, \ldots, x_k)$$

for all time indices $t_1, \ldots, t_k$, all $k$, and all shifts $\tau$.
Weak (Second-Order) Stationarity
More practically, a process is weakly stationary if:
- Constant mean: $\mathbb{E}[X_t] = \mu$ for all $t$
- Constant variance: $\mathrm{Var}(X_t) = \sigma^2 < \infty$ for all $t$
- Covariance depends only on lag: $\mathrm{Cov}(X_t, X_{t+\tau}) = \gamma(\tau)$ (depends only on the lag $\tau$, not on $t$)
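A quick diagnostic (rough and illustrative, not a formal test such as ADF) is to compare means and variances over disjoint segments; for a weakly stationary series these should be roughly constant. The helper and data below are hypothetical:

```python
import numpy as np

def segment_stats(x, n_segments=4):
    """Split a series into equal segments; return per-segment means and variances."""
    segments = np.array_split(np.asarray(x, dtype=float), n_segments)
    means = np.array([s.mean() for s in segments])
    variances = np.array([s.var() for s in segments])
    return means, variances

rng = np.random.default_rng(42)
stationary = rng.normal(0.0, 1.0, size=400)          # i.i.d. noise: weakly stationary
drifting = stationary + np.linspace(0.0, 5.0, 400)   # added trend: the mean drifts

m_stat, _ = segment_stats(stationary)
m_drift, _ = segment_stats(drifting)

# The drifting series shows a much larger spread of segment means
print(np.ptp(m_stat), np.ptp(m_drift))
```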
Degradation Data is Non-Stationary
Equipment degradation data violates stationarity by design. As the equipment wears:
- Mean changes: Sensor readings drift systematically (e.g., temperature increases)
- Variance may change: Readings become more erratic as failure approaches
- Distribution shifts: The generating process changes fundamentally over time
Implications for Modeling
Non-stationarity means we cannot use simple stationary models (like AR, MA, ARIMA) directly. Instead, we need models that can track changing dynamics—which is exactly what LSTMs and attention mechanisms do.
| Property | Stationary Series | Degradation Data |
|---|---|---|
| Mean | Constant over time | Drifts with degradation |
| Variance | Constant over time | May increase near failure |
| Distribution | Same at all times | Changes from healthy to critical |
| Traditional models | AR, MA, ARIMA work well | Require non-linear, adaptive models |
Autocorrelation and Temporal Structure
Autocorrelation measures how correlated a time series is with itself at different time lags. This is the key property that makes sequential modeling necessary.
Autocovariance Function
For a stationary process, the autocovariance at lag $\tau$ is:

$$\gamma(\tau) = \mathrm{Cov}(X_t, X_{t+\tau}) = \mathbb{E}\left[(X_t - \mu)(X_{t+\tau} - \mu)\right]$$
Autocorrelation Function (ACF)
The normalized version is the autocorrelation function:

$$\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)}$$

Note that $\rho(0) = 1$ and $|\rho(\tau)| \leq 1$.
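The ACF can be estimated from data with a short helper (a minimal sketch using the biased sample estimator; libraries such as statsmodels provide production versions):

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation rho(tau) for tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    gamma0 = np.dot(x, x) / n                  # gamma(0): sample variance
    return np.array([np.dot(x[:n - tau], x[tau:]) / n / gamma0
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(0)
# AR(1)-like series: strong correlation at short lags that decays with tau
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.8 * x[t - 1] + rng.normal()

rho = acf(x, max_lag=5)
print(rho.round(2))   # rho(0) = 1; later lags decay roughly like 0.8**tau
```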
Why Autocorrelation Matters
- High autocorrelation means nearby values are similar → sequence models can exploit this
- Slow decay means long-range dependencies → LSTMs needed for long memory
- Periodic peaks indicate seasonal patterns → attention can learn to focus on relevant periods
Sliding Window Representation
Real equipment trajectories have variable lengths—some engines run for 128 cycles, others for 362. Neural networks need fixed-size inputs. The sliding window approach bridges this gap.
Formal Definition
Given a multivariate time series $\mathbf{X} \in \mathbb{R}^{T \times D}$ and window size $W$, we construct windowed samples:

$$\mathbf{X}^{(i)} = \left(\mathbf{x}_i, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_{i+W-1}\right) \in \mathbb{R}^{W \times D}$$

for $i = 1, 2, \ldots, T - W + 1$.
Labels for Each Window
Each window gets the label corresponding to its final timestep:

$$y^{(i)} = \mathrm{RUL}(i + W - 1)$$

This reflects the prediction task: given the last $W$ cycles, predict the RUL now.
Number of Windows
From a single trajectory of length $T$, the number of windows is:

$$N_{\text{windows}} = T - W + 1$$
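The window construction and labelling can be sketched as follows (a minimal, hypothetical helper; `rul` is assumed to be a per-cycle RUL array aligned with the trajectory):

```python
import numpy as np

def make_windows(X, rul, W):
    """Slice a (T, D) trajectory into T - W + 1 windows of shape (W, D),
    labelling each window with the RUL at its final timestep."""
    T = X.shape[0]
    windows = np.stack([X[i:i + W] for i in range(T - W + 1)])
    labels = rul[W - 1:]               # RUL at index i + W - 1 for each start i
    return windows, labels

T, D, W = 128, 17, 30
X = np.zeros((T, D))                   # placeholder trajectory
rul = np.arange(T - 1, -1, -1)         # RUL counts down to 0 at failure

windows, labels = make_windows(X, rul, W)
print(windows.shape, labels.shape)     # (99, 30, 17) and (99,)
```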
Why Window Size 30?
The choice of $W = 30$ balances several factors:
- Sufficient context: 30 cycles capture meaningful degradation trends
- Not too long: Avoids including irrelevant ancient history
- Computational efficiency: Keeps sequence length manageable for LSTMs
- Empirical performance: Validated through ablation studies
Sensor Data as Time Series
Let's connect these abstract concepts to the concrete sensor data from NASA C-MAPSS.
The 17-Dimensional Feature Vector
At each cycle $t$, we observe the feature vector:

$$\mathbf{x}_t = \left(x_t^{(1)}, x_t^{(2)}, \ldots, x_t^{(17)}\right)^\top \in \mathbb{R}^{17}$$

Where the 17 components are:
| Index | Feature | Physical Meaning |
|---|---|---|
| 1-3 | setting₁, setting₂, setting₃ | Altitude, Mach, Throttle (operating condition) |
| 4-7 | sensor₂, sensor₃, sensor₄, sensor₆ | Temperatures at various engine stages |
| 8-10 | sensor₇, sensor₈, sensor₉ | Speeds and pressure ratios |
| 11-14 | sensor₁₁, sensor₁₂, sensor₁₃, sensor₁₄ | Corrected speeds, bleed measurements |
| 15-17 | sensor₁₇, sensor₂₀, (reserved) | Additional pressure and temperature |
Cross-Sensor Correlations
Beyond temporal autocorrelation, sensors exhibit cross-correlations at the same timestep:

$$\mathrm{Corr}\left(x_t^{(i)}, x_t^{(j)}\right) \neq 0 \quad \text{for } i \neq j$$
For example, HPC outlet temperature correlates with HPT coolant bleed because they share physical dependencies. CNNs in our architecture learn to exploit these cross-correlations.
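Cross-sensor correlations can be inspected with a correlation matrix. The series below are synthetic stand-ins for coupled and uncoupled sensors; on C-MAPSS one would pass the real sensor columns:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
temperature = 500.0 + np.cumsum(rng.normal(size=T))          # drifting "temperature"
bleed = 0.5 * temperature + rng.normal(scale=0.5, size=T)    # physically coupled sensor
unrelated = rng.normal(size=T)                               # independent sensor

# np.corrcoef treats each row of the stacked array as one variable
C = np.corrcoef(np.vstack([temperature, bleed, unrelated]))
print(C.round(2))   # C[0, 1] is large; C[0, 2] is near zero
```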
Degradation Signatures
Different sensors show degradation in different ways:
| Sensor Type | Healthy Behavior | Degradation Signal |
|---|---|---|
| Temperature | Stable around setpoint | Gradual increase (less efficient cooling) |
| Speed | Stable at rated value | Small oscillations, slight drift |
| Pressure | Consistent ratio | Decreased efficiency → changed ratios |
| Vibration | Low amplitude | Increasing amplitude (mechanical wear) |
The Deep Learning Opportunity: Each sensor tells part of the story. By processing all sensors together through CNN, BiLSTM, and Attention layers, our model learns to fuse these partial signals into a holistic degradation assessment that no single sensor could provide.
Summary
In this section, we established the mathematical foundations for time series analysis:
- Time series are sequences where observations depend on their temporal position
- Multivariate time series record features at each timestep
- Stationarity means stable statistical properties—but degradation data is inherently non-stationary
- Autocorrelation measures temporal dependencies that sequence models exploit
- Sliding windows convert variable-length trajectories to fixed-size inputs
- Sensor data exhibits both temporal autocorrelation and cross-sensor correlations
| Concept | Notation | In C-MAPSS |
|---|---|---|
| Feature dimension | D | 17 (settings + sensors) |
| Sequence length | T | 128-362 cycles per engine |
| Window size | W | 30 cycles |
| Window shape | ℝ^{W×D} | ℝ^{30×17} |
| Windows per engine | T - W + 1 | 99-333 |
Looking Ahead: In the next section, we will explore convolution operations for sequences—the mathematical foundation for the CNN feature extractor in our architecture. You will learn how sliding kernels extract local patterns from time series.
With time series fundamentals established, we can now build up the mathematical machinery for each component of our deep learning model.