Designing a Schedule From Four Constraints
§14.1 said “weight near-failure samples more.” That is necessary but not sufficient - many curves do that. Pick the wrong one and training oscillates, or worse, the weight schedule introduces its own bias on top of the data bias it was meant to fix.
Four constraints any sensible weight schedule should satisfy:
| # | Constraint | Why |
|---|---|---|
| 1 | monotonic non-increasing in RUL | weight should not flip sign or oscillate as we move toward failure |
| 2 | bounded in [1, 2] | no sample should be IGNORED (w=0) and no sample should DOMINATE (w → ∞) |
| 3 | smooth (continuous + bounded slope) | SGD reads the gradient ∂L/∂pred which carries w. Discontinuous w ⇒ unstable updates |
| 4 | parameter-free apart from R_max | anything else is a hyperparameter that has to be tuned per dataset; the paper avoids tuning |
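The first three constraints in the table above can be probed numerically. The following is a minimal sketch, not the paper's code: the helper name `check_schedule`, the grid extending to 1.5 × R_max, and the slope limit of 1.0 per cycle are all illustrative assumptions. Constraint 4 (parameter-free) is a property of the formula and is not something a numerical probe can test.

```python
def check_schedule(w, r_max=125.0, n=1000, max_slope=1.0):
    """Probe constraints 1-3 on a uniform grid over [0, 1.5 * r_max].

    max_slope is an assumed tolerance: a jump in w shows up as a
    finite-difference slope far above it.
    """
    ys = [1.5 * r_max * i / n for i in range(n + 1)]
    ws = [w(y) for y in ys]
    h = ys[1] - ys[0]
    pairs = list(zip(ws, ws[1:]))
    monotone = all(b <= a + 1e-12 for a, b in pairs)
    bounded = all(1.0 - 1e-12 <= v <= 2.0 + 1e-12 for v in ws)
    steepest = max(abs(b - a) / h for a, b in pairs)
    return {"monotone": monotone, "bounded": bounded, "smooth": steepest <= max_slope}


def linear_schedule(y, r_max=125.0):
    # Linear decay with the clip written out as min/max.
    return 1.0 + min(max(1.0 - y / r_max, 0.0), 1.0)


def step_schedule(y, tau=30.0):
    # tau = 30 is an arbitrary threshold chosen for illustration.
    return 2.0 if y < tau else 1.0


print("linear:", check_schedule(linear_schedule))
print("step:  ", check_schedule(step_schedule))
```

With these settings the step schedule passes the monotone and bounded checks but fails the slope check, because the jump at τ produces a finite-difference slope well above the assumed limit.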
Five Candidates
Five reasonable schedules, each checked against the four constraints. Only LINEAR satisfies all four.
| Schedule | Formula | Monotone? | Bounded [1, 2]? | Smooth? | Param-free? |
|---|---|---|---|---|---|
| constant | w(y) = w_0 | trivially | yes (if 1 ≤ w_0 ≤ 2) | trivially | no (w_0) |
| linear (paper) | w(y) = 1 + clip(1 - y/R_max, 0, 1) | yes | yes | yes (slope = -1/R_max constant) | yes |
| exponential | w(y) = 1 + exp(-β · y/R_max) | yes | no - never reaches 1 floor | yes (peaky) | no (β) |
| sigmoid | w(y) = 1 + σ(-k · (y - y_mid)) | yes | yes (asymptotic) | yes | no (k, y_mid) |
| step | w(y) = 2 if y < τ else 1 | yes | yes | NO (jump at τ) | no (τ) |
Why Linear Wins
On 0 ≤ y ≤ R_max the schedule is w(y) = 2 - y/R_max. The slope is constant: dw/dy = -1/R_max ≈ −0.008/cycle for R_max = 125. That “flat slope” matters because SGD reads the gradient ∂L/∂pred ∝ w(y) - and a constant-slope w gives every cycle the same incremental emphasis. Exponential and sigmoid concentrate emphasis near their peaks, which means SGD's update direction depends on which RUL bin is over-represented in the current batch.
Above the cap (y > R_max) the clip pins the schedule at the floor 1.0 - the same weight as an engine exactly at the cap. This matches the §7.2 design decision that engines “far from failure” are operationally equivalent.
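A short worked check of the two claims above (constant slope inside the cap, floor of 1.0 beyond it) may help. This is a sketch in plain Python: `linear_decay_weight` and its `max_rul` argument follow the naming used in this section, but the body here is an assumption rather than the paper's listing, and `unclipped_weight` is a hypothetical helper added only for contrast.

```python
R_MAX = 125.0  # the cap used in this chapter


def linear_decay_weight(y, max_rul=R_MAX):
    # w(y) = 1 + clip(1 - y / R_max, 0, 1)
    decay = 1.0 - y / max_rul
    return 1.0 + min(max(decay, 0.0), 1.0)


def unclipped_weight(y, max_rul=R_MAX):
    # Hypothetical variant with the clip removed, for comparison only.
    return 1.0 + (1.0 - y / max_rul)


for y in (0.0, 62.5, 125.0, 200.0, 300.0):
    print(f"y={y:6.1f}  clipped={linear_decay_weight(y):.3f}  "
          f"unclipped={unclipped_weight(y):+.3f}")

# One-cycle finite difference inside the cap: about -1 / R_MAX = -0.008.
print(linear_decay_weight(101.0) - linear_decay_weight(100.0))
```

Inside the cap the two functions agree. Past the cap the clipped weight stays at 1.0, while the unclipped one keeps falling: roughly 0.4 at y = 200 and negative at y = 300.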
Interactive: Schedule Comparison
Pick a schedule on the right; the green dashed line is the paper choice. Notice how exp explodes if you push β up; how sigmoid creates a cliff; how step is two flat plateaus. Only linear stays inside the [1, 2] band with a constant slope.
Try this. Switch to “exp” and slide β to 8 - the curve climbs above w=2 (out of the safe band). Switch to “sigmoid” and crank k to 1 - the curve becomes nearly a step (large slope at y_mid, zero slope elsewhere). Switch to “step” - the slope is infinite at the threshold (the chart shows a vertical line). Only LINEAR keeps the slope bounded everywhere.
Python: Five Schedules Side by Side
All five candidates implemented as pure functions, then evaluated on five test RUL values. The numerical-derivative helper at the bottom verifies that the paper schedule has the most uniform slope.
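A comparison along these lines can be sketched in plain Python. The formulas come from the candidates table; everything else is an illustrative assumption, including the constants w_0 = 1.5, β = 3, k = 0.1, y_mid = 60, τ = 30, the five test RUL values, and the helper name `slope_spread`. The helper uses a one-cycle finite difference and reports how much the slope varies across [0, R_max].

```python
import math

R_MAX = 125.0  # cap from the text; every other constant below is illustrative


def constant(y, w0=1.5):
    return w0


def linear(y, r_max=R_MAX):
    return 1.0 + min(max(1.0 - y / r_max, 0.0), 1.0)


def exponential(y, beta=3.0, r_max=R_MAX):
    return 1.0 + math.exp(-beta * y / r_max)


def sigmoid(y, k=0.1, y_mid=60.0):
    # sigma(-k * (y - y_mid)) written as 1 / (1 + exp(k * (y - y_mid)))
    return 1.0 + 1.0 / (1.0 + math.exp(k * (y - y_mid)))


def step(y, tau=30.0):
    return 2.0 if y < tau else 1.0


SCHEDULES = {
    "constant": constant,
    "linear": linear,
    "exponential": exponential,
    "sigmoid": sigmoid,
    "step": step,
}

TEST_RUL = (0.0, 30.0, 60.0, 100.0, 125.0)


def slope_spread(w, r_max=R_MAX):
    """Max minus min of the one-cycle finite-difference slope on [0, r_max]."""
    slopes = [w(y + 1.0) - w(y) for y in range(int(r_max))]
    return max(slopes) - min(slopes)


for name, w in SCHEDULES.items():
    values = "  ".join(f"{w(y):.3f}" for y in TEST_RUL)
    print(f"{name:<12}{values}   slope spread={slope_spread(w):.4f}")
```

Under these assumed constants, the constant and linear schedules both show a slope spread of essentially zero, so linear is the most uniform among the schedules that actually vary with RUL. The step schedule's spread is dominated by the jump at τ.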
PyTorch: Paper Form + Ablation Hooks
The paper's exact linear_decay_weight plus a make_amnl closure factory that lets you swap the schedule with one line - perfect for ablations. Smoke test runs all three (linear / exp / step) on identical predictions.
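The closure-factory pattern can be sketched without any framework. The version below uses plain Python floats so it runs anywhere; with PyTorch the same arithmetic would operate on tensors. It is not the paper's `make_amnl`: the factory name `make_weighted_mse`, the plain weighted squared-error loss body, and the constants β = 3 and τ = 30 are assumptions made for illustration.

```python
import math


def linear_decay_weight(y, max_rul=125.0):
    return 1.0 + min(max(1.0 - y / max_rul, 0.0), 1.0)


def exp_weight(y, beta=3.0, max_rul=125.0):
    # beta = 3 is an arbitrary illustrative value.
    return 1.0 + math.exp(-beta * y / max_rul)


def step_weight(y, tau=30.0):
    # tau = 30 is an arbitrary illustrative value.
    return 2.0 if y < tau else 1.0


def make_weighted_mse(schedule):
    """Return a loss function with the given weight schedule baked in.

    Hypothetical stand-in for a schedule-swapping factory; the loss body
    is an assumed weighted mean squared error.
    """
    def loss(preds, targets):
        total = 0.0
        for p, y in zip(preds, targets):
            total += schedule(y) * (p - y) ** 2
        return total / len(targets)
    return loss


# Smoke test: identical predictions, every one off by +10 cycles.
targets = [5.0, 60.0, 125.0]
preds = [15.0, 70.0, 135.0]

for name, sched in (("linear", linear_decay_weight),
                    ("exp", exp_weight),
                    ("step", step_weight)):
    print(f"{name:<7}{make_weighted_mse(sched)(preds, targets):.2f}")
```

Swapping the schedule is a one-argument change to the factory call, which is what makes this shape convenient for ablations. An unweighted loss on the same inputs would be 100.0, so each printed value shows how much extra emphasis a schedule places on this batch.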
Linear Decay Beyond C-MAPSS
The four constraints are not RUL-specific. Anywhere a regression target has “safe” and “dangerous” regimes with a known reference boundary R_max, linear decay with max_rul set to that boundary transfers cleanly.
| Domain | Target y | Reference R_max | Schedule call |
|---|---|---|---|
| RUL prediction (this book) | remaining cycles | 125 cycles (§7.2 cap) | linear_decay_weight(y, max_rul=125) |
| Battery state-of-health | capacity ratio | 1.0 (fresh) | linear_decay_weight(y, max_rul=1.0) |
| Tumour size on follow-up MRI | diameter (mm) | 20 mm (clinical thresh) | linear_decay_weight(20 - y, max_rul=20) |
| Bridge crack length | mm | max safe length | linear_decay_weight(L_safe - y, max_rul=L_safe) |
| Wildfire fuel-moisture | % moisture | ignition threshold | linear_decay_weight(y - thresh, max_rul=thresh) |
| Power-grid frequency deviation | \|Hz from nominal\| | trip threshold | linear_decay_weight(trip - y, max_rul=trip) |
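Two rows of the table can be traced by hand to see how the argument flip works. This is a sketch under assumptions: the function body is a plain-Python rendering of the linear schedule, and the sample diameters and capacity ratios are made-up inputs, not data.

```python
def linear_decay_weight(y, max_rul):
    # w = 1 + clip(1 - y / max_rul, 0, 1)
    return 1.0 + min(max(1.0 - y / max_rul, 0.0), 1.0)


# Tumour diameter: danger is at HIGH y, so the distance to the
# 20 mm boundary (20 - y) is passed in place of y.
for diameter_mm in (2.0, 10.0, 20.0, 30.0):
    w = linear_decay_weight(20.0 - diameter_mm, max_rul=20.0)
    print(f"diameter={diameter_mm:5.1f} mm  weight={w:.2f}")

# Battery state-of-health: danger is at LOW y, so y is passed directly.
for soh in (1.0, 0.8, 0.5, 0.0):
    w = linear_decay_weight(soh, max_rul=1.0)
    print(f"SoH={soh:.1f}  weight={w:.2f}")
```

In the tumour case, a diameter beyond the 20 mm boundary makes the argument negative, and the upper end of the clip holds the weight at the ceiling of 2 instead of letting it grow without bound.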
Three Schedule-Design Pitfalls
Without clip(min=0), the decay term 1 - y/R_max turns NEGATIVE for above-cap engines (y > R_max): their weight drops below the floor of 1 and goes negative once y > 2·R_max. A negative weight would make the optimiser INCREASE the residual on healthy engines. The clip is what makes the schedule operationally meaningful.
The point. Linear decay is not the only schedule that emphasises near-failure samples - but among the five candidates it is the only one that does so without introducing a hyperparameter or destabilising training. §14.3 nails down why the ceiling sits at 2.0 specifically; §14.4 wires the whole thing into the §11.4 DualTaskModel.
Takeaway
- Four constraints. Monotonic, bounded, smooth, parameter-free. Linear is the only candidate that wins on all four.
- Closed form. w(y) = 1 + clip(1 - y/R_max, 0, 1) - paper code, one line.
- Constant slope. dw/dy = -1/R_max on [0, R_max]. Same incremental emphasis at every cycle.
- Floor above the cap. The clip turns the algebraic identity into a true schedule.
- Domain-agnostic. Same formula transfers to battery SoH, tumour diameter, crack length, wildfire moisture - pass the distance to the boundary (R_max - y) in place of y if your dangerous regime is at high y.