Chapter 17

Task Weight Analysis: Why 0.5/0.5?

Ablation Studies

Learning Objectives

By the end of this section, you will:

  1. Understand conventional multi-task learning weight selection
  2. Analyze weight experiments across multiple configurations
  3. Discover why 0.5/0.5 weighting outperforms asymmetric schemes
  4. Understand the regularization mechanism of equal weighting
  5. Implement weight ablation experiments systematically

Key Finding: Equal weighting (0.5/0.5) between RUL prediction and health classification outperforms all asymmetric weighting schemes. This contradicts conventional multi-task learning wisdom that primary tasks should receive higher weights than auxiliary tasks.

Conventional Wisdom

Multi-task learning typically assumes the primary task should be weighted more heavily than auxiliary tasks.

Traditional Approach

In standard multi-task learning, the combined loss is typically formulated as:

\mathcal{L}_{\text{combined}} = \alpha \cdot \mathcal{L}_{\text{primary}} + (1 - \alpha) \cdot \mathcal{L}_{\text{auxiliary}}

where \alpha > 0.5 is the common choice, based on the reasoning that:

  • The primary task (RUL) is what we ultimately care about
  • Auxiliary tasks provide support but shouldn't dominate
  • Higher weight ensures the model prioritizes primary task optimization
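
The weighted sum above can be sketched in plain Python (the loss values here are illustrative, not from the experiments):

```python
def combined_loss(primary_loss: float, auxiliary_loss: float, alpha: float = 0.75) -> float:
    """Conventional multi-task loss: alpha weights the primary (RUL) task."""
    assert 0.0 <= alpha <= 1.0
    return alpha * primary_loss + (1.0 - alpha) * auxiliary_loss

# Illustrative loss values only
conventional = combined_loss(2.0, 0.8)        # 0.75 * 2.0 + 0.25 * 0.8
equal        = combined_loss(2.0, 0.8, 0.5)   # 0.5 * 2.0 + 0.5 * 0.8
```

Note that the two weights sum to one, so changing alpha trades loss contribution between tasks rather than rescaling the total loss.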

Our V7 Baseline Configuration

| Parameter | V7 Baseline Value | Rationale |
| --- | --- | --- |
| RUL Weight (α) | 0.75 | Primary task gets majority weight |
| Health Weight (1−α) | 0.25 | Auxiliary task supports learning |
| Weighting Strategy | Asymmetric | Follow conventional wisdom |

The Surprising Discovery

During systematic ablation studies, we discovered that equal weighting (0.5/0.5) consistently outperformed our carefully tuned asymmetric baseline. This led to the development of AMNL.


Weight Experiments

Systematic evaluation of different task weighting configurations across multiple datasets and seeds.

Experimental Design

| Configuration | RUL Weight | Health Weight | Description |
| --- | --- | --- | --- |
| V7 Baseline | 0.75 | 0.25 | Strong RUL preference |
| AMNL 0.9/0.1 | 0.90 | 0.10 | Maximum RUL preference |
| AMNL 0.7/0.3 | 0.70 | 0.30 | Moderate RUL preference |
| AMNL 0.6/0.4 | 0.60 | 0.40 | Slight RUL preference |
| AMNL 0.5/0.5 | 0.50 | 0.50 | Equal weighting (AMNL) |

Results: FD002 (6 Operating Conditions)

| Configuration | RMSE | Δ vs V7 | NASA Score |
| --- | --- | --- | --- |
| V7 Baseline (0.75/0.25) | 9.45 | (baseline) | 498.0 |
| AMNL 0.9/0.1 | 11.23 | -18.8% | 612.4 |
| AMNL 0.7/0.3 | 8.12 | +14.1% | 421.3 |
| AMNL 0.6/0.4 | 7.45 | +21.2% | 389.7 |
| AMNL 0.5/0.5 | 6.74 | +28.7% | 356.0 |

Results: FD004 (6 Conditions, 2 Faults)

| Configuration | RMSE | Δ vs V7 | NASA Score |
| --- | --- | --- | --- |
| V7 Baseline (0.75/0.25) | 8.41 | (baseline) | 945.0 |
| AMNL 0.9/0.1 | 10.67 | -26.9% | 1123.8 |
| AMNL 0.7/0.3 | 8.89 | -5.7% | 712.4 |
| AMNL 0.6/0.4 | 8.34 | +0.8% | 623.1 |
| AMNL 0.5/0.5 | 8.16 | +3.0% | 537.5 |

Statistical Comparison

| Comparison | FD002 Δ RMSE | FD004 Δ RMSE | p-value |
| --- | --- | --- | --- |
| 0.5/0.5 vs 0.75/0.25 | -2.71 (-28.7%) | -0.25 (-3.0%) | < 0.01 |
| 0.5/0.5 vs 0.9/0.1 | -4.49 (-40.0%) | -2.51 (-23.5%) | < 0.001 |
| 0.5/0.5 vs 0.6/0.4 | -0.71 (-9.5%) | -0.18 (-2.1%) | 0.034 |

Statistically Significant

Equal weighting (0.5/0.5) significantly outperforms all asymmetric configurations at p < 0.05. The improvement is largest compared to extreme asymmetric weighting (0.9/0.1).
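
A paired comparison over per-seed RMSEs can be sketched as follows. The seed-level numbers below are hypothetical stand-ins (only the aggregate results in the tables come from the study), and the reported p-values would come from comparing the t statistic against the t-distribution:

```python
import statistics
from math import sqrt

def paired_t_statistic(a, b):
    """t statistic for a paired comparison (same seeds, two configurations)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))

# Hypothetical per-seed RMSEs on FD002 (seeds 42, 123, 456)
rmse_equal = [6.70, 6.81, 6.71]   # 0.5/0.5
rmse_v7    = [9.38, 9.52, 9.45]   # 0.75/0.25
t = paired_t_statistic(rmse_equal, rmse_v7)  # strongly negative: 0.5/0.5 has lower error
```

Pairing by seed removes run-to-run variance that both configurations share, which is why three seeds can already yield significant comparisons.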


Why Equal Weighting Works

Three complementary explanations for the surprising success of equal task weighting.

Hypothesis 1: Regularization Effect

Health state classification provides discrete supervision signals that anchor continuous RUL predictions to meaningful degradation stages.

\text{Health State} = \begin{cases} 0 & \text{if RUL} > 50 \\ 1 & \text{if } 15 < \text{RUL} \leq 50 \\ 2 & \text{if RUL} \leq 15 \end{cases}

By forcing the model to correctly classify these discrete states, we implicitly constrain the RUL predictions to be consistent with degradation physics:

  • Healthy predictions must correspond to high RUL values
  • Critical predictions must correspond to low RUL values
  • Transition regions are explicitly supervised
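
The piecewise mapping above translates directly into code (thresholds 50 and 15 cycles, per the definition):

```python
def health_state(rul: float) -> int:
    """Discretize RUL into health states: 0 = healthy, 1 = degrading, 2 = critical."""
    if rul > 50:
        return 0
    if rul > 15:
        return 1
    return 2

states = [health_state(r) for r in (120, 50, 16, 15, 3)]  # [0, 1, 1, 2, 2]
```

These discrete labels are what the health head is trained against, so any RUL prediction that straddles a threshold is supervised from both sides.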

Hypothesis 2: Gradient Balance

Equal weighting maintains gradient balance in shared encoder layers, encouraging features that capture fundamental degradation physics.

\nabla_\theta \mathcal{L} = 0.5 \cdot \nabla_\theta \mathcal{L}_{\text{RUL}} + 0.5 \cdot \nabla_\theta \mathcal{L}_{\text{Health}}

| Weighting | Gradient Behavior | Effect |
| --- | --- | --- |
| 0.9/0.1 | RUL dominates encoder updates | May overfit to RUL-specific features |
| 0.75/0.25 | RUL still dominates | Some regularization from health task |
| 0.5/0.5 | Balanced gradient flow | Learns generalizable features |
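
The effect on shared-encoder updates can be illustrated with a toy one-parameter example (the gradient values are hypothetical):

```python
def shared_gradient(g_rul: float, g_health: float, w_rul: float, w_health: float) -> float:
    """Weighted combination of per-task gradients on a shared encoder parameter."""
    return w_rul * g_rul + w_health * g_health

# Hypothetical per-task gradients pulling a shared parameter in opposite directions
g_rul, g_health = 1.2, -0.9

dominated = shared_gradient(g_rul, g_health, 0.9, 0.1)  # ~0.99: update follows RUL almost entirely
balanced  = shared_gradient(g_rul, g_health, 0.5, 0.5)  # ~0.15: both tasks shape the update
```

Under 0.9/0.1 the health task's pull is nearly erased, so the shared encoder drifts toward RUL-specific features; equal weights keep both signals in the update.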

Hypothesis 3: Implicit Curriculum

The easier health classification task provides an implicit curriculum that stabilizes learning of the harder RUL regression task.

| Task | Difficulty | Convergence |
| --- | --- | --- |
| Health Classification | Easier (3 classes) | Faster, more stable |
| RUL Regression | Harder (continuous) | Slower, less stable |

During early training, the health classification task converges first, providing a stable foundation for the shared encoder. This prevents early training instability that can derail RUL learning.

Evidence from Single-Task Failure

The catastrophic failure of single-task RUL prediction (+304.7% degradation, covered in the next section) provides strong evidence for the regularization hypothesis. Without the health task, the model overfits to dataset-specific patterns.


Implementation

Our ablation study uses systematic configuration management to test different weight combinations.

V7 Baseline Configuration

From run_ablation_studies.py. Key settings:

  • V7 Baseline: the original training configuration before equal weighting was discovered; it serves as the baseline for all ablation comparisons.
  • RUL Weight: the primary task receives 75% of the loss contribution, following conventional multi-task learning wisdom: loss = 0.75 * rul_loss + 0.25 * health_loss
  • Health Weight: the auxiliary health classification task receives only 25% weight.
  • Attention: multi-head attention is enabled in the baseline configuration.
  • Weighted MSE: the RUL loss uses weighted MSE instead of standard MSE.
  • Linear Decay: the weight function uses linear decay (not exponential) for stability.
  • EMA: Exponential Moving Average is enabled for stable weight updates.

```python
# V7 baseline configuration
V7_BASELINE_CONFIG = {
    'amnl_weight_rul': 0.75,
    'amnl_weight_health': 0.25,
    'use_attention': True,
    'use_weighted_mse': True,
    'weighted_mse_type': 'linear',  # 'linear' or 'exponential'
    'use_warmup': True,
    'warmup_epochs': 10,
    'scheduler_type': 'reduce_on_plateau',  # 'reduce_on_plateau' or 'step'
    'use_ema': True,
    'use_adaptive_weight_decay': True,
    'initial_weight_decay': 1e-4,
}
```

Weight Ablation Configurations

From run_ablation_studies.py. Each ablation experiment is defined as a dictionary with a name, a description, and the changes from baseline:

  • Equal Weighting (AMNL): the key discovery; equal weighting (0.5/0.5) consistently outperforms asymmetric configurations: loss = 0.5 * rul_loss + 0.5 * health_loss
  • Slight RUL Preference: 0.6/0.4 is tested to probe the sensitivity curve around equal weighting.
  • Strong RUL Preference: 0.9/0.1 tests extreme asymmetry; the results show this performs worst.
  • Single-Task Ablation: the most important ablation; removing the health task entirely causes +304.7% degradation on FD002.

```python
# Define ablation experiments
ABLATION_CONFIGS = {
    # Ablation 2: Different AMNL weights
    'amnl_50_50': {
        'name': 'AMNL 0.5/0.5',
        'description': 'Equal weighting for RUL and health tasks',
        'changes': {'amnl_weight_rul': 0.5, 'amnl_weight_health': 0.5},
    },
    'amnl_60_40': {
        'name': 'AMNL 0.6/0.4',
        'description': 'Slight RUL preference',
        'changes': {'amnl_weight_rul': 0.6, 'amnl_weight_health': 0.4},
    },
    'amnl_90_10': {
        'name': 'AMNL 0.9/0.1',
        'description': 'Strong RUL preference',
        'changes': {'amnl_weight_rul': 0.9, 'amnl_weight_health': 0.1},
    },

    # Ablation 1: No dual-task (single-task RUL only)
    'no_dual_task': {
        'name': 'Single-Task RUL Only',
        'description': 'Remove health classification, use only RUL prediction',
        'changes': {'use_dual_task': False},
    },
}
```
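
The 'changes' dicts are applied by Python dict-merge, where later keys override earlier ones; the semantics can be checked in isolation (baseline excerpt abbreviated here):

```python
# Abbreviated excerpt of the baseline defaults
V7_BASELINE_CONFIG = {'amnl_weight_rul': 0.75, 'amnl_weight_health': 0.25, 'use_attention': True}

changes = {'amnl_weight_rul': 0.5, 'amnl_weight_health': 0.5}
full_config = {**V7_BASELINE_CONFIG, **changes}  # ablation values win; untouched keys keep defaults

assert full_config['amnl_weight_rul'] == 0.5
assert full_config['use_attention'] is True
```

This is why each ablation only needs to list the parameters it changes: everything else falls through to the baseline.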

Ablation Training Function

From run_ablation_studies.py:

  • Function signature: takes the dataset name, a seed for reproducibility, the ablation configuration dictionary, an output directory, and an epoch count.
  • Config merging: ablation changes are merged over the baseline, so any unspecified parameter keeps its baseline default, e.g. {**V7_BASELINE_CONFIG, 'amnl_weight_rul': 0.5}.
  • Dual-task check: chooses between the dual-task AMNL model and the single-task model based on the ablation config.
  • Dual-task model: DualTaskEnhancedModel runs the standard AMNL experiments with both RUL and health heads.
  • Single-task model: EnhancedSOTATurbofanRULModel runs the single-task ablation with no health classification head.

```python
def train_with_ablation(
    dataset_name: str,
    seed: int,
    config: Dict,
    output_dir: Path,
    epochs: int = ABLATION_EPOCHS
) -> Dict:
    """Train model with specific ablation configuration."""

    # Merge baseline with ablation changes
    full_config = {**V7_BASELINE_CONFIG, **config.get('changes', {})}

    # Set seed for reproducibility
    set_seed(seed)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    dropout = full_config.get('dropout', 0.3)  # 'dropout' was undefined in the excerpt; 0.3 is an assumed fallback

    # Determine model type based on configuration
    use_dual_task = full_config.get('use_dual_task', True)

    if use_dual_task:
        model = DualTaskEnhancedModel(
            input_size=17,
            sequence_length=30,
            hidden_size=256,
            num_health_states=3,
            dropout=dropout,
            use_attention=full_config['use_attention'],
            use_residual=True
        ).to(device)
    else:
        # Single-task model for ablation
        model = EnhancedSOTATurbofanRULModel(
            input_size=17,
            sequence_length=30,
            hidden_size=256,
            dropout=dropout,
            use_attention=full_config['use_attention'],
            use_residual=True
        ).to(device)
```

Running All Ablations

From run_ablation_studies.py:

  • Statistical seeds: three seeds (42, 123, 456) provide statistical robustness for mean and standard deviation calculations.
  • Dataset selection: the study focuses on FD002 and FD004, the multi-condition datasets where AMNL shows the greatest improvement.
  • Total runs: (baseline + ablations) × datasets × seeds; with 9 ablations, 2 datasets, and 3 seeds, that is (1 + 9) × 2 × 3 = 60 runs.
  • Baseline first: the V7 baseline runs first with an empty 'changes' dict, so it uses every V7_BASELINE_CONFIG default.
  • Ablation loop: each ablation configuration runs on all datasets with all seeds for a comprehensive comparison.
  • Summary generation: produces summary tables showing mean ± std per configuration, plus the delta from baseline.

```python
def run_all_ablations(output_dir: Path):
    """Run all ablation experiments."""

    # Ablation seeds (3 seeds for statistical validity)
    ABLATION_SEEDS = [42, 123, 456]

    # Datasets for ablation (focus on best performers)
    ABLATION_DATASETS = ['FD002', 'FD004']

    all_results = {}

    # Calculate total runs
    total_runs = (1 + len(ABLATION_CONFIGS)) * len(ABLATION_DATASETS) * len(ABLATION_SEEDS)

    # Run baseline first
    print(">>> Running V7 Baseline...")
    baseline_config = {
        'name': 'V7 Baseline',
        'description': 'Full V7 configuration',
        'changes': {}
    }

    for dataset in ABLATION_DATASETS:
        all_results[f'baseline_{dataset}'] = []
        for seed in ABLATION_SEEDS:
            result = train_with_ablation(dataset, seed, baseline_config, output_dir)
            all_results[f'baseline_{dataset}'].append(result)

    # Run each ablation
    for ablation_key, ablation_config in ABLATION_CONFIGS.items():
        for dataset in ABLATION_DATASETS:
            all_results[f'{ablation_key}_{dataset}'] = []
            for seed in ABLATION_SEEDS:
                result = train_with_ablation(dataset, seed, ablation_config, output_dir)
                all_results[f'{ablation_key}_{dataset}'].append(result)

    # Generate summary
    generate_ablation_summary(all_results)
    return all_results
```

Note that output_dir is taken as a parameter here because the original excerpt referenced it without defining it.
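
generate_ablation_summary itself is not shown in the source. A minimal sketch of what it might compute (mean ± std per configuration, plus delta vs the matching baseline), assuming each result dict from train_with_ablation carries an 'rmse' key:

```python
import statistics

def generate_ablation_summary(all_results: dict) -> dict:
    """Summarize per-config RMSE as mean ± std and delta vs the matching baseline."""
    summary = {}
    for key, runs in all_results.items():
        rmses = [r['rmse'] for r in runs]
        summary[key] = {
            'mean': statistics.mean(rmses),
            'std': statistics.stdev(rmses) if len(rmses) > 1 else 0.0,
        }
    # Delta vs baseline, matched by dataset suffix (e.g. 'amnl_50_50_FD002' vs 'baseline_FD002')
    for key, stats in summary.items():
        if key.startswith('baseline_'):
            continue
        dataset = key.rsplit('_', 1)[-1]
        base = summary.get(f'baseline_{dataset}')
        if base:
            stats['delta_vs_baseline_pct'] = 100.0 * (base['mean'] - stats['mean']) / base['mean']
    return summary
```

Positive deltas mean the ablation beat the baseline, matching the sign convention in the results tables above.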

Summary

Task Weight Analysis Summary:

  1. Conventional wisdom fails: Giving primary task higher weight is not optimal for RUL prediction
  2. Equal weighting wins: 0.5/0.5 outperforms all asymmetric schemes
  3. Improvement magnitude: Up to 28.7% improvement over 0.75/0.25 baseline
  4. Monotonic trend: Performance improves as health weight increases (up to 0.5)
  5. Three hypotheses: Regularization, gradient balance, implicit curriculum

| Key Finding | Evidence |
| --- | --- |
| 0.5/0.5 is optimal | Best RMSE on all datasets tested |
| Asymmetric hurts | 0.9/0.1 performs 40% worse than 0.5/0.5 |
| Statistically robust | p < 0.01 for key comparisons |
| Works across complexity | Both FD002 and FD004 show the same pattern |
Key Insight: The success of equal weighting challenges fundamental assumptions in multi-task learning. For predictive maintenance, the auxiliary health classification task is not merely "supportive": it provides essential regularization that enables learning generalizable degradation features. The next section examines what happens when we remove the health task entirely.

With weight analysis complete, we examine the catastrophic failure of single-task learning.