Pareto, Cars, And Why You Cannot Have It All
Vilfredo Pareto, the Italian economist, noticed in 1896 that a policy reform that helps everyone equally is rare; what is far more common is a reform that helps some at the expense of others. The configurations where you cannot improve any individual's outcome without making someone else worse off are Pareto-optimal — the boundary of the attainable region.
Engineering knows this picture well. Tune an F1-car suspension for grip and you lose top speed; tune for top speed and you lose grip. The set of suspensions where no other configuration is BOTH grippier AND faster is the Pareto frontier. A team can choose a point on the frontier based on the day's circuit but cannot beat the frontier without changing something outside the suspension — a different tyre compound, a different aerodynamic package, a different driver.
Section 23·1 reported one number: GRACE wins NASA at 232.7 on multi-condition C-MAPSS. This section shows the picture underneath that number. RMSE and NASA define a 2D performance space; the 9 published MTL methods scatter inside it; only 3 are on the Pareto frontier; GRACE is one of them. The plot is an honest record of which methods to choose, when, and what you give up.
Pareto Dominance In Two Lines
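Two-axis Pareto dominance is a precise relation. For two candidate methods $p$ and $q$ with metrics $(\mathrm{RMSE}_p, \mathrm{NASA}_p)$ and $(\mathrm{RMSE}_q, \mathrm{NASA}_q)$, where lower is better on both axes:

$$q \prec p \iff \mathrm{RMSE}_q \le \mathrm{RMSE}_p \;\wedge\; \mathrm{NASA}_q \le \mathrm{NASA}_p \;\wedge\; \left(\mathrm{RMSE}_q < \mathrm{RMSE}_p \;\vee\; \mathrm{NASA}_q < \mathrm{NASA}_p\right)$$

Read this as ‘$q$ dominates $p$’: $q$ is no worse on either axis AND strictly better on at least one. The strict-better clause excludes ties: if $q = p$ exactly, neither dominates the other and both are candidates for the front. The Pareto frontier is the set of methods that no other method dominates.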
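The relation translates directly into a predicate. The following is a minimal sketch, assuming each method is an (RMSE, NASA) tuple with lower better on both axes; the name `dominates` is illustrative and not taken from the paper's code. The numbers are multi-condition means quoted in this section.

```python
from typing import Tuple

Point = Tuple[float, float]  # (RMSE, NASA), lower is better on both axes


def dominates(q: Point, p: Point) -> bool:
    """Return True if q Pareto-dominates p."""
    no_worse = q[0] <= p[0] and q[1] <= p[1]
    strictly_better = q[0] < p[0] or q[1] < p[1]
    return no_worse and strictly_better


grace = (7.92, 232.7)
uncertainty = (7.98, 233.8)
amnl = (7.45, 446.7)

print(dominates(grace, uncertainty))                    # True
print(dominates(grace, amnl), dominates(amnl, grace))   # False False
print(dominates(grace, grace))                          # False: an exact tie dominates nothing
```

GRACE and AMNL each win one axis, so neither dominates the other, which is exactly the condition for both to sit on the front.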
Interactive: The Multi-Condition Pareto Picture
Below: 9 methods plotted in (RMSE, NASA) space, multi-condition mean. The dashed green line traces the Pareto frontier. Drag the preference slider to compose RMSE and NASA into a single score and watch the optimal method change. Toggle FD002 / FD004 / multi-cond mean above the chart.
Two things to notice. Drag from 0 to 1: the optimal method walks along the frontier from GRACE (safety-only) through GABA (balanced) to AMNL (accuracy-only). Switch from MULTI to FD002: the frontier gains Baseline as a fourth corner. Switch to FD004: the frontier collapses to GradNorm alone — one method dominates everything.
Three Methods Make The Multi-Condition Front
| Front method | RMSE | NASA | Role |
|---|---|---|---|
| AMNL | 7.45 (best) | 446.7 (worst) | Accuracy-only — buy 0.5 RMSE with +200 NASA |
| GABA | 7.89 | 235.7 | Balanced — pays only 0.44 RMSE for NASA −211 |
| GRACE | 7.92 | 232.7 (best) | Safety-best — pays only 0.03 RMSE for NASA −3 |
The three corners answer different questions. AMNL says: you only care about accuracy (e.g. a PHM benchmark scoreboard). GABA says: you accept some accuracy cost to keep the safety score reasonable. GRACE says: safety dominates, so buy every possible NASA point and stop only when accuracy starts to regress.
The 6 dominated methods (Baseline, DWA, GradNorm, Uncertainty, PCGrad, CAGrad) each have at least one front method that is no worse on both axes and strictly better on at least one. Uncertainty (RMSE 7.98, NASA 233.8) is the closest to the front: GRACE beats it by 0.06 RMSE and 1.1 NASA, which is within seed noise. The published claim ‘GRACE strictly dominates Uncertainty’ is therefore a point-estimate claim with a small margin; section 22·3 §5 explained why this is robust at n=5 seeds.
FD002 Has A Four-Method Front
| Front method | RMSE | NASA | Position |
|---|---|---|---|
| AMNL | 6.74 (best) | 356.0 | Accuracy corner — costs +131 NASA |
| Baseline | 7.37 | 224.5 | Cheap balanced point |
| GABA | 7.53 | 224.2 | Middle of the front |
| GRACE | 7.72 | 223.4 (best) | Safety corner |
FD002 is the ‘easy’ multi-condition subset (single fault mode) and the front widens to four methods. Baseline appears here because plain MTL with fixed equal weights already does a reasonable job balancing RMSE and NASA when the data only has one fault to learn. GRACE still wins NASA, GABA is one tick higher in NASA but lower in RMSE, AMNL stretches all the way to the accuracy corner.
FD004 Has A Single Winner
| Front method | RMSE | NASA | Status |
|---|---|---|---|
| GradNorm | 7.74 (best) | 222.9 (best) | Dominates every other method on FD004 |
FD004 (multi-condition, multi-fault; the hardest C-MAPSS subset) produces a single winner: GradNorm beats every other method on BOTH axes. GRACE comes second on RMSE (8.12) and second on NASA (242.0) but is dominated by GradNorm. GRACE still wins the multi-condition mean on NASA because of its FD002 lead: its multi-condition NASA of 232.7 = mean(223.4, 242.0) is lower than any other method's mean over the same pair of subsets.
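The averaging arithmetic can be checked in a few lines. This is a small sketch using only the per-subset figures quoted in this section; the helper `mean_point` is a hypothetical name, and the multi-condition mean is assumed to be the plain average of FD002 and FD004.

```python
# Per-subset (RMSE, NASA) values quoted in this section.
grace_fd002 = (7.72, 223.4)
grace_fd004 = (8.12, 242.0)
gradnorm_fd004 = (7.74, 222.9)


def mean_point(a, b):
    """Average two (RMSE, NASA) points axis by axis, rounded to 2 decimals."""
    return tuple(round((x + y) / 2, 2) for x, y in zip(a, b))


grace_multi = mean_point(grace_fd002, grace_fd004)
print(grace_multi)  # (7.92, 232.7)

# On FD004 alone GradNorm is better on both axes, so it dominates GRACE there.
gradnorm_beats_grace_fd004 = all(g < r for g, r in zip(gradnorm_fd004, grace_fd004))
print(gradnorm_beats_grace_fd004)  # True
```

The same method can be dominated on one subset and still sit on the averaged front, which is why the per-subset pictures are reported separately.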
Picking A Point With A Preference Weight
For deployment you need to pick ONE method, not the whole front. The textbook way: define a preference weight $\lambda \in [0, 1]$ capturing how much you care about RMSE relative to NASA, then minimise the composite score

$$S_\lambda(m) = \lambda \, \widetilde{\mathrm{RMSE}}(m) + (1 - \lambda) \, \widetilde{\mathrm{NASA}}(m)$$

where the tildes denote min-max normalisation onto $[0, 1]$ (so the two axes can be added without unit-scale issues). At $\lambda = 0$ only NASA matters, so GRACE wins. At $\lambda = 1$ only RMSE matters, so AMNL wins. In between, the optimum walks along the front.
| λ range | Optimal method | Operational reading |
|---|---|---|
| [0.00, 0.40) | GRACE | Safety-first deployment. Aviation, nuclear, medical. |
| [0.40, 0.85) | GABA | Balanced. General industrial PHM. |
| [0.85, 1.00] | AMNL | Accuracy-first. Benchmark leaderboards, research demos. |
The cut-points are dataset-specific (the explorer above recomputes them live), but the qualitative picture (GRACE for safety-heavy weights, AMNL for accuracy-heavy weights, GABA in between) is reported to hold across the C-MAPSS subsets and the N-CMAPSS DS02 set covered in chapter 23·3. FD004 is the exception noted above: GradNorm is the only front method there, so it wins for every preference weight.
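The weighted composite of min-max-normalised RMSE and NASA can be sketched as follows. This is an illustration under stated assumptions, not the explorer's implementation: it normalises over the three multi-condition front methods only, whereas the table's cut-points depend on the full normalisation set, so the switch-over weights from this sketch will not match the table. The names `min_max` and `best_method` are hypothetical.

```python
from typing import Dict, Tuple

# Multi-condition front methods quoted in this section: (RMSE, NASA).
FRONT: Dict[str, Tuple[float, float]] = {
    "AMNL": (7.45, 446.7),
    "GABA": (7.89, 235.7),
    "GRACE": (7.92, 232.7),
}


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale values onto [0, 1]; assumes the values are not all identical."""
    lo, hi = min(values.values()), max(values.values())
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def best_method(lam: float, methods: Dict[str, Tuple[float, float]] = FRONT) -> str:
    """Return the method minimising lam * RMSE~ + (1 - lam) * NASA~."""
    rmse = min_max({k: v[0] for k, v in methods.items()})
    nasa = min_max({k: v[1] for k, v in methods.items()})
    score = {k: lam * rmse[k] + (1 - lam) * nasa[k] for k in methods}
    return min(score, key=score.get)


for lam in (0.0, 0.5, 1.0):
    print(lam, best_method(lam))
# 0.0 GRACE
# 0.5 GABA
# 1.0 AMNL
```

Because min-max scaling depends on the worst value present, adding or removing a method shifts every cut-point, which is one reason to fix the weight from the deployment context before looking at results.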
Python: Pareto Sorting From Scratch
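Implement Pareto-front identification on the 9-method results matrix. The algorithm is two nested loops, O(N²), which is plenty for N=9. For larger two-objective problems, sorting by one axis and sweeping once brings this down to O(N log N). NSGA-II's fast non-dominated sort solves a broader task (ranking every point into successive fronts for M objectives) at O(MN²), but the definitions are the same.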
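A from-scratch sketch of both variants follows. It is illustrative rather than the paper's code: the function names are hypothetical, and the data holds only the multi-condition and FD004 figures quoted in this section, since the remaining methods' numbers are not given here and are left out rather than invented.

```python
def dominates(q, p):
    """q dominates p: no worse on every axis, strictly better on at least one."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def pareto_front(points):
    """O(N^2) nested loops: keep each method that no other method dominates."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]


def pareto_front_sweep(points):
    """O(N log N) for two axes: sort by (RMSE, NASA), keep strict NASA improvements.

    Exact duplicate points are kept only once by this variant.
    """
    front, best_nasa = [], float("inf")
    for name, (_rmse, nasa) in sorted(points.items(), key=lambda kv: kv[1]):
        if nasa < best_nasa:
            front.append(name)
            best_nasa = nasa
    return front


# (RMSE, NASA) pairs quoted in this section.
multi_cond = {
    "AMNL": (7.45, 446.7),
    "GABA": (7.89, 235.7),
    "GRACE": (7.92, 232.7),
    "Uncertainty": (7.98, 233.8),
}
fd004 = {"GradNorm": (7.74, 222.9), "GRACE": (8.12, 242.0)}

print(pareto_front(multi_cond))        # ['AMNL', 'GABA', 'GRACE']
print(pareto_front_sweep(multi_cond))  # ['AMNL', 'GABA', 'GRACE']
print(pareto_front(fd004))             # ['GradNorm']
```

The sweep works because, once methods are ordered by RMSE, a method is non-dominated exactly when its NASA score is lower than every NASA score seen so far.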
PyTorch: Hooking RMSE+NASA Into An Eval Loop
Production usage: take a trained model, run the paper's Evaluator, place the result on the published Pareto picture, and decide whether the new model is publishable. The wrapper returns three signals: the raw pair, a boolean ‘on Pareto front vs paper’, and the lists of paper methods that dominate or are dominated.
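The placement step of such a wrapper might look like the sketch below. It is plain Python and deliberately stops short of the evaluation itself: the interface of the paper's Evaluator is not shown in this section, so the sketch assumes the (RMSE, NASA) pair has already been computed and takes it as input. `Placement`, `place`, and `PAPER_MULTI` are hypothetical names, the reference set contains only the multi-condition figures quoted here, and the two candidate pairs at the bottom are made-up inputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (RMSE, NASA), lower is better on both axes

# Published multi-condition points quoted in this section (not the full 9).
PAPER_MULTI: Dict[str, Point] = {
    "AMNL": (7.45, 446.7),
    "GABA": (7.89, 235.7),
    "GRACE": (7.92, 232.7),
    "Uncertainty": (7.98, 233.8),
}


@dataclass
class Placement:
    pair: Point              # the raw (RMSE, NASA) pair
    on_front: bool           # True if no published method dominates the pair
    dominated_by: List[str]  # published methods that dominate the new model
    dominates: List[str]     # published methods the new model dominates


def _dominates(q: Point, p: Point) -> bool:
    return q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])


def place(pair: Point, published: Dict[str, Point] = PAPER_MULTI) -> Placement:
    """Place a new model's (RMSE, NASA) pair against the published points."""
    dominated_by = [n for n, p in published.items() if _dominates(p, pair)]
    beaten = [n for n, p in published.items() if _dominates(pair, p)]
    return Placement(pair, not dominated_by, dominated_by, beaten)


# Hypothetical candidate results, for illustration only.
print(place((7.90, 233.0)))
print(place((8.00, 240.0)).dominated_by)  # ['GABA', 'GRACE', 'Uncertainty']
```

The first candidate is dominated by nothing and beats Uncertainty, so it would extend the published front; the second is dominated by three published methods and would not.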
Pareto Tradeoffs In Other Domains
| Domain | Axis 1 (typical accuracy) | Axis 2 (operational) | Pareto winner pattern |
|---|---|---|---|
| Object detection | mAP @ IoU=0.5 | Inference latency (ms/frame) | Family of detectors (YOLO, EfficientDet, RT-DETR) trace a clean Pareto curve; pick by FPS budget. |
| LLM serving | MMLU / HumanEval score | Tokens/sec at fixed cost | Larger models dominate on quality, smaller dominate on speed; quantisation moves a model along the frontier. |
| Medical imaging | Dice / sensitivity | False-positive rate | Sensitivity-specificity Pareto via threshold sweep — every classifier has its own front. |
| Energy storage controllers | State-of-charge accuracy | Cell aging cost (calendar + cycle) | Aggressive controllers dominate on accuracy, gentle on aging — operator picks based on lifecycle economics. |
| Recommender systems | CTR / conversion rate | Catalog diversity / fairness | Greedy click-maximisers dominate on CTR and lose diversity; calibrated rankers Pareto-improve along both. |
In every row the ‘publish your Pareto front, not your winner’ norm is becoming standard. Single-number comparisons (mAP, MMLU, Dice) hide deployment trade-offs the practitioner needs to see. GRACE's contribution to the C-MAPSS conversation is the same: shift the literature from ‘best RMSE wins’ to ‘here is the Pareto picture, here is where we sit, here is what you give up if you choose differently’.
Pitfalls When Reading A Pareto Plot
Pitfall 1: confusing ‘on the front’ with ‘the winner’
AMNL is on the front because no method beats its RMSE; that does NOT make AMNL a good general-purpose method. Its NASA score is nearly twice GRACE's (446.7 vs 232.7). Being on the front means ‘optimal for SOME preference’, not ‘best overall’.
Pitfall 2: averaging across datasets before Pareto-ising
The multi-condition mean has 3 methods on the front; FD004 alone has 1 (GradNorm), FD002 alone has 4. Per-dataset and averaged fronts can disagree dramatically. Always report per-dataset Pareto pictures alongside the averaged one — the averaging hides reversals.
Pitfall 3: ignoring seed variance
On FD002 GRACE's NASA = 223.4 ± 26.5 (one-sigma) and Uncertainty's NASA = 224.4 ± 35.3. The two error bars overlap; the ‘GRACE strictly dominates Uncertainty’ claim is a POINT-ESTIMATE claim, not a statistical one. Always report SEM bars on Pareto plots; the paper's Fig. 3 does.
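A quick check makes the overlap concrete. This sketch uses the FD002 figures quoted above and treats the ± values as one-sigma spreads of independent runs, which is an assumption; the helper names are hypothetical.

```python
import math

# (mean NASA, one-sigma spread) on FD002, as quoted above.
grace = (223.4, 26.5)
uncertainty = (224.4, 35.3)


def intervals_overlap(a, b):
    """True if the mean +/- sigma intervals of a and b intersect."""
    return a[0] - a[1] <= b[0] + b[1] and b[0] - b[1] <= a[0] + a[1]


def gap_in_sigmas(a, b):
    """Difference of means divided by the combined spread (independence assumed)."""
    return abs(a[0] - b[0]) / math.hypot(a[1], b[1])


print(intervals_overlap(grace, uncertainty))        # True
print(round(gap_in_sigmas(grace, uncertainty), 3))  # 0.023
```

A gap of about 0.02 combined sigmas is far too small to separate the two methods statistically, which is why the dominance claim should be read as a statement about point estimates.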
Pitfall 4: choosing λ post-hoc
Picking the preference weight AFTER seeing the results, e.g. settling on whichever $\lambda$ ‘happens to make my method win’, is post-hoc cherry-picking. The $\lambda$ must come from the deployment context (operational cost ratios, regulatory requirements), not from a sweep that targets a desired winner.
Pitfall 5: forgetting that lower is better on both axes here
Some Pareto plots in the literature put accuracy on the x-axis (higher better) and cost on the y-axis (lower better). The frontier then sits in the upper-right. The C-MAPSS Pareto plot has both axes in ‘lower better’ orientation; the frontier sits in the lower-left. Read the axis legends before interpreting; mirroring an axis silently inverts the front.
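One way to guard against the orientation mistake is to convert every axis to ‘lower is better’ before testing dominance. The sketch below assumes a hypothetical (accuracy, latency) pair where accuracy is higher-better; the two detector points are made-up values for illustration and the helper names are not from any library.

```python
def dominates(q, p):
    """Dominance test that assumes lower is better on every axis."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))


def to_lower_is_better(point, higher_is_better):
    """Negate each axis flagged as higher-better so that lower is better everywhere."""
    return tuple(-v if flip else v for v, flip in zip(point, higher_is_better))


# Hypothetical detectors: (accuracy, latency in ms). Accuracy is higher-better.
det_a = (0.80, 20.0)
det_b = (0.75, 30.0)
orientation = (True, False)

# Naive test on raw values: the accuracy axis is read the wrong way round.
print(dominates(det_a, det_b))  # False

# Correct test after flipping the accuracy axis.
a = to_lower_is_better(det_a, orientation)
b = to_lower_is_better(det_b, orientation)
print(dominates(a, b))  # True
```

Detector A is both more accurate and faster, yet the raw comparison misses that it dominates B, which is the silent inversion the pitfall warns about.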
Takeaway
- Pareto dominance is precise: $q$ dominates $p$ iff $q$ is no worse on either axis AND strictly better on at least one. The Pareto front is the set of methods that no other method dominates.
- On multi-condition C-MAPSS the front has 3 methods: AMNL (accuracy corner), GABA (middle), GRACE (safety corner). On FD002 alone the front widens to 4 (Baseline added). On FD004 alone it collapses to 1 (GradNorm dominates everything).
- GRACE strictly dominates 6 of the 8 other methods on multi-condition mean: Baseline, DWA, GradNorm, Uncertainty, PCGrad, CAGrad. It does NOT dominate AMNL or GABA — those have lower RMSE.
- Picking a deployment point uses a preference weight $\lambda$: GRACE for $\lambda$ near 0 (safety-first), GABA for the middle, AMNL for $\lambda$ near 1 (accuracy-first).
- The Pareto-front norm is becoming standard across deep learning: object detection, LLM serving, medical imaging, energy, recommenders. Always publish the front, not the winner.