🤖 AI Summary
Multi-task learning (MTL) often suffers from “optimization imbalance,” where gradient interference among tasks degrades performance below that of single-task baselines. This work systematically identifies inter-task gradient norm disparity as the primary cause. To address it, we propose a gradient-aware loss scaling strategy: dynamically adjusting task-specific loss weights according to their respective gradient norms, requiring no architectural modifications or costly hyperparameter grid search. Using vision foundation models for initialization, we conduct cross-task empirical analysis across multiple benchmarks, demonstrating that our method significantly mitigates task interference, matches the performance of optimally hand-tuned baselines, and exhibits strong generalization. Our core contribution is establishing a quantitative link between gradient dynamics and optimization imbalance, and delivering a simple, efficient, plug-and-play solution for robust MTL optimization.
📝 Abstract
Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization", where task interference leads to subpar performance compared to single-task models. To facilitate research in MTL, this paper presents a systematic experimental analysis to dissect the factors contributing to this persistent problem. Our investigation confirms that the performance of existing optimization methods varies inconsistently across datasets, and advanced architectures still rely on costly grid-searched loss weights. Furthermore, we show that while powerful Vision Foundation Models (VFMs) provide strong initialization, they do not inherently resolve the optimization imbalance, and merely increasing data quantity offers limited benefits. A crucial finding emerges from our analysis: a strong correlation exists between the optimization imbalance and the norm of task-specific gradients. We demonstrate that this insight is directly applicable, showing that a straightforward strategy of scaling task losses according to their gradient norms can achieve performance comparable to that of an extensive and computationally expensive grid search. Our comprehensive analysis suggests that understanding and controlling gradient dynamics is a more direct path to stable MTL than developing increasingly complex methods.
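The gradient-norm-based loss scaling described above can be illustrated with a minimal sketch. This is not the paper's exact algorithm; it assumes a simple inverse-norm weighting rule (tasks with larger shared-parameter gradient norms get proportionally smaller loss weights), with the weights normalized to sum to the number of tasks:

```python
import numpy as np


def gradient_norm_weights(task_grads, eps=1e-8):
    """Compute loss weights inversely proportional to per-task gradient norms.

    task_grads: list of 1-D arrays, each the gradient of one task's loss
                with respect to the shared parameters.
    Returns an array of weights normalized to sum to the number of tasks,
    so the overall loss scale is roughly preserved.
    """
    norms = np.array([np.linalg.norm(g) for g in task_grads])
    inv = 1.0 / (norms + eps)  # down-weight tasks with dominant gradients
    return inv * len(task_grads) / inv.sum()


# Toy example: task 0's gradient norm is 10x larger than task 1's,
# so unweighted joint training would be dominated by task 0.
grads = [np.array([10.0, 0.0]), np.array([1.0, 0.0])]
weights = gradient_norm_weights(grads)

# After scaling, the weighted gradient norms of the two tasks are equal,
# which is the balancing effect the analysis attributes to gradient norms.
scaled_norms = [w * np.linalg.norm(g) for w, g in zip(weights, grads)]
```

In a training loop, the weights would be recomputed periodically (e.g. every step or every few steps) from the current gradients and applied as coefficients on the per-task losses before the backward pass.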