Revisit the Imbalance Optimization in Multi-task Learning: An Experimental Analysis

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-task learning (MTL) often suffers from “optimization imbalance,” where gradient interference among tasks degrades performance below that of single-task baselines. This work systematically identifies inter-task gradient norm disparity as the primary cause. To address it, we propose a gradient-aware loss scaling strategy that dynamically adjusts task-specific loss weights according to their respective gradient norms, requiring no architectural modifications or costly hyperparameter grid search. Using vision foundation models for initialization, we conduct cross-task empirical analysis across multiple benchmarks, demonstrating that our method significantly mitigates task interference, matches the performance of optimally hand-tuned baselines, and generalizes well. Our core contribution is establishing a quantitative link between gradient dynamics and optimization imbalance, and delivering a simple, efficient, plug-and-play solution for robust MTL optimization.

📝 Abstract
Multi-task learning (MTL) aims to build general-purpose vision systems by training a single network to perform multiple tasks jointly. While promising, its potential is often hindered by "unbalanced optimization", where task interference leads to subpar performance compared to single-task models. To facilitate research in MTL, this paper presents a systematic experimental analysis to dissect the factors contributing to this persistent problem. Our investigation confirms that the performance of existing optimization methods varies inconsistently across datasets, and advanced architectures still rely on costly grid-searched loss weights. Furthermore, we show that while powerful Vision Foundation Models (VFMs) provide strong initialization, they do not inherently resolve the optimization imbalance, and merely increasing data quantity offers limited benefits. A crucial finding emerges from our analysis: a strong correlation exists between the optimization imbalance and the norm of task-specific gradients. We demonstrate that this insight is directly applicable, showing that a straightforward strategy of scaling task losses according to their gradient norms can achieve performance comparable to that of an extensive and computationally expensive grid search. Our comprehensive analysis suggests that understanding and controlling gradient dynamics is a more direct path to stable MTL than developing increasingly complex methods.
Problem

Research questions and friction points this paper is trying to address.

Analyzing optimization imbalance in multi-task learning systems
Investigating gradient norm correlation with task interference issues
Developing efficient loss scaling strategies without grid search
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaling task losses by gradient norms
Analyzing correlation between imbalance and gradients
Controlling gradient dynamics for stable optimization
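
The loss-scaling idea behind these contributions can be sketched briefly. The paper's exact weighting rule is not specified in this summary, so the inverse-gradient-norm scheme, the function name `gradient_norm_loss_weights`, and the normalization below are illustrative assumptions:

```python
import numpy as np

def gradient_norm_loss_weights(task_grads, eps=1e-8):
    """Per-task loss weights inversely proportional to gradient norms.

    Scaling each task's loss by 1/||g_t|| equalizes the gradient
    magnitudes the shared parameters receive from each task. Weights
    are normalized to sum to the number of tasks so the overall
    learning-rate scale is unchanged. (Illustrative sketch, not the
    paper's exact rule.)
    """
    norms = np.array([np.linalg.norm(g) for g in task_grads])
    inv = 1.0 / (norms + eps)  # small eps guards against zero gradients
    return inv * len(task_grads) / inv.sum()

# Toy example: two tasks whose gradients differ by two orders of magnitude.
g1 = np.array([10.0, 0.0])   # dominant task
g2 = np.array([0.0, 0.1])    # under-optimized task
w = gradient_norm_loss_weights([g1, g2])

# After scaling, the effective gradient norms match.
scaled_norms = [np.linalg.norm(w[i] * g) for i, g in enumerate([g1, g2])]
```

In a training loop, the weights would be recomputed each step (or every few steps) from the gradients of the shared backbone, then applied as `sum(w[t] * loss[t])` before the combined backward pass.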
Yihang Guo
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
Tianyuan Yu
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
Liang Bai
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
Yanming Guo
National University of Defense Technology
deep learning, computer vision
Yirun Ruan
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
William Li
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
Weishi Zheng
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China, and also with Peng Cheng Laboratory, Shenzhen 518005, China