Can Optimization Trajectories Explain Multi-Task Transfer?

📅 2024-08-26
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work investigates the mechanisms underlying generalization degradation in multi-task learning (MTL), specifically why mainstream optimization algorithms often fail to improve generalization. Through controlled experiments on deep neural networks, quantitative measurement of gradient conflicts, visualization of optimization trajectories, and attribution analysis of generalization error, we empirically establish, for the first time, that a substantial generalization gap emerges early in multi-task training and, crucially, that this gap is decoupled from the degree of gradient conflict: while gradient conflicts shape task-specific optimization dynamics, they do not predict generalization performance. This finding challenges the prevailing MTL design paradigm, which assumes that mitigating gradient conflict is a necessary condition for improving generalization. Our primary contributions are: (i) identifying the early onset and independence of MTL generalization failure; (ii) refuting gradient conflict as a valid proxy for generalization; and (iii) providing new theoretical foundations for understanding task synergy and developing robust multi-task optimizers.

๐Ÿ“ Abstract
Despite the widespread adoption of multi-task training in deep learning, little is understood about how multi-task learning (MTL) affects generalization. Prior work has conjectured that the negative effects of MTL are due to optimization challenges that arise during training, and many optimization methods have been proposed to improve multi-task performance. However, recent work has shown that these methods fail to consistently improve multi-task generalization. In this work, we seek to improve our understanding of these failures by empirically studying how MTL impacts the optimization of tasks, and whether this impact can explain the effects of MTL on generalization. We show that MTL results in a generalization gap (a gap in generalization at comparable training loss) between single-task and multi-task trajectories early into training. However, we find that factors of the optimization trajectory previously proposed to explain generalization gaps in single-task settings cannot explain the generalization gaps between single-task and multi-task models. Moreover, we show that the amount of gradient conflict between tasks is correlated with negative effects to task optimization, but is not predictive of generalization. Our work sheds light on the underlying causes for failures in MTL and, importantly, raises questions about the role of general purpose multi-task optimization algorithms.
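The gradient conflict the abstract refers to is commonly operationalized (e.g., in prior MTL optimization work such as PCGrad) as the cosine similarity between the per-task gradients of the shared parameters: a negative value means the two tasks pull the shared weights in opposing directions. A minimal sketch of that measurement, using hypothetical toy gradients rather than the paper's own code:

```python
import numpy as np

def gradient_conflict(grad_a, grad_b):
    """Cosine similarity between two tasks' flattened gradients.

    Values near +1 indicate aligned updates, values near -1 indicate
    conflicting updates on the shared parameters.
    """
    a, b = np.ravel(grad_a), np.ravel(grad_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy gradients: two tasks pushing the shared parameters
# in exactly opposite directions are maximally conflicting.
g_task1 = np.array([1.0, 2.0, -1.0])
g_task2 = np.array([-1.0, -2.0, 1.0])
print(gradient_conflict(g_task1, g_task2))  # ≈ -1.0 (fully conflicting)
```

In practice, `grad_a` and `grad_b` would be the gradients of the shared backbone computed from each task's loss at the same parameter values; the paper's finding is that this quantity tracks task optimization difficulty but does not predict the single-task-versus-multi-task generalization gap.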
Problem

Research questions and friction points this paper is trying to address.

Multi-task Learning
Skill Transfer
Optimization Algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task Learning
Contradictory Early-stage Performance
Conflicting Task Impact