🤖 AI Summary
This work addresses the reliance on exhaustive search for linear scalarization weights in multi-task learning. We propose AutoScale, the first framework to theoretically link scalarization weights with multi-task optimization metrics—specifically gradient magnitude similarity and task loss dynamics. AutoScale employs a two-stage mechanism: (1) quantifying inter-task gradient conflict to assess task compatibility, and (2) dynamically adjusting weights based on per-task loss change rates. Crucially, it requires no hyperparameter tuning and substantially reduces computational overhead. Evaluated across multiple standard benchmarks, AutoScale consistently outperforms state-of-the-art methods in both convergence speed and final performance. Its design ensures broad applicability across diverse architectures and task configurations, while maintaining training stability and scalability to large-scale multi-task settings.
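The two-phase mechanism described above might be sketched roughly as follows. This is a minimal illustration only: the function names, the exact form of the magnitude-similarity score, and the loss-ratio weighting rule are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def gradient_magnitude_similarity(g1, g2):
    # One common magnitude-similarity score between two task gradients:
    # 2*|g1||g2| / (|g1|^2 + |g2|^2), which lies in (0, 1] and
    # equals 1 exactly when the two gradient norms match.
    n1, n2 = np.linalg.norm(g1), np.linalg.norm(g2)
    return 2 * n1 * n2 / (n1 ** 2 + n2 ** 2)

def loss_change_rate_weights(prev_losses, curr_losses):
    # Hypothetical phase-2 rule: compute each task's loss ratio between
    # steps (close to 1 means slow progress) and normalize the ratios
    # into scalarization weights that sum to 1, so slower tasks get
    # relatively larger weight.
    rates = np.asarray(curr_losses) / np.asarray(prev_losses)
    return rates / rates.sum()
```

For example, two tasks whose losses fell from `[1.0, 2.0]` to `[0.5, 1.5]` have ratios `[0.5, 0.75]`, so the slower-improving second task receives the larger normalized weight.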
📝 Abstract
Recent multi-task learning studies suggest that linear scalarization with well-chosen fixed task weights can achieve performance comparable to, or even better than, that of complex multi-task optimization (MTO) methods. However, it remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a direct connection between linear scalarization and MTO methods, revealing through extensive experiments that well-performing scalarization weights exhibit specific trends in key MTO metrics, such as high gradient magnitude similarity. Building on this insight, we introduce AutoScale, a simple yet effective two-phase framework that uses these MTO metrics to guide weight selection for linear scalarization, without expensive weight search. AutoScale consistently shows superior performance with high efficiency across diverse datasets, including a new large-scale benchmark.
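Linear scalarization itself is the standard reduction of multi-task training to a single objective: the total loss is a fixed weighted sum of the per-task losses, L = Σᵢ wᵢ Lᵢ. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def scalarized_loss(task_losses, weights):
    # Linear scalarization: collapse per-task losses into one scalar
    # objective via a fixed weighted sum, L = sum_i w_i * L_i.
    return float(np.dot(weights, task_losses))
```

With equal weights `[0.5, 0.5]`, two task losses of `2.0` and `4.0` combine to a single training objective of `3.0`; the paper's question is how to pick these weights without searching over them.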