🤖 AI Summary
This work addresses the reliance on exhaustive search for linear scalarization weights in multi-task learning. We propose AutoScale, the first framework to theoretically link scalarization weights with multi-task optimization metrics—specifically gradient magnitude similarity and task loss dynamics. AutoScale employs a two-stage mechanism: (1) quantifying inter-task gradient conflict to assess task compatibility, and (2) dynamically adjusting weights based on per-task loss change rates. Crucially, it requires no hyperparameter tuning and substantially reduces computational overhead. Evaluated across multiple standard benchmarks, AutoScale consistently outperforms state-of-the-art methods in both convergence speed and final performance. Its design ensures broad applicability across diverse architectures and task configurations, while maintaining training stability and scalability to large-scale multi-task settings.
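The two-phase mechanism described above might be sketched roughly as follows. This is a minimal illustration only: the function names, the exact form of the magnitude-similarity score, and the loss-ratio weighting rule are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def gradient_magnitude_similarity(g1, g2):
    # One common magnitude-similarity score between two task gradients:
    # 2*|g1||g2| / (|g1|^2 + |g2|^2), which lies in (0, 1] and
    # equals 1 exactly when the two gradient norms match.
    n1, n2 = np.linalg.norm(g1), np.linalg.norm(g2)
    return 2 * n1 * n2 / (n1 ** 2 + n2 ** 2)

def loss_change_rate_weights(prev_losses, curr_losses):
    # Hypothetical phase-2 rule: compute each task's loss ratio between
    # steps (close to 1 means slow progress) and normalize the ratios
    # into scalarization weights that sum to 1, so slower tasks get
    # relatively larger weight.
    rates = np.asarray(curr_losses) / np.asarray(prev_losses)
    return rates / rates.sum()
```

For example, two tasks whose losses fell from `[1.0, 2.0]` to `[0.5, 1.5]` have ratios `[0.5, 0.75]`, so the slower-improving second task receives the larger normalized weight.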
📝 Abstract
Recent multi-task learning studies suggest that linear scalarization with well-chosen fixed task weights can achieve performance comparable to, or even better than, that of complex multi-task optimization (MTO) methods. However, it remains unclear why certain weights yield optimal performance and how to determine these weights without relying on exhaustive hyperparameter search. This paper establishes a direct connection between linear scalarization and MTO methods, revealing through extensive experiments that well-performing scalarization weights exhibit specific trends in key MTO metrics, such as high gradient magnitude similarity. Building on this insight, we introduce AutoScale, a simple yet effective two-phase framework that uses these MTO metrics to guide weight selection for linear scalarization, without expensive weight search. AutoScale consistently shows superior performance with high efficiency across diverse datasets, including a new large-scale benchmark.
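Linear scalarization itself is the standard reduction of multi-task training to a single objective: the total loss is a fixed weighted sum of the per-task losses, L = Σᵢ wᵢ Lᵢ. A minimal sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def scalarized_loss(task_losses, weights):
    # Linear scalarization: collapse per-task losses into one scalar
    # objective via a fixed weighted sum, L = sum_i w_i * L_i.
    return float(np.dot(weights, task_losses))
```

With equal weights `[0.5, 0.5]`, two task losses of `2.0` and `4.0` combine to a single training objective of `3.0`; the paper's question is how to pick these weights without searching over them.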