Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of excessive storage overhead in dynamic model merging for multi-task adaptation by introducing a learnable framework for task vector compression. Departing from static, rule-based approaches, the proposed method incorporates three core mechanisms: Learnable Gating-based Sparsification (LGS), Bit-width Adaptive Selection (BAS), and a Sparsity-Aware Storage Strategy (SASS). These components jointly enable adaptive compression while preserving task-specific information. Furthermore, the approach integrates a k-nearest neighbor (KNN) inference module guided by a learnable low-rank metric to facilitate efficient retrieval and merging of compressed task vectors. The resulting system achieves high-fidelity multi-task performance under significantly reduced storage costs, demonstrating both high compression ratios and computational efficiency in dynamic model merging scenarios.
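
The retrieval-and-merge step described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `project`, `knn_task_weights`, and `merge` are hypothetical, the low-rank matrix `L` is fixed by hand rather than learned, and weighting each task by its share of neighbour votes is just one plausible way to turn retrieved neighbours into merging coefficients.

```python
from collections import Counter


def project(L, x):
    """Map feature x (length d) through a low-rank matrix L (r rows, d columns)."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def knn_task_weights(query, bank, L, k=3):
    """Weight tasks by their share among the k nearest stored features.

    bank is a list of (feature, task_name) pairs. Distances are squared
    Euclidean distances measured after projecting with L (assumed metric).
    """
    q = project(L, query)

    def dist(item):
        z = project(L, item[0])
        return sum((a - b) ** 2 for a, b in zip(q, z))

    nearest = sorted(bank, key=dist)[:k]
    votes = Counter(task for _, task in nearest)
    return {task: n / len(nearest) for task, n in votes.items()}


def merge(pretrained, task_vectors, weights):
    """Add the weighted task vectors to the pretrained weights."""
    merged = list(pretrained)
    for task, w in weights.items():
        for i, delta in enumerate(task_vectors[task]):
            merged[i] += w * delta
    return merged


# Toy data: a rank-1 metric over 3-dimensional features, two tasks.
L = [[1.0, 0.0, 0.0]]
bank = [
    ([0.0, 5.0, 1.0], "A"),
    ([0.1, -3.0, 0.0], "A"),
    ([0.2, 0.0, 0.0], "A"),
    ([1.0, 0.0, 0.0], "B"),
    ([0.9, 2.0, 2.0], "B"),
]
weights = knn_task_weights([0.8, 0.0, 0.0], bank, L, k=3)
merged = merge([0.0, 0.0], {"A": [3.0, 0.0], "B": [0.0, 3.0]}, weights)
```

In this toy setup the projection keeps only the first feature coordinate, so two of the three nearest neighbours belong to task B and that task's vector receives the larger weight. In the paper the metric is learned, and the task vectors being merged would be decompressed from their compact form first.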

📝 Abstract

Model merging has attracted attention as an effective path toward multi-task adaptation by integrating knowledge from multiple task-specific models. Among existing approaches, dynamic merging mitigates performance degradation caused by conflicting parameter updates across tasks by flexibly combining task-specific parameters at inference time, thereby maintaining high performance. However, these methods require storing independent parameters for each task, resulting in prohibitive storage overhead. To address this issue, we first experimentally demonstrate that the fine-tuned weight increments (referred to as task vectors) exhibit an impulse-like activation pattern and high robustness to low-bit representations. Driven by this insight, we propose T-Switch, which decomposes task vectors into three compact components: a binary sparse mask, a sign vector, and a scalar scaling factor, achieving high-fidelity approximation at high compression ratios. Building on this, we develop Auto-Switch, a training-free merging scheme that automatically assembles task vectors through feature similarity retrieval. Furthermore, to transform task vector sparsification and quantization from static rules to adaptive learning, we propose FlexSwitch, a learnable framework that jointly optimizes the compression strategy for each model unit via Learnable Gating Sparsification (LGS) and Bit-width Adaptive Selection (BAS), while employing the Sparsity-Aware Storage Strategy (SASS) to select the optimal storage encoding structure. Finally, by incorporating a K-Nearest Neighbor (KNN) inference scheme with a learnable low-rank metric, we present Auto-FlexSwitch, a dynamic model merging approach that supports highly efficient task vector compression.
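
The three-part decomposition attributed to T-Switch (binary sparse mask, sign vector, scalar scaling factor) can be sketched as follows. This is a hedged illustration under stated assumptions rather than the paper's procedure: the function names and the `keep_ratio` parameter are hypothetical, the mask is chosen by keeping the largest-magnitude entries, and the scale is set to the mean magnitude of the kept entries, which is the value that minimizes squared error once the mask and signs are fixed.

```python
def compress_task_vector(pretrained, finetuned, keep_ratio=0.25):
    """Approximate (finetuned - pretrained) as scale * mask * signs.

    keep_ratio is an assumed hyperparameter: the fraction of entries kept.
    """
    tau = [f - p for f, p in zip(finetuned, pretrained)]  # task vector
    k = max(1, round(keep_ratio * len(tau)))
    # Indices of the k largest-magnitude entries (the "impulses").
    top = sorted(range(len(tau)), key=lambda i: abs(tau[i]), reverse=True)[:k]
    keep = set(top)
    mask = [1 if i in keep else 0 for i in range(len(tau))]
    signs = [1 if t >= 0 else -1 for t in tau]
    # Mean magnitude of kept entries: least-squares optimal single scale.
    scale = sum(abs(tau[i]) for i in top) / k
    return mask, signs, scale


def decompress_task_vector(mask, signs, scale):
    """Rebuild the approximate task vector from its three components."""
    return [scale * m * s for m, s in zip(mask, signs)]


# Toy example: three large updates among eight parameters.
pretrained = [0.0] * 8
finetuned = [0.9, -0.02, 0.01, -1.1, 0.03, 0.0, 1.0, -0.01]
mask, signs, scale = compress_task_vector(pretrained, finetuned, keep_ratio=0.375)
approx = decompress_task_vector(mask, signs, scale)
```

Under this scheme each parameter costs roughly one mask bit plus one sign bit for kept entries, and each task vector adds a single floating-point scale, which is where the storage saving over full-precision task vectors comes from.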

Problem

Research questions and friction points this paper is trying to address.

model merging

storage overhead

task vectors

dynamic merging

multi-task adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

task vector compression

dynamic model merging

learnable sparsification