🤖 AI Summary
In multi-model compression, merging models via task-specific parameter deltas often causes cross-task interference due to delta superposition, degrading performance. To address this, we propose a random orthogonal transformation-based approach for delta decorrelation and self-cancellation. We reformulate model merging as a "compression–retrieval" paradigm: task-specific deltas are projected via seed-controllable random orthogonal matrices, enabling lossless spatial decoupling, statistical decorrelation, and dynamic self-cancellation among deltas. Our method requires no auxiliary storage, training data, or architectural modifications, and supports zero-overhead dynamic model insertion and removal. Evaluated on vision and language multi-task benchmarks, it significantly outperforms existing merging techniques: it markedly reduces cross-task interference, incurs zero memory overhead for newly added models, and enables real-time model management at low computational cost.
📝 Abstract
From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that task interference arises from the summation of irrelevant deltas during model retrieval. To address this issue, we apply random orthogonal transformations that decorrelate these deltas, so that irrelevant components self-cancel during retrieval. We show that this approach drastically reduces interference, improving performance across both vision and language tasks. Since the transformations are fully defined by random seeds, adding new models requires no extra memory. Further, their data- and model-agnostic nature enables easy addition or removal of models with minimal compute overhead, supporting efficient and flexible multi-model serving.
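The compress-and-retrieve scheme described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's implementation: deltas are flattened to vectors, each task's delta is rotated by an orthogonal matrix derived deterministically from a per-task seed before summation, and retrieval applies the transpose (the inverse of an orthogonal matrix). The other tasks' deltas then appear only as decorrelated residue rather than coherent interference. All names (`orthogonal_from_seed`, `merged`, the dimensions) are hypothetical.

```python
import numpy as np

def orthogonal_from_seed(seed: int, d: int) -> np.ndarray:
    """Random d x d orthogonal matrix, fully determined by the seed."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the QR-based sample is uniquely determined.
    return q * np.sign(np.diag(r))

d, n_tasks = 64, 5
rng = np.random.default_rng(0)
# Stand-ins for task-specific fine-tuning deltas (flattened parameters).
deltas = [rng.standard_normal(d) for _ in range(n_tasks)]

# Compress: rotate each delta with its own seed-derived orthogonal matrix, then sum.
# Only the seeds and this single summed vector need to be stored.
merged = sum(orthogonal_from_seed(i, d) @ deltas[i] for i in range(n_tasks))

# Retrieve task 3: regenerate its matrix from the seed and invert (transpose).
Q3 = orthogonal_from_seed(3, d)
retrieved = Q3.T @ merged

# The gap to the true delta is exactly the other tasks' rotated deltas,
# which behave as decorrelated, zero-mean noise rather than a coherent bias.
residue = retrieved - deltas[3]
```

Because each matrix is regenerated from its seed on demand, adding a model only appends its rotated delta to `merged`, and removing one subtracts it back out; no per-task matrices or auxiliary data need to be stored.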