RanDeS: Randomized Delta Superposition for Multi-Model Compression

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-model compression, merging models via task-specific parameter deltas often causes cross-task interference due to delta superposition, degrading performance. To address this, we propose a random orthogonal transformation-based approach for delta decorrelation and self-cancellation. We reformulate model merging as a compress-and-retrieve scheme: task-specific deltas are projected via seed-controllable random orthogonal matrices, enabling lossless spatial decoupling, statistical decorrelation, and dynamic self-cancellation among deltas. Our method requires no auxiliary storage, training data, or architectural modifications, and supports zero-overhead dynamic model insertion and removal. Evaluated on vision and language multi-task benchmarks, it significantly outperforms existing merging techniques, markedly reducing cross-task interference, incurring zero memory overhead for newly added models, and enabling real-time model management at low computational cost.
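The compress-and-retrieve idea can be illustrated with a minimal NumPy sketch (not the paper's implementation; the function names, toy deltas, and error metric are illustrative assumptions). Each delta is projected by a random orthogonal matrix regenerated from a seed, all projections are summed into one vector, and a single delta is retrieved by inverting its own transform, with the other deltas appearing as decorrelated noise that largely self-cancels:

```python
import numpy as np

def random_orthogonal(seed, dim):
    # Seed-controllable random orthogonal matrix: QR of a seeded Gaussian,
    # with signs fixed so the result is uniquely determined by the seed.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

dim, n_models = 256, 5
rng = np.random.default_rng(0)

# Correlated toy deltas (a shared direction plus task-specific noise),
# the regime where naive summation interferes most.
shared = rng.standard_normal(dim)
deltas = [shared + 0.3 * rng.standard_normal(dim) for _ in range(n_models)]

# Compress: superpose the orthogonally transformed deltas into one vector.
superposed = sum(random_orthogonal(seed, dim) @ d
                 for seed, d in enumerate(deltas))

# Retrieve delta 2: invert its transform; the remaining deltas come back
# as decorrelated noise that largely self-cancels.
retrieved = random_orthogonal(2, dim).T @ superposed
naive = sum(deltas)  # plain merging: full cross-task interference

err_randes = np.linalg.norm(retrieved - deltas[2]) / np.linalg.norm(deltas[2])
err_naive = np.linalg.norm(naive - deltas[2]) / np.linalg.norm(deltas[2])
print(err_randes < err_naive)
```

Because the transforms are fully determined by their seeds, only the superposed vector and one integer per model need to be kept, which is the source of the zero-storage claim.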

📝 Abstract
From a multi-model compression perspective, model merging enables memory-efficient serving of multiple models fine-tuned from the same base, but suffers from degraded performance due to interference among their task-specific parameter adjustments (i.e., deltas). In this paper, we reformulate model merging as a compress-and-retrieve scheme, revealing that the task interference arises from the summation of irrelevant deltas during model retrieval. To address this issue, we use random orthogonal transformations to decorrelate these vectors into self-cancellation. We show that this approach drastically reduces interference, improving performance across both vision and language tasks. Since these transformations are fully defined by random seeds, adding new models requires no extra memory. Further, their data- and model-agnostic nature enables easy addition or removal of models with minimal compute overhead, supporting efficient and flexible multi-model serving.
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation in multi-model merging due to task interference
Proposes randomized orthogonal transformations to decorrelate interfering parameter adjustments
Enables memory-efficient serving of multiple models without extra storage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Random orthogonal transformations decorrelate task deltas so interference self-cancels
Compress-and-retrieve formulation pinpoints and minimizes task interference
Seed-defined transforms enable memory-free model addition and removal
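Dynamic insertion and removal follows from the seed-defined transforms: the matrix for any model can be regenerated on demand, so registering or retiring a model is a single projected addition or subtraction. A hypothetical sketch under the assumption that the raw deltas remain available when a model is removed:

```python
import numpy as np

def orthogonal_from_seed(seed, dim):
    # Regenerate the same orthogonal transform from its seed on demand;
    # no per-model matrix is ever stored.
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q * np.sign(np.diag(r))

dim = 64
rng = np.random.default_rng(1)
d_a, d_b = rng.standard_normal(dim), rng.standard_normal(dim)

superposed = np.zeros(dim)
superposed += orthogonal_from_seed(10, dim) @ d_a   # insert model A
superposed += orthogonal_from_seed(11, dim) @ d_b   # insert model B
superposed -= orthogonal_from_seed(10, dim) @ d_a   # remove model A

# With A removed, model B's delta is recovered (up to rounding) exactly.
recovered_b = orthogonal_from_seed(11, dim).T @ superposed
print(np.allclose(recovered_b, d_b))
```

The same subtraction works for any model in any order, which is what makes real-time model management cheap: each operation costs one matrix regeneration and one matrix-vector product.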