Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the instability of downstream performance under different domain weightings in multi-domain pretraining, this paper proposes Soup-of-Experts: an architecture that instantiates a specialist model at test time for any domain weights, without retraining and at minimal computational cost. The architecture maintains a bank of expert parameters that are linearly combined into a single model; a lightweight network predicts the combination coefficients from the input domain weights. The expert bank and the coefficient predictor are trained jointly end to end by sampling random domain weights, instantiating the corresponding model, and backpropagating through a batch drawn with those weights. Because experts are merged by parameter averaging, any domain-weight configuration yields a model of the base size with no additional inference latency. On several language modeling benchmarks, the approach quickly produces small specialist models, which is particularly appealing when many specialists must be shipped under a model size constraint.

📝 Abstract
Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performances. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model. We learn the linear combination coefficients as a function of the input domain weights. To train this architecture, we sample random domain weights, instantiate the corresponding model, and backprop through one batch of data sampled with these domain weights. We demonstrate how our approach obtains small specialized models on several language modeling tasks quickly. Soup-of-Experts are particularly appealing when one needs to ship many different specialist models quickly under a model size constraint.
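The architecture and training loop described in the abstract can be sketched in a few lines. The following is a toy illustration, not the paper's implementation: the model is a plain parameter vector, the coefficient predictor is a single linear map, and the loss is least squares with a hand-derived gradient; all names and sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, n_domains, n_params = 4, 3, 8

# Bank of expert parameters; shapes and sizes here are toy assumptions.
experts = rng.normal(size=(n_experts, n_params))

# Lightweight coefficient predictor: a single linear map from domain
# weights to one coefficient per expert (the paper learns this mapping;
# its exact parameterization is an assumption here).
W = rng.normal(size=(n_experts, n_domains))

def instantiate(domain_weights):
    """Collapse the expert bank into one model's parameter vector."""
    alpha = W @ domain_weights   # combination coefficients
    return alpha @ experts       # linear combination of the experts

# One training step: sample random domain weights, instantiate the
# corresponding model, and backprop through one batch drawn with those
# weights (here a toy least-squares loss with a hand-derived gradient).
domain_weights = rng.dirichlet(np.ones(n_domains))
theta = instantiate(domain_weights)

x = rng.normal(size=(16, n_params))  # stand-in for a sampled batch
y = x @ np.ones(n_params)
loss = np.mean((x @ theta - y) ** 2)

# The gradient flows through the linear combination back to every expert.
grad_theta = 2.0 * x.T @ (x @ theta - y) / len(x)
alpha = W @ domain_weights
experts -= 1e-3 * np.outer(alpha, grad_theta)
# (The coefficient predictor W would be updated jointly in the same
# backward pass; omitted here for brevity.)
```

Note that a single gradient step updates every expert in the bank, weighted by its coefficient, which is how the bank learns to cover many domain mixtures at once.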
Problem

Research questions and friction points this paper is trying to address.

Instantiating domain-specific models for arbitrary domain weights
Avoiding retraining while keeping computational cost minimal
Shipping many specialist models under a model size constraint
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter averaging to instantiate specialist models
Bank of expert parameters combined linearly
Combination coefficients learned as a function of domain weights
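The size-constraint benefit can be made concrete: shipping many specialists only requires storing the shared expert bank once plus one small coefficient vector per specialist, each of which collapses into a single base-sized model. A minimal sketch, with illustrative names and sizes (in the paper the coefficients come from a learned predictor; here they are fixed):

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, n_params = 4, 1000

# Shared bank of expert parameters, stored once (toy sizes).
experts = rng.normal(size=(n_experts, n_params))

# Each specialist is just one coefficient vector over the experts.
specialists = {f"domain_{k}": rng.dirichlet(np.ones(n_experts))
               for k in range(10)}

def materialize(name):
    """Average the expert bank into one base-sized specialist model."""
    return specialists[name] @ experts

model = materialize("domain_0")

# Storage: one bank plus 10 tiny coefficient vectors,
# versus 10 full parameter sets for 10 independent specialists.
soup_floats = experts.size + sum(c.size for c in specialists.values())
naive_floats = 10 * n_params
```

The materialized model has exactly `n_params` parameters, so inference cost is unchanged regardless of how many experts are in the bank.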