AI Summary
This work addresses a limitation of existing large language model (LLM) merging approaches: they rely on hand-crafted stochastic operators and explore the performance landscape of the merging-coefficient space poorly. The authors propose EvoGM, an evolutionary generative merging framework that introduces learnable generative modeling into evolutionary model merging for the first time. EvoGM uses a dual-generator architecture with cycle-consistency learning to adaptively sample and refine promising merging candidates. By constructing winner-loser pairs from historical search trajectories, it models the distribution of high-performing parameter configurations without relying on manual heuristics. Combined with a multi-round evolutionary pipeline in which elite merged models serve as new experts, EvoGM significantly outperforms current methods across diverse benchmarks and performs robustly on both seen and unseen tasks.
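The winner-loser pair construction can be illustrated with a minimal sketch. It assumes each search-history entry is a (coefficient vector, validation score) tuple and that the higher-scoring member of each pair is the winner; `build_winner_loser_pairs` and `min_gap` are hypothetical names, not taken from the EvoGM code.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Coeffs = Tuple[float, ...]


def build_winner_loser_pairs(
    history: Sequence[Tuple[Coeffs, float]],
    min_gap: float = 0.0,
) -> List[Tuple[Coeffs, Coeffs]]:
    """Turn a search history into (winner, loser) coefficient pairs.

    history: (merging coefficients, validation score) entries from earlier rounds.
    min_gap: hypothetical threshold; pairs whose scores differ by no more than this are skipped.
    """
    pairs: List[Tuple[Coeffs, Coeffs]] = []
    for (c_a, s_a), (c_b, s_b) in combinations(history, 2):
        if abs(s_a - s_b) <= min_gap:
            continue  # ties (or near-ties) carry no preference signal
        winner, loser = (c_a, c_b) if s_a > s_b else (c_b, c_a)
        pairs.append((winner, loser))
    return pairs


history = [((0.6, 0.4), 0.71), ((0.3, 0.7), 0.65), ((0.5, 0.5), 0.71)]
pairs = build_winner_loser_pairs(history)
# → [((0.6, 0.4), (0.3, 0.7)), ((0.5, 0.5), (0.3, 0.7))]
```

One plausible reading of the data-efficiency argument is that N evaluated candidates can yield up to N(N-1)/2 training pairs for the generator; how EvoGM actually selects and weights pairs is not specified here.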
Abstract
Evolutionary model merging provides a powerful framework for the automated, training-free composition of LLMs through parameter-space search. However, existing methods predominantly rely on stochastic, hand-crafted operators that overlook the underlying performance landscape of the coefficient space. We propose Evolutionary Generative Merging (EvoGM), a framework that replaces manual heuristics with learnable generative modeling to optimize merging coefficients. Specifically, EvoGM features a dual-generator architecture with cycle-consistent learning to adaptively sample and refine promising merging candidates. By constructing winner-loser pairs from historical search trajectories, our framework effectively captures high-performance parameter distributions and improves data efficiency. This generative process is integrated into a multi-round evolutionary pipeline, where elite merged models iteratively serve as new expert foundations. Extensive experiments across diverse benchmarks demonstrate that EvoGM significantly outperforms state-of-the-art baselines, exhibiting robust performance on both seen and unseen tasks. Code and data are available at https://github.com/JiangTao97/evogm.
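The multi-round pipeline can be sketched under several assumptions that the abstract does not state: merging is a linear, coefficient-weighted sum of expert parameter vectors with coefficients summing to 1, and random normalized sampling stands in for EvoGM's learned dual-generator proposal step. `merge`, `propose`, `multi_round_merge`, and `toy_fitness` are hypothetical names. The sketch illustrates only how each round's elite merged model is added to the expert pool for the next round.

```python
import random
from typing import Callable, List, Sequence, Tuple

Params = List[float]


def merge(experts: Sequence[Params], coeffs: Sequence[float]) -> Params:
    """ASSUMPTION: linear merge, i.e. a coefficient-weighted sum of expert parameters."""
    dim = len(experts[0])
    return [sum(c * p[i] for c, p in zip(coeffs, experts)) for i in range(dim)]


def propose(n_experts: int, rng: random.Random) -> List[float]:
    """Stand-in for EvoGM's learned generator: random coefficients normalized to sum to 1."""
    raw = [rng.random() for _ in range(n_experts)]
    total = sum(raw)
    return [r / total for r in raw]


def multi_round_merge(
    experts: Sequence[Params],
    fitness: Callable[[Params], float],
    rounds: int = 3,
    candidates_per_round: int = 16,
    seed: int = 0,
) -> Tuple[Params, float, List[Params]]:
    rng = random.Random(seed)
    pool: List[Params] = [list(e) for e in experts]
    best_params: Params = pool[0]
    best_score = float("-inf")
    for _ in range(rounds):
        # Sample and score candidate merges of the current expert pool.
        scored = []
        for _ in range(candidates_per_round):
            merged = merge(pool, propose(len(pool), rng))
            scored.append((fitness(merged), merged))
        round_score, elite = max(scored, key=lambda item: item[0])
        if round_score > best_score:
            best_score, best_params = round_score, elite
        # The round's elite merged model becomes an extra "expert" for the next round.
        pool.append(elite)
    return best_params, best_score, pool


def toy_fitness(params: Params) -> float:
    """Toy objective (hypothetical): negative squared distance to a fixed target vector."""
    target = [0.5, 0.5]
    return -sum((x - t) ** 2 for x, t in zip(params, target))


experts = [[1.0, 0.0], [0.0, 1.0]]
best, score, pool = multi_round_merge(experts, toy_fitness, rounds=3)
```

In a real setting, the fitness call would evaluate a merged LLM on validation tasks and the proposal step would be the trained generators; the loop structure is the only part this sketch is meant to convey.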