🤖 AI Summary
Enhancing the reasoning capabilities of large language models (LLMs) typically relies on extensive proprietary data and computational resources; moreover, existing model merging methods require manual hyperparameter design, incur high exploration costs, and generalize poorly. Method: This paper proposes the first automated model fusion framework, introducing two novel search spaces, Layer-wise Fusion Search (LFS) and Depth-level Integration Search (DIS), and combining multi-fidelity optimization, Bayesian-guided multi-objective evolutionary search, and a lightweight validation surrogate model to discover fine-grained, zero-retraining fusion strategies in fewer than 500 iterations. Contribution/Results: The automatically discovered fusion policies consistently outperform individual models and hand-crafted merging baselines across multiple benchmarks. They not only improve performance on downstream fine-tuned tasks but also, for the first time, construct a cross-task Pareto-optimal frontier, demonstrating superior generalization and efficiency.
📝 Abstract
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. Model merging offers a promising alternative: it efficiently supplements a model's capabilities by combining multiple models without retraining. However, current merging approaches rely on manually designed merging hyperparameters, which limits the exploration of potential model combinations and requires significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layer-wise fusion search (LFS) and depth-wise integration search (DIS). Evaluating across a range of benchmarks, we find that the search autonomously discovers 1) merges that further boost single-objective performance, even on tasks the model has already been fine-tuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
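To make the layer-wise fusion idea concrete, here is a minimal sketch of what a per-layer merge might look like: each layer gets its own blending coefficient, and the search described above would optimize that coefficient vector. This is an illustrative assumption, not the paper's actual implementation; the function name `layerwise_fuse`, the use of plain dicts of floats in place of real tensors, and the simple linear interpolation rule are all hypothetical simplifications.

```python
# Illustrative sketch of layer-wise fusion: blend two models' parameters
# with a separate coefficient per layer. In the framework described above,
# the per-layer coefficients ("alphas") would be the search variables;
# here they are fixed by hand for demonstration.

def layerwise_fuse(model_a, model_b, alphas):
    """Blend two parameter dicts layer by layer.

    model_a, model_b: dict mapping layer name -> list of weights
    alphas: dict mapping layer name -> coefficient in [0, 1]
    Returns a fused dict: alpha * a + (1 - alpha) * b for each layer.
    """
    fused = {}
    for name, w_a in model_a.items():
        a = alphas[name]
        w_b = model_b[name]
        fused[name] = [a * x + (1.0 - a) * y for x, y in zip(w_a, w_b)]
    return fused

# Two toy "models" with two layers each (real models would hold tensors).
m_a = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
m_b = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}
alphas = {"layer0": 0.5, "layer1": 0.25}  # one coefficient per layer

fused = layerwise_fuse(m_a, m_b, alphas)
print(fused)  # {'layer0': [2.0, 1.0], 'layer1': [1.5, 1.0]}
```

Note that no retraining is involved: the merge is a pure weight-space operation, which is why a search over fusion strategies can remain cheap enough to converge within a few hundred steps.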