Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

📅 2025-02-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Enhancing large language models’ (LLMs) reasoning capabilities typically relies on extensive proprietary data and computational resources; moreover, existing model merging methods require manual hyperparameter design, entail high exploration costs, and suffer from limited generalization.

Method: This paper proposes the first automated model fusion framework, introducing two novel search spaces—Layer-wise Fusion Search (LFS) and Depth-level Integration Search (DIS)—and integrating multi-fidelity optimization, Bayesian-guided multi-objective evolutionary search, and a lightweight validation surrogate model to discover fine-grained, zero-retraining fusion strategies within <500 iterations.

Contribution/Results: The automatically discovered fusion policies consistently outperform individual models and hand-crafted merging baselines across multiple benchmarks. They not only improve performance on downstream fine-tuned tasks but also, for the first time, construct a cross-task Pareto-optimal frontier, demonstrating superior generalization and efficiency.
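To make the Layer-wise Fusion Search (LFS) space concrete, the sketch below merges two models layer by layer with per-layer interpolation coefficients. This is an illustrative toy, not the paper's implementation: the function name `layerwise_fuse`, the list-of-arrays model representation, and the hand-set `alphas` are all assumptions; in the framework these coefficients would be found by the automated search, not set manually.

```python
import numpy as np

def layerwise_fuse(model_a, model_b, alphas):
    """Layer-wise fusion sketch: layer i of the merged model is
    alphas[i] * A_i + (1 - alphas[i]) * B_i.
    (Hypothetical helper; the paper searches these coefficients.)"""
    assert len(model_a) == len(model_b) == len(alphas)
    return [a * w + (1.0 - a) * v
            for w, v, a in zip(model_a, model_b, alphas)]

# Toy "models": three layers of 2x2 weight matrices.
rng = np.random.default_rng(0)
A = [rng.standard_normal((2, 2)) for _ in range(3)]
B = [rng.standard_normal((2, 2)) for _ in range(3)]

# alpha = 1.0 keeps model A's layer, 0.0 keeps model B's,
# 0.5 averages the two -- one point in the LFS search space.
fused = layerwise_fuse(A, B, alphas=[1.0, 0.5, 0.0])
```

Because merging is just arithmetic on weights, each candidate in the search space costs no retraining, only evaluation.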

📝 Abstract
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. Model merging offers a promising alternative: it efficiently supplements capabilities by combining multiple models without retraining. However, current merging approaches rely on manually designed merging hyperparameters, limiting the exploration of potential model combinations and requiring significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layer-wise fusion (LFS) and depth-wise integration (DIS). Evaluating across a range of benchmarks, we find that the search autonomously discovers 1) merges that further boost single-objective performance, even on tasks the model has already been fine-tuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
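The multi-fidelity idea in the abstract can be sketched as a successive-halving loop: score every candidate merge cheaply (a small validation subset), keep the top fraction, and re-score only the survivors at higher fidelity. Everything below is a stand-in under stated assumptions: `noisy_score` simulates benchmark accuracy with noise that shrinks as the evaluation budget grows, and the budgets, keep-fraction, and pretend optimum at 0.7 are all invented for illustration, not taken from the paper.

```python
import random

def noisy_score(candidate, n_examples):
    """Stand-in for benchmark accuracy of a merge candidate evaluated on
    n_examples validation items (hypothetical; fidelity = eval-set size).
    Noise shrinks roughly as 1/sqrt(n_examples)."""
    true_quality = 1.0 - abs(candidate - 0.7)  # pretend the optimum is 0.7
    noise = random.gauss(0.0, 1.0 / n_examples ** 0.5)
    return true_quality + noise

def multi_fidelity_search(candidates, budgets=(16, 64, 256), keep=0.5):
    """Successive-halving style search: rank everyone at the cheapest
    fidelity, keep the top `keep` fraction, then re-rank survivors at
    each larger budget until one candidate remains."""
    pool = list(candidates)
    for n in budgets:
        scored = sorted(pool, key=lambda c: noisy_score(c, n), reverse=True)
        pool = scored[:max(1, int(len(scored) * keep))]
    return pool[0]

random.seed(42)
# Candidates: eleven global merge coefficients between 0 and 1.
best = multi_fidelity_search([i / 10 for i in range(11)])
```

Most of the evaluation budget is thus spent only on promising merges, which is how the framework keeps total search cost low (here, a few hundred cheap scores rather than full-fidelity evaluation of every candidate).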
Problem

Research questions and friction points this paper is trying to address.

Automated merging of large language models
Reduces reliance on manual hyperparameter tuning
Enhances multi-objective optimization across tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Model Merging Framework
Multi-fidelity approximations
Layer-wise fusion (LFS) and depth-wise integration (DIS) search spaces