🤖 AI Summary
Enhancing the reasoning capabilities of large language models (LLMs) typically relies on extensive proprietary data and computational resources; moreover, existing model merging methods require manual hyperparameter design, incur high exploration costs, and generalize poorly. Method: This paper proposes the first automated model fusion framework, introducing two novel search spaces, Layer-wise Fusion Search (LFS) and Depth-level Integration Search (DIS), and combining multi-fidelity optimization, Bayesian-guided multi-objective evolutionary search, and a lightweight validation surrogate model to discover fine-grained, zero-retraining fusion strategies in fewer than 500 iterations. Contribution/Results: The automatically discovered fusion policies consistently outperform individual models and hand-crafted merging baselines across multiple benchmarks. They not only improve performance on downstream fine-tuned tasks but also, for the first time, construct a cross-task Pareto-optimal frontier, demonstrating superior generalization and efficiency.
📝 Abstract
Reasoning capabilities represent a critical frontier for large language models (LLMs), but developing them requires extensive proprietary datasets and computational resources. Model merging offers a promising alternative: it efficiently supplements a model's capabilities by combining multiple models without retraining. However, current merging approaches rely on manually designed merging hyperparameters, which limits the exploration of potential model combinations and requires significant human effort. We propose an Automated Model Merging Framework that enables fine-grained exploration of merging strategies while reducing costs through multi-fidelity approximations. We support both single- and multi-objective optimization and introduce two novel search spaces: layer-wise fusion search (LFS) and depth-wise integration search (DIS). Evaluating across a range of benchmarks, we find that the search autonomously discovers 1) merges that further boost single-objective performance, even on tasks the model has already been fine-tuned on, and 2) merges that optimize multi-objective frontiers across tasks. Effective merges are found with limited compute, e.g., in fewer than 500 search steps.
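To make the layer-wise fusion idea concrete, here is a minimal sketch of what a per-layer merge might look like: each layer gets its own blending coefficient, and the search described above would optimize that coefficient vector. This is an illustrative assumption, not the paper's actual implementation; the function name `layerwise_fuse`, the use of plain dicts of floats in place of real tensors, and the simple linear interpolation rule are all hypothetical simplifications.

```python
# Illustrative sketch of layer-wise fusion: blend two models' parameters
# with a separate coefficient per layer. In the framework described above,
# the per-layer coefficients ("alphas") would be the search variables;
# here they are fixed by hand for demonstration.

def layerwise_fuse(model_a, model_b, alphas):
    """Blend two parameter dicts layer by layer.

    model_a, model_b: dict mapping layer name -> list of weights
    alphas: dict mapping layer name -> coefficient in [0, 1]
    Returns a fused dict: alpha * a + (1 - alpha) * b for each layer.
    """
    fused = {}
    for name, w_a in model_a.items():
        a = alphas[name]
        w_b = model_b[name]
        fused[name] = [a * x + (1.0 - a) * y for x, y in zip(w_a, w_b)]
    return fused

# Two toy "models" with two layers each (real models would hold tensors).
m_a = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
m_b = {"layer0": [3.0, 0.0], "layer1": [2.0, 0.0]}
alphas = {"layer0": 0.5, "layer1": 0.25}  # one coefficient per layer

fused = layerwise_fuse(m_a, m_b, alphas)
print(fused)  # {'layer0': [2.0, 1.0], 'layer1': [1.5, 1.0]}
```

Note that no retraining is involved: the merge is a pure weight-space operation, which is why a search over fusion strategies can remain cheap enough to converge within a few hundred steps.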