FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited instruction-following, mathematical reasoning, and code-generation capabilities of small-scale LLMs (e.g., 1B/3B), this work proposes a collaborative knowledge distillation framework that leverages heterogeneous large models (Gemma-2, Mistral, Qwen, Llama, etc.). Methodologically, it introduces (1) a task-adaptive protocol for constructing heterogeneous knowledge distillation data, and (2) a two-stage training paradigm: supervised fine-tuning (SFT) to align output distributions, followed by multi-source joint direct preference optimization (DPO) to harmonize cross-model preferences. The core innovations are a joint preference optimization mechanism over multiple heterogeneous source models (presented as the first of its kind) and a domain-aware data synthesis strategy. Experiments show that the fused Llama-3.1-8B gains an average of 6.8 points across 14 benchmarks, including gains of 37.1 and 30.1 points on AlpacaEval-2 and Arena-Hard, respectively; lightweight 1B/3B variants also significantly outperform their baselines. Code, models, and datasets are fully open-sourced.

📝 Abstract
We introduce FuseChat-3.0, a suite of large language models (LLMs) developed by integrating the strengths of heterogeneous source LLMs into more compact target LLMs. Our source models include the powerful Gemma-2-27B-it, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. For target models, we focus on three widely used smaller variants (Llama-3.1-8B-Instruct, Gemma-2-9B-it, and Qwen-2.5-7B-Instruct), along with two ultra-compact options, Llama-3.2-3B-Instruct and Llama-3.2-1B-Instruct. To leverage the diverse capabilities of these source models, we develop a specialized data construction protocol tailored to various tasks and domains. The FuseChat-3.0 training pipeline consists of two key stages: (1) supervised fine-tuning (SFT) to align the target and source model distributions, and (2) Direct Preference Optimization (DPO) to apply preferences from multiple source LLMs to fine-tune the target model. The resulting FuseChat-3.0 models exhibit significant performance gains across tasks such as instruction following, general knowledge, mathematics, and coding. As illustrated in Figure 1, using Llama-3.1-8B-Instruct as the target model, our fusion approach achieves an average improvement of 6.8 points across 14 benchmarks. Moreover, it demonstrates remarkable gains of 37.1 points and 30.1 points on the instruction-following benchmarks AlpacaEval-2 and Arena-Hard, respectively. Our code, models, and datasets are available at https://github.com/SLIT-AI/FuseChat-3.0.
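The second stage of the pipeline applies DPO with preference pairs drawn from multiple source LLMs. As a concrete reference, the standard per-pair DPO objective (Rafailov et al., 2023) can be sketched as below; the function name and scalar-per-pair interface are illustrative assumptions, not the authors' implementation:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * reward margin).

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference
    model. In FuseChat-3.0 the (chosen, rejected) pairs come from multiple
    heterogeneous source LLMs, but the per-pair loss has this same form.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # Sigmoid cross-entropy on the implicit reward margin: the loss shrinks
    # as the policy prefers the chosen response more than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss equals log 2; increasing the policy's relative preference for the chosen response drives the loss toward zero.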
Problem

Research questions and friction points this paper is trying to address.

Small-scale LLMs (1B–8B) lag behind large models in instruction following, general knowledge, mathematical reasoning, and coding.
Transferring the complementary strengths of heterogeneous large LLMs into a single compact model is an open fusion problem.
Constructing distillation data that covers diverse tasks and domains requires a specialized, task-aware protocol.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses heterogeneous source LLMs (Gemma-2, Mistral, Qwen, Llama) into compact target models
Two-stage pipeline: supervised fine-tuning to align distributions, then multi-source Direct Preference Optimization
Achieves an average 6.8-point gain across 14 benchmarks with Llama-3.1-8B-Instruct as the target