LLMBoost: Make Large Language Models Stronger with Boosting

📅 2025-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM ensemble methods typically treat constituent models as black boxes, performing only input or output fusion while neglecting intermediate representations and inter-model interactions. To address this, we propose LLMBoost, a boosting-inspired fine-tuning framework for LLM ensembles. Our key contributions are: (1) cross-model attention, which explicitly models interaction among heterogeneous models' intermediate representations; (2) a chained error-suppression training paradigm that enables hierarchical error correction and knowledge transfer; and (3) layer-wise pipelined inference, which streams hidden states across models to balance efficiency and performance. We theoretically prove that sequential ensembling guarantees monotonic performance improvement under bounded correction assumptions. Experiments demonstrate that LLMBoost consistently improves accuracy on commonsense and arithmetic reasoning benchmarks while reducing inference latency, achieving efficiency comparable to single-model decoding.
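The cross-model attention idea can be illustrated with a toy sketch: a successor model's hidden state attends over a predecessor model's per-token hidden states via scaled dot-product attention and fuses them with a residual add. This is our own minimal illustration of the mechanism described above, not the paper's implementation; function names and the fusion rule are assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_model_attention(query, predecessor_states):
    """Toy cross-model attention (illustrative, not the paper's code).

    query              -- one hidden-state vector from the successor model
    predecessor_states -- list of per-token hidden-state vectors from the
                          predecessor model
    Returns the attention weights and a fused state (residual add of the
    attention-weighted predecessor states onto the query).
    """
    d = len(query)
    # Scaled dot-product scores between the query and each predecessor state.
    scores = [sum(q * k for q, k in zip(query, state)) / math.sqrt(d)
              for state in predecessor_states]
    weights = softmax(scores)
    # Fuse: query plus the attention-weighted sum of predecessor states.
    fused = [q + sum(w * state[i] for w, state in zip(weights, predecessor_states))
             for i, q in enumerate(query)]
    return weights, fused
```

With `query = [1.0, 0.0]` and predecessor states `[[1.0, 0.0], [0.0, 1.0]]`, the first (more similar) predecessor state receives the larger attention weight.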

📝 Abstract
Ensemble learning of LLMs has emerged as a promising alternative to enhance performance, but existing approaches typically treat models as black boxes, combining the inputs or final outputs while overlooking the rich internal representations and interactions across models. In this work, we introduce LLMBoost, a novel ensemble fine-tuning framework that breaks this barrier by explicitly leveraging the intermediate states of LLMs. Inspired by the boosting paradigm, LLMBoost incorporates three key innovations. First, a cross-model attention mechanism enables successor models to access and fuse hidden states from their predecessors, facilitating hierarchical error correction and knowledge transfer. Second, a chain training paradigm progressively fine-tunes the connected models with an error-suppression objective, ensuring that each model rectifies the mispredictions of its predecessor with minimal additional computation. Third, a near-parallel inference design pipelines hidden states across models layer by layer, achieving inference efficiency approaching single-model decoding. We further establish the theoretical foundations of LLMBoost, proving that sequential integration guarantees monotonic improvements under bounded correction assumptions. Extensive experiments on commonsense reasoning and arithmetic reasoning tasks demonstrate that LLMBoost consistently boosts accuracy while reducing inference latency.
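The monotonic-improvement claim under bounded correction can be sketched as follows. The notation and the specific condition below are ours, given only for intuition; the paper's formal statement may differ.

```latex
Let $\epsilon_k$ denote the error rate of the chained ensemble after model $k$.
Suppose model $k+1$ corrects a fraction $\alpha_{k+1} \in (0, 1]$ of the
current errors while introducing new errors on at most a fraction
$\beta_{k+1}$ of the currently correct examples. Then
\[
  \epsilon_{k+1} \;\le\; \epsilon_k - \alpha_{k+1}\,\epsilon_k
                          + \beta_{k+1}\,(1 - \epsilon_k),
\]
so $\epsilon_{k+1} \le \epsilon_k$, i.e. the ensemble improves monotonically,
whenever the correction is bounded:
$\beta_{k+1}\,(1 - \epsilon_k) \le \alpha_{k+1}\,\epsilon_k$.
```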
Problem

Research questions and friction points this paper is trying to address.

Enhances LLM performance via ensemble fine-tuning with intermediate states
Introduces cross-model attention for hierarchical error correction and transfer
Achieves efficient inference with near-parallel decoding and reduced latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-model attention for hierarchical error correction
Chain training with error-suppression objective
Near-parallel inference for efficient decoding
👥 Authors
Zehao Chen
PhD, Yale University
Porous Media, Fluid Dynamics, Polymer, Hydrogel
Tianxiang Ai
China Telecom eSurfing Cloud
Yifei Li
Beihang University
Gongxun Li
Beihang University
Yuyang Wei
China Telecom eSurfing Cloud
Wang Zhou
Sun Yat-Sen University
Guanghui Li
China Telecom eSurfing Cloud
Bin Yu
China Telecom eSurfing Cloud
Zhijun Chen
Beihang University
Machine Learning, Natural Language Processing
Hailong Sun
Professor of Computer Science, Beihang University
Software Engineering, Artificial Intelligence, Software Systems
Fuzhen Zhuang
Beihang University
Jianxin Li
Beihang University
Deqing Wang
Beihang University
Yikun Ban
Beihang University, University of Illinois Urbana-Champaign
Reinforcement Learning, Ensemble Learning