AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing large language model ensemble methods: they rely on fixed fusion granularities and therefore struggle to accommodate the varying demands of different tasks for semantic expressiveness and uncertainty handling. To overcome this, we propose AdaFuse, a novel framework that adaptively selects fusion units during decoding based on contextual cues. AdaFuse introduces an uncertainty-driven ensemble triggering mechanism and a diversity-aware test-time scaling strategy, enabling fusion granularity to be adjusted dynamically at generation time. Extensive experiments show that AdaFuse achieves an average relative improvement of 6.88% over strong ensemble baselines across open-domain question answering, arithmetic reasoning, and machine translation tasks.

📝 Abstract
Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a practical way to combine these capabilities without retraining. However, existing ensemble approaches suffer from fundamental limitations. Most rely on a fixed fusion granularity, which lacks the flexibility required for mid-generation adaptation and fails to adapt to different generation characteristics across tasks. To address these challenges, we propose AdaFuse, an adaptive ensemble decoding framework that dynamically selects semantically appropriate fusion units during generation. Rather than committing to a fixed granularity, AdaFuse adjusts fusion behavior on the fly based on the decoding context, with words serving as the basic building blocks for alignment. Specifically, we introduce an uncertainty-based criterion to decide whether to apply ensembling at each decoding step. Under confident decoding states, the model continues generation directly. In less certain states, AdaFuse invokes a diversity-aware scaling strategy to explore alternative candidate continuations and inform ensemble decisions. This design establishes a synergistic interaction between adaptive ensembling and test-time scaling, where ensemble decisions guide targeted exploration, and the resulting diversity in turn strengthens ensemble quality. Experiments on open-domain question answering, arithmetic reasoning, and machine translation demonstrate that AdaFuse consistently outperforms strong ensemble baselines, achieving an average relative improvement of 6.88%. The code is available at https://github.com/CCM0111/AdaFuse.
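The core idea of uncertainty-triggered ensembling can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses Shannon entropy of the primary model's next-token distribution as the confidence signal and a simple probability average as the fusion rule; the threshold `tau` and both choices are assumptions for illustration only.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def adaptive_ensemble_step(primary_probs, other_probs, tau=0.5):
    """One decoding step of uncertainty-triggered ensembling (illustrative sketch).

    If the primary model is confident (entropy below the hypothetical
    threshold `tau`), decode from it alone; otherwise average the
    next-token distributions of all models and pick the fused argmax.
    Returns (token_index, mode).
    """
    if entropy(primary_probs) < tau:
        # Confident state: skip ensembling, continue generation directly.
        token = max(range(len(primary_probs)), key=lambda i: primary_probs[i])
        return token, "solo"
    # Uncertain state: fuse all models' distributions by simple averaging.
    n_models = 1 + len(other_probs)
    fused = [
        (primary_probs[i] + sum(q[i] for q in other_probs)) / n_models
        for i in range(len(primary_probs))
    ]
    token = max(range(len(fused)), key=lambda i: fused[i])
    return token, "ensemble"
```

A confident step (a sharply peaked distribution) bypasses the ensemble entirely, so the extra cost of querying auxiliary models is paid only on uncertain steps; the paper's diversity-aware test-time scaling would additionally expand candidate continuations in that uncertain branch.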
Problem

Research questions and friction points this paper is trying to address.

ensemble decoding
fusion granularity
adaptive ensembling
test-time scaling
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive ensemble decoding
test-time scaling
uncertainty-based fusion
diversity-aware generation
dynamic fusion granularity