🤖 AI Summary
Large language models (LLMs) face a fundamental trade-off between faithfulness and expressiveness when integrating external knowledge: over-reliance on retrieved content often yields verbose, rigid outputs, whereas dependence on parametric knowledge risks factual inaccuracy. To address this, we propose Collaborative Decoding (CoDe), a plug-and-play framework that dynamically fuses internal and external knowledge by jointly weighing distributional divergence and model confidence during decoding, without requiring fine-tuning. CoDe further introduces a knowledge-aware reranking mechanism that prevents over-reliance on parametric priors while ensuring proper use of the retrieved evidence. Extensive experiments across multiple state-of-the-art LLMs and diverse evaluation metrics, including faithfulness, BLEU, and BERTScore, demonstrate that CoDe significantly improves answer faithfulness while preserving linguistic naturalness and fluency. The results validate CoDe's effectiveness, model-agnostic generalizability, and deployment efficiency.
📝 Abstract
Grounding responses in external knowledge represents an effective strategy for mitigating hallucinations in Large Language Models (LLMs). However, current LLMs struggle to seamlessly integrate knowledge while simultaneously maintaining faithfulness (or fidelity) and expressiveness, capabilities that humans naturally possess. This limitation results in outputs that either lack support from external knowledge, thereby compromising faithfulness, or appear overly verbose and unnatural, thus sacrificing expressiveness. In this work, to break the trade-off between faithfulness and expressiveness, we propose Collaborative Decoding (CoDe), a novel approach that dynamically integrates output probabilities generated with and without external knowledge. This integration is guided by distribution divergence and model confidence, enabling the selective activation of relevant and reliable expressions from the model's internal parameters. Furthermore, we introduce a knowledge-aware reranking mechanism that prevents over-reliance on prior parametric knowledge while ensuring proper utilization of provided external information. Through comprehensive experiments, our plug-and-play CoDe framework demonstrates superior performance in enhancing faithfulness without compromising expressiveness across diverse LLMs and evaluation metrics, validating both its effectiveness and generalizability.
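The core decoding step described above can be sketched in a few lines. This is an illustrative toy, not the paper's actual formulation: the fusion weight, the use of KL divergence, and the sigmoid gating are all assumptions chosen to show the general idea of combining a knowledge-grounded distribution with a parametric-only one based on their disagreement and the model's confidence.

```python
import numpy as np

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with smoothing to avoid log(0)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def collaborative_decode_step(logits_with, logits_without, alpha=1.0):
    """Fuse one decoding step's distributions from two forward passes:
    one conditioned on retrieved knowledge, one parametric-only.

    Hypothetical weighting: trust the grounded distribution more when
    the two passes disagree (high divergence) and the parametric pass
    is unsure (low confidence).
    """
    p_ext = softmax(logits_with)      # conditioned on external knowledge
    p_int = softmax(logits_without)   # internal (parametric) knowledge only
    divergence = kl_divergence(p_ext, p_int)
    confidence = float(p_int.max())   # peak probability as a confidence proxy
    # Sigmoid gate: weight toward external knowledge as divergence grows
    # and parametric confidence drops.
    w = 1.0 / (1.0 + np.exp(-(alpha * divergence - confidence)))
    fused = w * p_ext + (1.0 - w) * p_int
    return fused / fused.sum()
```

Because each step only mixes two next-token distributions, such a scheme plugs into any autoregressive LLM's decoding loop without touching model weights, which is what makes the framework plug-and-play.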