Entropy-Based Decoding for Retrieval-Augmented Large Language Models

📅 2024-06-25
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the decline in factual accuracy of retrieval-augmented large language models caused by noisy external documents and interference from internal parametric knowledge, this paper proposes a training-free, entropy-guided decoding framework. The method introduces two mechanisms: (1) document-parallel entropy ensemble decoding, which decodes along multiple document paths in parallel and combines their output distributions in a weighted ensemble based on per-path output entropy, suppressing low-confidence paths; and (2) cross-layer contrastive decoding, which identifies unreliable, high-entropy internal knowledge by measuring the KL divergence between hidden-state distributions across Transformer layers and attenuates its influence. Evaluated on multiple open-domain question-answering benchmarks, the approach significantly improves answer factuality and mitigates model "distractibility" without fine-tuning or additional parameters.
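The first mechanism, entropy-weighted ensembling of per-document next-token distributions, can be sketched roughly as follows. The softmax-over-negative-entropies weighting and the temperature `tau` are illustrative assumptions, not the paper's exact formulation:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def entropy_weighted_ensemble(doc_dists, tau=1.0):
    """Weighted ensemble of per-document next-token distributions.
    Low-entropy (confident) document paths receive larger weights via
    a softmax over negative entropies; `tau` is a temperature."""
    weights = [math.exp(-entropy(p) / tau) for p in doc_dists]
    z = sum(weights)
    weights = [w / z for w in weights]
    vocab = len(doc_dists[0])
    return [sum(w * p[i] for w, p in zip(weights, doc_dists))
            for i in range(vocab)]

# Toy example over a 3-token vocabulary: one confident path vs. one
# near-uniform (noisy, distracting) path.
p_good = [0.90, 0.05, 0.05]   # low entropy: on-topic document
p_noisy = [0.34, 0.33, 0.33]  # high entropy: distracting document
mix = entropy_weighted_ensemble([p_good, p_noisy])
```

With these toy inputs the confident path dominates the mixture, so the ensemble keeps the retrieved document's preferred token while dampening the noisy path's near-uniform mass.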

๐Ÿ“ Abstract
Augmenting Large Language Models (LLMs) with retrieved external knowledge has proven effective for improving the factual accuracy of generated responses. Despite their success, retrieval-augmented LLMs still face the distractibility issue, where the generated responses are negatively influenced by noise from both external and internal knowledge sources. In this paper, we introduce a novel, training-free decoding method guided by entropy considerations to mitigate this issue. Our approach utilizes entropy-based document-parallel ensemble decoding to prioritize low-entropy distributions from retrieved documents, thereby enhancing the extraction of relevant information of context. Additionally, it incorporates a contrastive decoding mechanism that contrasts the obtained low-entropy ensemble distribution with the high-entropy distribution derived from the model's internal knowledge across layers, which ensures a greater emphasis on reliable external information. Extensive experiments on open-domain question answering datasets demonstrate the superiority of our method.
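The second mechanism, contrasting the external ensemble with a high-entropy internal distribution, can be sketched as below. The DoLa-style max-KL layer selection, the log-space contrast, and the strength `alpha` are assumptions for illustration, not the paper's exact formulation:

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def pick_contrast_layer(layer_dists, final_dist):
    """Select the intermediate-layer distribution that diverges most
    from the final layer (an assumed DoLa-style selection rule)."""
    return max(layer_dists, key=lambda d: kl(final_dist, d))

def contrastive_decode(p_external, p_internal, alpha=0.5):
    """Contrast the low-entropy external ensemble with a high-entropy
    internal distribution in log space; `alpha` scales the penalty
    applied to tokens the internal knowledge over-prefers."""
    eps = 1e-12
    scores = [math.log(pe + eps) - alpha * math.log(pi + eps)
              for pe, pi in zip(p_external, p_internal)]
    m = max(scores)               # softmax with max-subtraction
    z = [math.exp(s - m) for s in scores]
    total = sum(z)
    return [x / total for x in z]

# Internal knowledge (higher entropy) pulls toward token 1; the
# external ensemble favors token 0, and the contrast sharpens that.
p_ext = [0.70, 0.20, 0.10]
p_int = [0.30, 0.50, 0.20]
out = contrastive_decode(p_ext, p_int)
```

In this toy case the contrast both keeps token 0 as the argmax and increases its mass relative to the plain external distribution, which is the intended effect of de-emphasizing unreliable internal preferences.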
Problem

Research questions and friction points this paper is trying to address.

Mitigate distractibility in retrieval-augmented LLMs
Enhance relevant information extraction
Prioritize low-entropy distributions for accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-based decoding method
Document-parallel ensemble decoding
Contrastive decoding mechanism