Post-training an LLM for RAG? Train on Self-Generated Demonstrations

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from outdated or insufficient knowledge in knowledge-intensive question answering (QA), leading to degraded retrieval-augmented generation (RAG) performance and frequent hallucinations. Method: This paper proposes a RAG post-training paradigm based on model self-generated retrieval-augmented demonstrations. It employs self-supervised generation of high-quality, in-distribution retrieval–response pairs, circumventing the distribution shift and retrieval–response misalignment issues inherent in conventional RAG fine-tuning with external data. Additionally, it incorporates a controllable abstention mechanism enabling the model to autonomously detect uncertainty and decline to answer unreliable queries. Contribution/Results: On knowledge-intensive QA benchmarks, the method significantly outperforms baselines such as RA-IT while preserving the model's original capabilities in non-RAG settings. It effectively mitigates hallucinations and enhances answer reliability and trustworthiness.
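The self-generated demonstration pipeline described above can be sketched roughly as follows. This is an illustrative mock-up, not the paper's implementation: the retriever, the stand-in model, the reference-agreement filter, and all function names are assumptions made for the example.

```python
# Hypothetical sketch of building self-generated retrieval-augmented
# demonstrations. The retriever and model below are toy stand-ins; the
# agreement filter is an assumed quality heuristic, not the paper's exact one.

ABSTAIN = "I don't know."

def retrieve(question, corpus):
    # Toy keyword retriever: return documents sharing any word with the question.
    words = set(question.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def model_answer(question, docs):
    # Stand-in for the LLM: answers only when a retrieved document is on-topic,
    # otherwise abstains.
    for doc in docs:
        if "world cup" in doc.lower() and "world" in question.lower():
            return "Argentina won the 2022 World Cup."
    return ABSTAIN

def build_demonstrations(questions, corpus, references):
    """Keep only self-generated (question, retrievals, response) triples whose
    response agrees with a reference answer; abstentions are kept as-is so the
    model also learns when to decline."""
    demos = []
    for q in questions:
        docs = retrieve(q, corpus)
        ans = model_answer(q, docs)
        if ans == ABSTAIN or references.get(q, "").lower() in ans.lower():
            demos.append({"question": q, "retrievals": docs, "response": ans})
    return demos
```

Because every kept response was produced by the model itself, the resulting fine-tuning data stays in-distribution, which is the core idea the summary attributes to the method.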

📝 Abstract
Large language models (LLMs) often struggle with knowledge intensive NLP tasks, such as answering "Who won the latest World Cup?", because the knowledge they learn during training may be insufficient or outdated. Conditioning generation on retrieved documents -- a technique known as retrieval augmented generation (RAG) -- mitigates these shortcomings by allowing the model to leverage in-context information. Practitioners can improve LLM RAG performance by fine-tuning on retrieval-augmented instructions, but must beware that this can cause undesirable model behaviors like hallucinations. We attribute this degradation to the fact that the training data is likely to be out-of-distribution for the model and may suffer from quality issues, such as misalignment between retrievals and target responses (since retrievals are frequently added post-hoc). We propose a recipe for training RAG-enabled LLMs using self-generated demonstrations, thereby avoiding training on out-of-distribution text and integrating retrievals into the LLM responses. We evaluate our method on knowledge intensive question answering (QA) tasks and show that our method teaches LLMs to properly handle in-context retrievals and abstain from questions it will likely get wrong. Compared to conventional RA-IT methods, our method prevents model degradation in non-RAG settings while exhibiting superior QA performance.
Problem

Research questions and friction points this paper is trying to address.

LLMs struggle with knowledge-intensive NLP tasks because their parametric knowledge can be insufficient or outdated.
Fine-tuning on retrieval-augmented instructions can cause hallucinations, likely because the training data is out-of-distribution and retrievals are often misaligned with target responses.
Open challenge: fine-tune RAG-enabled LLMs without degrading their capabilities in non-RAG settings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-generated, in-distribution demonstrations replace external retrieval-augmented training data
In-distribution training mitigates model degradation in non-RAG settings
Teaches the LLM to handle in-context retrievals and abstain from questions it would likely get wrong
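The abstention behavior above can be illustrated with a minimal inference-time rule. This is a sketch under assumptions: the paper trains abstention into the model itself, whereas here a confidence threshold over scored candidate answers stands in for that learned behavior, and all names are hypothetical.

```python
# Illustrative confidence-threshold abstention rule; the threshold mechanism
# is an assumption standing in for the paper's learned abstention behavior.

ABSTAIN = "I cannot answer this reliably."

def answer_with_abstention(scored_candidates, threshold=0.5):
    """Return the highest-confidence candidate answer, or abstain when no
    candidate reaches the confidence threshold."""
    if not scored_candidates:
        return ABSTAIN
    best_answer, best_score = max(scored_candidates, key=lambda t: t[1])
    return best_answer if best_score >= threshold else ABSTAIN
```

Declining low-confidence answers trades a little coverage for reliability, which matches the summary's claim that abstention reduces hallucinations on questions the model would likely get wrong.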