🤖 AI Summary
Large language models (LLMs) often exhibit hallucination and low contextual faithfulness in long-form question answering. To address this, we propose GenDiE, a sentence-level self-evolution framework. GenDiE introduces a fine-grained optimization paradigm that treats each sentence of a response as an independent optimization unit and establishes a closed-loop "generate–discriminate–evolve" pipeline. It jointly trains generative and discriminative capabilities in a single model via instruction tuning and contrastive learning, enabling the model to autonomously construct its own aligned training data. At inference, it applies faithfulness-score-guided beam search over sentence candidates. Evaluated on the ASQA and ConFiQA benchmarks, GenDiE significantly improves response faithfulness and answer correctness while demonstrating strong cross-domain generalization, pointing toward more trustworthy retrieval-augmented generation systems.
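The closed-loop data construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the fixed candidate pool, and the word-overlap scorer are all hypothetical stand-ins for the model's self-generation and self-scoring capabilities. At each sentence step, the best- and worst-scored candidates form a (prefix, chosen, rejected) pair for contrastive training, and the chosen sentence extends the prefix.

```python
from typing import Callable, List, Tuple

def build_preference_pairs(
    generate_candidates: Callable[[str, List[str]], List[str]],
    score_sentence: Callable[[str, List[str], str], float],
    context: str,
    num_steps: int = 2,
) -> List[Tuple[List[str], str, str]]:
    """One self-evolution round (illustrative sketch): at each sentence
    step the model proposes candidates, scores them itself, and the
    best/worst candidates become a (prefix, chosen, rejected) pair for
    contrastive training; the chosen sentence extends the prefix."""
    prefix: List[str] = []
    pairs: List[Tuple[List[str], str, str]] = []
    for _ in range(num_steps):
        cands = generate_candidates(context, prefix)
        if not cands:
            break
        ranked = sorted(
            cands,
            key=lambda c: score_sentence(context, prefix, c),
            reverse=True,
        )
        if len(ranked) >= 2 and ranked[0] != ranked[-1]:
            pairs.append((list(prefix), ranked[0], ranked[-1]))
        prefix.append(ranked[0])  # continue from the most faithful sentence
    return pairs

# Toy stand-ins (hypothetical): a fixed candidate pool in place of the
# generator, and word overlap with the retrieved context as the scorer.
context = "The Eiffel Tower is in Paris. It was completed in 1889."

def toy_generate(ctx: str, prefix: List[str]) -> List[str]:
    return ["The tower stands in Paris.", "The tower stands in London."]

def toy_score(ctx: str, prefix: List[str], cand: str) -> float:
    ctx_words = set(ctx.lower().replace(".", " ").split())
    words = cand.lower().replace(".", " ").split()
    return sum(w in ctx_words for w in words) / len(words)

pairs = build_preference_pairs(toy_generate, toy_score, context)
```

In this toy run, the context-consistent sentence outranks the contradicting one at every step, so each preference pair contrasts a faithful "chosen" sentence against an unfaithful "rejected" one under the same prefix.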
📝 Abstract
Improving context faithfulness in large language models is essential for developing trustworthy retrieval-augmented generation systems and mitigating hallucinations, especially in long-form question answering (LFQA) tasks or scenarios involving knowledge conflicts. Existing methods either intervene in LLMs only at inference time, without addressing their inherent limitations, or overlook the potential for self-improvement. In this paper, we introduce GenDiE (Generate, Discriminate, Evolve), a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. GenDiE combines generative and discriminative training, equipping LLMs with self-generation and self-scoring capabilities to facilitate iterative self-evolution. This supports both data construction for model alignment and score-guided search during inference. Furthermore, by treating each sentence in a response as an independent optimization unit, GenDiE addresses a limitation of previous approaches that optimize at the holistic answer level and may therefore miss unfaithful details. Experiments on the ASQA (in-domain LFQA) and ConFiQA (out-of-domain counterfactual QA) datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness, and exhibits robust performance under domain adaptation.
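The score-guided search during inference can be sketched as a sentence-level beam search. Again, this is an illustrative sketch under assumed interfaces, not the paper's code: the generator and the word-overlap faithfulness scorer are toy stand-ins for the model's self-generation and self-scoring, and every name here is hypothetical.

```python
from typing import Callable, List, Tuple

def sentence_beam_search(
    generate_candidates: Callable[[str, List[str]], List[str]],
    score_sentence: Callable[[str, List[str], str], float],
    context: str,
    beam_width: int = 2,
    max_sentences: int = 2,
) -> List[str]:
    """Sentence-level score-guided beam search (illustrative sketch).

    Each hypothesis is a list of sentences. At every step the generator
    proposes candidate next sentences, the self-scorer rates each
    candidate's faithfulness to `context`, and only the `beam_width`
    hypotheses with the highest cumulative score survive."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(max_sentences):
        expanded: List[Tuple[float, List[str]]] = []
        for cum_score, sents in beams:
            for cand in generate_candidates(context, sents):
                s = score_sentence(context, sents, cand)
                expanded.append((cum_score + s, sents + [cand]))
        if not expanded:
            break
        expanded.sort(key=lambda x: x[0], reverse=True)
        beams = expanded[:beam_width]
    return beams[0][1]  # sentences of the highest-scoring hypothesis

# Toy stand-ins (hypothetical): a fixed candidate pool and word overlap
# with the retrieved context as the faithfulness score.
context = "The Eiffel Tower is in Paris. It was completed in 1889."

def toy_generate(ctx: str, prefix: List[str]) -> List[str]:
    return ["It is located in Paris.", "It is located in London."]

def toy_score(ctx: str, prefix: List[str], cand: str) -> float:
    ctx_words = set(ctx.lower().replace(".", " ").split())
    words = cand.lower().replace(".", " ").split()
    return sum(w in ctx_words for w in words) / len(words)

answer = sentence_beam_search(toy_generate, toy_score, context)
```

Because pruning happens per sentence rather than once per full answer, a single unfaithful sentence lowers a hypothesis's cumulative score immediately, so the search steers away from it before the rest of the answer is committed.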