Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often exhibit hallucination and low contextual faithfulness in long-form question answering. To address this, we propose GenDiE, a sentence-level self-evolution framework. GenDiE introduces a novel fine-grained, sentence-level optimization paradigm that decomposes model responses into independent units and establishes a closed-loop “generate–discriminate–evolve” pipeline. It jointly trains generation and discrimination modules via instruction tuning and contrastive learning, enabling the model to autonomously construct aligned training data. Furthermore, it integrates confidence-guided beam search and fine-grained faithfulness scoring. Evaluated on ASQA and ConFiQA benchmarks, GenDiE significantly improves response faithfulness and answer correctness while demonstrating strong cross-domain generalization. This work establishes a new paradigm for trustworthy retrieval-augmented generation systems.
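The summary mentions confidence-guided beam search with fine-grained faithfulness scoring at inference time. The sketch below illustrates what such a sentence-level, score-guided decoding loop could look like; the function names, the stubbed generator, and the placeholder scoring heuristic are assumptions for illustration, not the paper's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A partial answer built sentence by sentence, with a cumulative score."""
    sentences: list = field(default_factory=list)
    score: float = 0.0


def generate_candidate_sentences(question, context, prefix, n=4):
    """Stub for the generator: propose n candidate next sentences given the
    question, the retrieved context, and the answer prefix produced so far."""
    return [f"candidate sentence {i} (prefix has {len(prefix)} sentences)" for i in range(n)]


def score_faithfulness(question, context, prefix, sentence):
    """Stub for the self-scoring step: return a confidence that `sentence`
    is faithful to `context`. Placeholder heuristic only."""
    return 1.0 / (1.0 + (len(sentence) % 7))


def sentence_level_beam_search(question, context, beam_size=3, max_sentences=4):
    """Grow answers one sentence at a time, keeping the beam_size partial
    answers with the highest cumulative faithfulness score."""
    beam = [Hypothesis()]
    for _ in range(max_sentences):
        expanded = []
        for hyp in beam:
            for cand in generate_candidate_sentences(question, context, hyp.sentences):
                gain = score_faithfulness(question, context, hyp.sentences, cand)
                expanded.append(
                    Hypothesis(sentences=hyp.sentences + [cand], score=hyp.score + gain)
                )
        # Keep only the highest-scoring partial answers for the next step.
        beam = sorted(expanded, key=lambda h: h.score, reverse=True)[:beam_size]
    return " ".join(beam[0].sentences)


if __name__ == "__main__":
    answer = sentence_level_beam_search(
        "Who wrote Hamlet?", "Hamlet is a tragedy written by William Shakespeare."
    )
    print(answer)
```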

📝 Abstract
Improving context faithfulness in large language models is essential for developing trustworthy retrieval-augmented generation systems and mitigating hallucinations, especially in long-form question answering (LFQA) tasks or scenarios involving knowledge conflicts. Existing methods either intervene in LLMs only at inference time, without addressing their inherent limitations, or overlook the potential for self-improvement. In this paper, we introduce GenDiE (Generate, Discriminate, Evolve), a novel self-evolving framework that enhances context faithfulness through fine-grained sentence-level optimization. GenDiE combines both generative and discriminative training, equipping LLMs with self-generation and self-scoring capabilities to facilitate iterative self-evolution. This supports both data construction for model alignment and score-guided search during inference. Furthermore, by treating each sentence in a response as an independent optimization unit, GenDiE effectively addresses the limitations of previous approaches that optimize at the holistic answer level, which may miss unfaithful details. Experiments on ASQA (in-domain LFQA) and ConFiQA (out-of-domain counterfactual QA) datasets demonstrate that GenDiE surpasses various baselines in both faithfulness and correctness, and exhibits robust performance for domain adaptation.
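The abstract notes that self-scoring also supports data construction for model alignment. A minimal sketch of turning self-generated, self-scored candidate sentences into sentence-level preference pairs is given below; the PreferencePair structure and the scoring callback are illustrative assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One sentence-level contrastive training example."""
    prompt: str    # question + retrieved context + answer prefix so far
    chosen: str    # highest-scoring (most faithful) candidate sentence
    rejected: str  # lowest-scoring (least faithful) candidate sentence


def build_preference_pair(prompt, candidates, score_fn):
    """Rank self-generated candidate sentences with the model's own
    faithfulness score and pair the best against the worst."""
    if len(candidates) < 2:
        return None
    ranked = sorted(candidates, key=lambda s: score_fn(prompt, s))
    return PreferencePair(prompt=prompt, chosen=ranked[-1], rejected=ranked[0])


if __name__ == "__main__":
    # Placeholder scorer: pretend longer sentences are better grounded.
    def demo_score(prompt, sentence):
        return len(sentence)

    pair = build_preference_pair(
        "Q: ... Context: ... Prefix: ...",
        ["A short, unsupported claim.", "A longer claim that cites the retrieved context."],
        demo_score,
    )
    print(pair)
```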
Problem

Research questions and friction points this paper is trying to address.

Enhance context faithfulness in large language models.
Mitigate hallucinations in long-form question answering tasks.
Optimize sentence-level details for improved model alignment.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving framework for context faithfulness
Fine-grained sentence-level optimization approach
Generative and discriminative training combined
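The last item above, combining generative and discriminative training, can be pictured as a single objective that adds a contrastive ranking term to the usual next-token loss. The sketch below is a hedged illustration under that assumption; the weighting, margin form, and tensor shapes are placeholders, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def joint_loss(lm_logits, target_ids, score_faithful, score_unfaithful,
               margin=1.0, alpha=0.5):
    """Combine a generative next-token loss with a discriminative ranking
    loss that pushes faithful sentences above unfaithful ones by `margin`.

    lm_logits:        (batch, seq_len, vocab) generator logits
    target_ids:       (batch, seq_len) reference token ids
    score_faithful:   (batch,) self-scores for faithful candidate sentences
    score_unfaithful: (batch,) self-scores for unfaithful candidate sentences
    """
    gen_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)), target_ids.reshape(-1)
    )
    # Hinge-style ranking term: faithful should outscore unfaithful by `margin`.
    disc_loss = F.relu(margin - (score_faithful - score_unfaithful)).mean()
    return gen_loss + alpha * disc_loss


if __name__ == "__main__":
    logits = torch.randn(2, 8, 100)
    targets = torch.randint(0, 100, (2, 8))
    faithful, unfaithful = torch.tensor([2.0, 1.5]), torch.tensor([0.4, 0.9])
    print(joint_loss(logits, targets, faithful, unfaithful))
```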
Kun Li
The Chinese University of Hong Kong, Hong Kong SAR, China
Tianhua Zhang
The Chinese University of Hong Kong
natural language processing, large language models
Yunxiang Li
The Chinese University of Hong Kong, Hong Kong SAR, China
Hongyin Luo
MIT CSAIL
Artificial Intelligence, Machine Learning, Natural Language Processing
Abdalla Moustafa
The Chinese University of Hong Kong, Hong Kong SAR, China
Xixin Wu
The Chinese University of Hong Kong
James Glass
MIT Computer Science and Artificial Intelligence Laboratory
Speech and Language Processing
Helen M. Meng
The Chinese University of Hong Kong, Hong Kong SAR, China