Disco-RAG: Discourse-Aware Retrieval-Augmented Generation

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing retrieval-augmented generation (RAG) approaches: they process retrieved passages in a flat, unstructured manner, so they fail to model discourse structure and struggle to integrate evidence across documents. To overcome this, the paper proposes Disco-RAG, a RAG framework that explicitly incorporates discourse structure by constructing intra-chunk discourse trees and inter-chunk rhetorical graphs, which are combined into a structured planning blueprint. This blueprint guides large language models in synthesizing multi-source information without requiring fine-tuning. By jointly modeling local hierarchical structure and cross-passage coherence, the method reports state-of-the-art performance on both question answering and long-document summarization benchmarks. These results underscore the important role of discourse structure in advancing RAG systems.

📝 Abstract
Retrieval-Augmented Generation (RAG) has emerged as an important means of enhancing the performance of large language models (LLMs) in knowledge-intensive tasks. However, most existing RAG strategies treat retrieved passages in a flat and unstructured way, which prevents the model from capturing structural cues and constrains its ability to synthesize knowledge from dispersed evidence across documents. To overcome these limitations, we propose Disco-RAG, a discourse-aware framework that explicitly injects discourse signals into the generation process. Our method constructs intra-chunk discourse trees to capture local hierarchies and builds inter-chunk rhetorical graphs to model cross-passage coherence. These structures are jointly integrated into a planning blueprint that conditions the generation. Experiments on question answering and long-document summarization benchmarks show the efficacy of our approach. Disco-RAG achieves state-of-the-art results on the benchmarks without fine-tuning. These findings underscore the important role of discourse structure in advancing RAG systems.
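The abstract names three structures (intra-chunk discourse trees, inter-chunk rhetorical graphs, and a planning blueprint) without giving their format. The sketch below is only one way such structures might be represented and serialized into a prompt for a frozen LLM. It is not the authors' implementation: the class names, the relation labels, the prompt layout, and the hand-written trees are all assumptions made for illustration, and the discourse parser that would produce the trees and edges is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiscourseNode:
    """Hypothetical node of an intra-chunk discourse tree."""
    relation: str                 # e.g. "Explanation"; "span" marks a leaf
    text: str = ""                # leaf text; empty for internal nodes
    nuclearity: str = "N"         # "N" (nucleus) or "S" (satellite)
    children: List["DiscourseNode"] = field(default_factory=list)


@dataclass(frozen=True)
class RhetoricalEdge:
    """Hypothetical directed edge of an inter-chunk rhetorical graph."""
    source: int                   # index of the source chunk
    target: int                   # index of the target chunk
    relation: str                 # e.g. "Contrast"


def linearize(node: DiscourseNode, depth: int = 0) -> List[str]:
    """Render a tree as indented lines, one node per line."""
    label = f"[{node.nuclearity}] {node.relation}"
    if node.text:
        label += f": {node.text}"
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(linearize(child, depth + 1))
    return lines


def build_prompt(question, chunks, trees, edges) -> str:
    """Assemble chunks, trees, and cross-chunk edges into one prompt string."""
    parts = [f"Question: {question}", ""]
    for i, (chunk, tree) in enumerate(zip(chunks, trees)):
        parts.append(f"Chunk {i}: {chunk}")
        parts.append("Discourse tree:")
        parts.extend(linearize(tree, depth=1))
        parts.append("")
    parts.append("Cross-chunk relations:")
    for e in edges:
        parts.append(f"  Chunk {e.source} -[{e.relation}]-> Chunk {e.target}")
    parts.append("")
    parts.append("Plan the answer using the structure above, then write it.")
    return "\n".join(parts)


# Hand-written example input; a real system would obtain these from a parser.
chunks = [
    "Aspirin reduces fever. It inhibits prostaglandin synthesis.",
    "High doses can irritate the stomach.",
]
tree0 = DiscourseNode(
    "Explanation",
    children=[
        DiscourseNode("span", "Aspirin reduces fever.", "N"),
        DiscourseNode("span", "It inhibits prostaglandin synthesis.", "S"),
    ],
)
tree1 = DiscourseNode("span", "High doses can irritate the stomach.")
edges = [RhetoricalEdge(1, 0, "Contrast")]

prompt = build_prompt("What does aspirin do?", chunks, [tree0, tree1], edges)
print(prompt)
```

Keeping the trees and the graph as plain data and rendering them to text at the last step means the generator itself needs no architectural change, which is in keeping with the abstract's statement that the approach works without fine-tuning.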
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
discourse structure
knowledge synthesis
document coherence
structured retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discourse-aware
Retrieval-Augmented Generation
Discourse Tree
Rhetorical Graph
Knowledge Synthesis
Dongqi Liu
Saarland University
Computational Linguistics, Natural Language Processing
Hang Ding
Shanghai Jiao Tong University
Qiming Feng
Fudan University
Jian Li
Tencent YouTu Lab
CV, MLLM
Xurong Xie
Institute of Software, Chinese Academy of Sciences
Speech and Language Processing, Machine Learning, Human-Computer Interaction, AI for Health
Zhucun Xue
Zhejiang University
Chengjie Wang
Tencent YouTu Lab
Jiang-She Zhang
Tencent YouTu Lab
Yabiao Wang
Tencent YouTu Lab