Structure-Aware RAG: Structured Retrieval Augmented Generation from Noisy Data for Conversational Agents

📅 2026-05-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses a weakness of existing retrieval-augmented generation (RAG) methods: text-based and graph-based approaches struggle to supply dynamic or domain-specific information when the retrieved context is noisy or irrelevant. The authors propose Structure-aware RAG (SA-RAG), which uses tables as an intermediate structured representation to reduce noise while preserving essential information. SA-RAG includes a quality-aware table metadata generation framework that models metadata normalization and effectiveness, and it explores both training-free and training-based table generation. Generation validation and direct preference optimization further improve table quality while maintaining semantic and structural consistency. On two noisy real-world datasets, SA-RAG significantly outperforms existing RAG baselines, and the code is publicly available.
📝 Abstract
Large Language Models (LLMs) have been widely adopted in conversational applications. However, their reliance on parametric knowledge limits reliability in real-world scenarios that require dynamic or domain-specific information. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external knowledge during generation, but existing text-based and graph-based RAG methods often struggle with noisy or irrelevant contexts. In this work, we propose Structure-aware Retrieval Augmented Generation (SA-RAG), which uses tables as an intermediate structured representation to provide a compact and controllable interface that reduces noise while preserving essential information. We introduce a quality-aware table metadata generation framework that models metadata normalization and effectiveness, improving metadata quality and downstream performance. Furthermore, we explore both training-free and training-based table generation methods. Generation validation and direct preference optimization further improve table quality while maintaining semantic and structural consistency. Experiments on two noisy real-world datasets show that SA-RAG significantly outperforms existing RAG baselines. Our code is publicly available.
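The idea of using a table as a compact intermediate representation can be illustrated with a small sketch. This is not the authors' implementation: the names `TableSchema`, `normalize_column`, `build_table`, and `to_markdown` are hypothetical, the "metadata" is reduced to a list of column names, and the noise filter is a simple rule (drop fields outside the schema, drop records that fill no schema column) standing in for the paper's generation and validation steps.

```python
from dataclasses import dataclass


@dataclass
class TableSchema:
    """Hypothetical table metadata: a table name plus normalized column names."""
    name: str
    columns: list[str]


def normalize_column(name: str) -> str:
    """Toy metadata normalization: trim, lowercase, join words with underscores."""
    return "_".join(name.strip().lower().split())


def build_table(records: list[dict[str, str]], schema: TableSchema) -> list[list[str]]:
    """Project noisy records onto the schema.

    Fields outside the schema are discarded, and records that fill none of
    the schema columns are treated as noise and dropped entirely.
    """
    rows = []
    for record in records:
        normalized = {normalize_column(k): v.strip() for k, v in record.items()}
        row = [normalized.get(col, "") for col in schema.columns]
        if any(row):
            rows.append(row)
    return rows


def to_markdown(schema: TableSchema, rows: list[list[str]]) -> str:
    """Serialize the table as Markdown so it can be placed in an LLM prompt."""
    header = "| " + " | ".join(schema.columns) + " |"
    divider = "| " + " | ".join("---" for _ in schema.columns) + " |"
    body = ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join([header, divider, *body])


# Illustrative records, as if extracted from noisy retrieved passages.
schema = TableSchema("plans", [normalize_column(c) for c in ["Plan", "Monthly Price"]])
records = [
    {"Plan": "Basic", "Monthly Price": "$10", "Ad Banner": "Buy now!"},
    {"Ad Banner": "Limited offer"},  # fills no schema column, so it is dropped
    {"plan ": "Pro", "monthly price": "$25"},  # inconsistent keys are normalized
]
rows = build_table(records, schema)
prompt_table = to_markdown(schema, rows)
print(prompt_table)
```

In this sketch the generator would receive `prompt_table` instead of the raw passages, so the irrelevant banner text never reaches the prompt. A learned table generator, as described in the abstract, would replace the rule-based `build_table` step.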
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
noisy data
conversational agents
structured representation
external knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure-aware RAG
structured representation
table-based retrieval
metadata normalization
noise reduction