PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information

📅 2025-02-28

📈 Citations: 0

✨ Influential: 0

career value

180K/year

🤖 AI Summary

This work addresses question answering over semi-structured data—comprising both textual and relational information—where large language models (LLMs) suffer from hallucinations due to knowledge lag, and existing retrieval-augmented generation (RAG) methods struggle to model heterogeneous relational structures effectively. We propose a planning-guided LLM agent framework: it first generates interpretable information retrieval plans, then orchestrates graph/table relation parsers and multi-hop retrieval agents to jointly model and dynamically navigate cross-modal textual and relational information. This introduces the novel “planning-based RAG” paradigm, overcoming limitations of unimodal retrieval. Evaluated on cross-domain semi-structured benchmarks, our approach reduces hallucination rates by 32% and improves answer accuracy by 27%, demonstrating enhanced controllability, interpretability, and generalization across diverse data schemas.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) have shown impressive abilities in answering questions across various domains, but they often encounter hallucination issues on questions that require professional and up-to-date knowledge. To address this limitation, retrieval-augmented generation (RAG) techniques have been proposed, which retrieve relevant information from external sources to inform their responses. However, existing RAG methods typically focus on a single type of external data, such as vectorized text database or knowledge graphs, and cannot well handle real-world questions on semi-structured data containing both text and relational information. To bridge this gap, we introduce PASemiQA, a novel approach that jointly leverages text and relational information in semi-structured data to answer questions. PASemiQA first generates a plan to identify relevant text and relational information to answer the question in semi-structured data, and then uses an LLM agent to traverse the semi-structured data and extract necessary information. Our empirical results demonstrate the effectiveness of PASemiQA across different semi-structured datasets from various domains, showcasing its potential to improve the accuracy and reliability of question answering systems on semi-structured data.

Problem

Research questions and friction points this paper is trying to address.

Addresses hallucination issues in LLMs for professional knowledge questions.

Improves question answering on semi-structured data with text and relations.

Proposes PASemiQA to jointly leverage text and relational information.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines text and relational data for QA

Uses plan generation for data identification

LLM agent traverses semi-structured data

🔎 Similar Papers

No similar papers found.