A Survey on Retrieval And Structuring Augmented Generation with Large Language Models

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address core limitations of large language models (LLMs), including hallucination, knowledge obsolescence, and poor domain adaptability, this survey systematizes the Retrieval And Structuring (RAS) augmented generation paradigm. It reviews multi-granularity knowledge acquisition spanning sparse, dense, and hybrid retrieval; text structuring techniques such as taxonomy construction, hierarchical classification, and information extraction; and mechanisms for integrating structured representations with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding. Its contributions are: (1) a unified methodological framework for RAS generation; (2) an analysis of open technical challenges in retrieval efficiency, structure quality, and knowledge integration; and (3) research directions toward multimodal, cross-lingual, and interactive augmented generation. The survey establishes design principles and future directions for next-generation RAS systems.

📝 Abstract
Large Language Models (LLMs) have revolutionized natural language processing with their remarkable capabilities in text generation and reasoning. However, these models face critical challenges when deployed in real-world applications, including hallucination, outdated knowledge, and limited domain expertise. Retrieval And Structuring (RAS) Augmented Generation addresses these limitations by integrating dynamic information retrieval with structured knowledge representations. This survey (1) examines retrieval mechanisms, including sparse, dense, and hybrid approaches, for accessing external knowledge; (2) explores text structuring techniques such as taxonomy construction, hierarchical classification, and information extraction that transform unstructured text into organized representations; and (3) investigates how these structured representations integrate with LLMs through prompt-based methods, reasoning frameworks, and knowledge embedding techniques. It also identifies technical challenges in retrieval efficiency, structure quality, and knowledge integration, while highlighting research opportunities in multimodal retrieval, cross-lingual structures, and interactive systems. This comprehensive overview provides researchers and practitioners with insights into RAS methods, applications, and future directions.
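As a minimal sketch of the hybrid retrieval idea the abstract mentions (not the paper's own implementation), the snippet below fuses a simple term-frequency sparse score with a dense cosine-similarity score under a mixing weight `alpha`; the toy scoring functions and the precomputed embedding vectors are illustrative assumptions, standing in for BM25 and a neural encoder.

```python
import math
from collections import Counter

def sparse_score(query, doc):
    """Term-overlap score: summed term frequencies of shared terms (a stand-in for BM25)."""
    tf = Counter(doc.lower().split())
    return sum(tf[t] for t in query.lower().split())

def cosine(u, v):
    """Cosine similarity between two dense vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_rank(query, q_vec, docs, doc_vecs, alpha=0.5):
    """Rank documents by alpha * normalized sparse score + (1 - alpha) * dense score."""
    sparse = [sparse_score(query, d) for d in docs]
    max_s = max(sparse) or 1  # avoid division by zero when no terms overlap
    scores = [alpha * (s / max_s) + (1 - alpha) * cosine(q_vec, v)
              for s, v in zip(sparse, doc_vecs)]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Usage: the on-topic document should rank first.
order = hybrid_rank(
    "retrieval augmented generation", [1.0, 0.0],
    ["retrieval augmented generation with llms", "cooking pasta recipes"],
    [[1.0, 0.0], [0.0, 1.0]],
)
```

In practice the sparse side would be BM25 over an inverted index and the dense side a learned encoder; the late score fusion shown here is one common hybrid strategy among those the survey covers.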
Problem

Research questions and friction points this paper is trying to address.

Addressing LLM hallucination and outdated knowledge issues
Integrating dynamic retrieval with structured knowledge representations
Enhancing domain expertise through retrieval and structuring techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifying retrieval and text structuring within a single RAS augmentation pipeline
Using sparse, dense, and hybrid retrieval to access external knowledge
Applying text structuring techniques for organized representations