SymbioticRAG: Enhancing Document Intelligence Through Human-LLM Symbiotic Collaboration

📅 2025-05-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing two key challenges in RAG systems—subjective human relevance judgments and dynamically evolving user query capabilities—this paper proposes a human–AI symbiotic RAG framework. Methodologically, it introduces a dual-layer architecture: Level 1 enables real-time human intervention in retrieval results; Level 2 trains personalized retrieval models from interaction logs, augmented by a human-in-the-loop validation mechanism to ensure data quality. Technically, the framework integrates document-structure-aware parsing (layout analysis, OCR, and specialized extraction for formulas, tables, and figures), multi-strategy scalable retrieval, LLM-driven intent summarization and interaction modeling, and a collaborative interactive interface. Evaluated across literature review, geological exploration, and education scenarios, it achieves significant improvements in retrieval relevance (+28.6% NDCG@5) and user satisfaction (+34.2%). The code and models will be open-sourced to advance research on symbiotic intelligence.

Technology Category

Application Category

📝 Abstract
We present extbf{SymbioticRAG}, a novel framework that fundamentally reimagines Retrieval-Augmented Generation~(RAG) systems by establishing a bidirectional learning relationship between humans and machines. Our approach addresses two critical challenges in current RAG systems: the inherently human-centered nature of relevance determination and users' progression from"unconscious incompetence"in query formulation. SymbioticRAG introduces a two-tier solution where Level 1 enables direct human curation of retrieved content through interactive source document exploration, while Level 2 aims to build personalized retrieval models based on captured user interactions. We implement Level 1 through three key components: (1)~a comprehensive document processing pipeline with specialized models for layout detection, OCR, and extraction of tables, formulas, and figures; (2)~an extensible retriever module supporting multiple retrieval strategies; and (3)~an interactive interface that facilitates both user engagement and interaction data logging. We experiment Level 2 implementation via a retriever strategy incorporated LLM summarized user intention from user interaction logs. To maintain high-quality data preparation, we develop a human-on-the-loop validation interface that improves pipeline output while advancing research in specialized extraction tasks. Evaluation across three scenarios (literature review, geological exploration, and education) demonstrates significant improvements in retrieval relevance and user satisfaction compared to traditional RAG approaches. To facilitate broader research and further advancement of SymbioticRAG Level 2 implementation, we will make our system openly accessible to the research community.
Problem

Research questions and friction points this paper is trying to address.

Enhancing RAG systems via human-LLM bidirectional learning
Addressing human-centered relevance determination in retrieval
Improving query formulation from unconscious incompetence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional learning between humans and LLMs
Two-tier solution with interactive document exploration
Human-on-the-loop validation for quality data
🔎 Similar Papers
No similar papers found.
Q
Qiang Sun
The University of Western Australia
Tingting Bi
Tingting Bi
The University of Melbourne & The University of Western Australia
Software ArchitectureSE4AIEmpirical Software EngineeringSoftware Supply Chain
S
Sirui Li
Murdoch University
Eun-Jung Holden
Eun-Jung Holden
Professor, The University of Melbourne
Geodata ScienceAIIndustrial AI ApplicationsData FusionKnowledge Discovery
P
P. Duuring
Geological Survey of Western Australia
K
Kai Niu
The University of Western Australia
W
Wei Liu
The University of Western Australia