🤖 AI Summary
Addressing two key challenges in RAG systems—subjective human relevance judgments and dynamically evolving user query capabilities—this paper proposes a human–AI symbiotic RAG framework. Methodologically, it introduces a dual-layer architecture: Level 1 enables real-time human intervention in retrieval results; Level 2 trains personalized retrieval models from interaction logs, augmented by a human-in-the-loop validation mechanism to ensure data quality. Technically, the framework integrates document-structure-aware parsing (layout analysis, OCR, and specialized extraction for formulas, tables, and figures), multi-strategy scalable retrieval, LLM-driven intent summarization and interaction modeling, and a collaborative interactive interface. Evaluated across literature review, geological exploration, and education scenarios, it achieves significant improvements in retrieval relevance (+28.6% NDCG@5) and user satisfaction (+34.2%). The code and models will be open-sourced to advance research on symbiotic intelligence.
📝 Abstract
We present extbf{SymbioticRAG}, a novel framework that fundamentally reimagines Retrieval-Augmented Generation~(RAG) systems by establishing a bidirectional learning relationship between humans and machines. Our approach addresses two critical challenges in current RAG systems: the inherently human-centered nature of relevance determination and users' progression from"unconscious incompetence"in query formulation. SymbioticRAG introduces a two-tier solution where Level 1 enables direct human curation of retrieved content through interactive source document exploration, while Level 2 aims to build personalized retrieval models based on captured user interactions. We implement Level 1 through three key components: (1)~a comprehensive document processing pipeline with specialized models for layout detection, OCR, and extraction of tables, formulas, and figures; (2)~an extensible retriever module supporting multiple retrieval strategies; and (3)~an interactive interface that facilitates both user engagement and interaction data logging. We experiment Level 2 implementation via a retriever strategy incorporated LLM summarized user intention from user interaction logs. To maintain high-quality data preparation, we develop a human-on-the-loop validation interface that improves pipeline output while advancing research in specialized extraction tasks. Evaluation across three scenarios (literature review, geological exploration, and education) demonstrates significant improvements in retrieval relevance and user satisfaction compared to traditional RAG approaches. To facilitate broader research and further advancement of SymbioticRAG Level 2 implementation, we will make our system openly accessible to the research community.