AgenticScholar: Agentic Data Management with Pipeline Orchestration for Scholarly Corpora

📅 2026-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing academic data systems struggle to uniformly support diverse query types—such as retrieval, knowledge discovery, and generation—and lack interpretable execution mechanisms. This work proposes an intelligent data management system tailored for academic corpora, which automatically compiles natural language queries into interpretable directed acyclic graph (DAG) execution plans. The system integrates structure-aware knowledge representation, large language model–driven hybrid query planning, and a unified execution framework based on composable operators. By synergistically combining structured knowledge management, agent-based planning, and explainable execution, the approach supports the full spectrum of academic queries and significantly outperforms existing systems in effectiveness, efficiency, and interpretability, thereby establishing a practical foundation for agent-driven academic data management.

📝 Abstract
Managing the rapidly growing scholarly corpus poses significant challenges in representation, reasoning, and efficient analysis. An ideal system should unify structured knowledge management, agentic planning, and interpretable execution to support diverse scholarly queries, from retrieval to knowledge discovery and generation, at scale. Unfortunately, existing RAG and document analytics systems fail to support all of these query types simultaneously. To this end, we propose AgenticScholar, an agentic scholarly data management system that integrates a structure-aware knowledge representation layer, an LLM-centric hybrid query planning layer, and a unified execution layer with composable operators. AgenticScholar autonomously translates natural language queries into executable DAG plans, enabling end-to-end reasoning over multi-modal scholarly data. Extensive experiments demonstrate that AgenticScholar significantly outperforms existing systems in effectiveness, efficiency, and interpretability, offering a practical foundation for future research on agentic scholarly data management.
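
To make the idea of an executable DAG plan built from composable operators concrete, here is a minimal Python sketch. It is not AgenticScholar's implementation: the `Operator` class, the `run_plan` helper, the operator names (`retrieve`, `filter_recent`, `generate`), and the toy `PAPERS` corpus are all hypothetical, and the "generate" step is a plain string join standing in for an LLM call. It only illustrates how a plan expressed as a dependency graph can be run in topological order, with every intermediate result kept for inspection.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class Operator:
    """One plan node: a named function plus the nodes it reads from (hypothetical)."""
    name: str
    fn: Callable[..., Any]
    inputs: List[str] = field(default_factory=list)


def run_plan(operators: List[Operator]) -> Dict[str, Any]:
    """Execute operators in dependency order and return every node's output."""
    by_name = {op.name: op for op in operators}
    # TopologicalSorter maps each node to its predecessors and raises
    # CycleError if the plan is not acyclic.
    order = TopologicalSorter({op.name: op.inputs for op in operators}).static_order()
    results: Dict[str, Any] = {}
    for name in order:
        op = by_name[name]
        results[name] = op.fn(*(results[dep] for dep in op.inputs))
    return results


# Toy corpus (made up for illustration).
PAPERS = [
    {"title": "Hybrid Query Planning", "year": 2024},
    {"title": "Graph Indexing", "year": 2021},
    {"title": "Query Planning with Agents", "year": 2025},
]

# Hypothetical plan: retrieve -> filter_recent -> generate.
plan = [
    Operator("retrieve", lambda: [p for p in PAPERS if "Query Planning" in p["title"]]),
    Operator("filter_recent", lambda ps: [p for p in ps if p["year"] >= 2025], ["retrieve"]),
    Operator("generate", lambda ps: "; ".join(p["title"] for p in ps), ["filter_recent"]),
]

results = run_plan(plan)
print(results["generate"])
```

Because `run_plan` returns the output of every node rather than only the final one, each step of the plan can be examined after the fact, which is one simple way to read the interpretability claim in the abstract.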
Problem

Research questions and friction points this paper is trying to address.

scholarly corpora
data management
query processing
knowledge representation
retrieval-augmented generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic data management
structure-aware knowledge representation
hybrid query planning
executable DAG plans
scholarly corpora
👥 Authors
Hai Lan
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Tingting Wang
Macau University of Science and Technology
Artificial Internet of Things
Zhifeng Bao
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Guoliang Li
Professor, Tsinghua University
Database, Big Data, Crowdsourcing, Data Cleaning & Integration
Daomin Ji
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Ge Lee
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Feng Luo
Professor, School of Computing, Clemson University
Bioinformatics, Deep Learning, Big Data Analytics
Zi Huang
PhD Candidate
Deep Learning
Hailang Qiu
School of Computer Science, Wuhan University, China
Gang Hua
Director of Applied Science, AI, Amazon.com, Inc., IEEE & IAPR Fellow
Computer Vision, Machine Learning, Artificial Intelligence, Robotics, Multimedia