AgenticScholar: Agentic Data Management with Pipeline Orchestration for Scholarly Corpora

📅 2026-03-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing academic data systems struggle to uniformly support diverse query types—such as retrieval, knowledge discovery, and generation—and lack interpretable execution mechanisms. This work proposes an intelligent data management system tailored for academic corpora, which automatically compiles natural language queries into interpretable directed acyclic graph (DAG) execution plans. The system integrates structure-aware knowledge representation, large language model–driven hybrid query planning, and a unified execution framework based on composable operators. By synergistically combining structured knowledge management, agent-based planning, and explainable execution, the approach supports the full spectrum of academic queries and significantly outperforms existing systems in effectiveness, efficiency, and interpretability, thereby establishing a practical foundation for agent-driven academic data management.

📝 Abstract

Managing the rapidly growing scholarly corpus poses significant challenges in representation, reasoning, and efficient analysis. An ideal system should unify structured knowledge management, agentic planning, and interpretable execution to support diverse scholarly queries, from retrieval to knowledge discovery and generation, at scale. Unfortunately, existing RAG and document analytics systems fail to support all of these query types simultaneously. To this end, we propose AgenticScholar, an agentic scholarly data management system that integrates a structure-aware knowledge representation layer, an LLM-centric hybrid query planning layer, and a unified execution layer with composable operators. AgenticScholar autonomously translates natural language queries into executable DAG plans, enabling end-to-end reasoning over multi-modal scholarly data. Extensive experiments demonstrate that AgenticScholar significantly outperforms existing systems in effectiveness, efficiency, and interpretability, offering a practical foundation for future research on agentic scholarly data management.
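
To make the idea of an executable DAG plan built from composable operators more concrete, here is a minimal Python sketch. It is not AgenticScholar's implementation: the `Node` class, the operator names (`retrieve`, `filter_year`, `summarize`), and the toy corpus are all hypothetical, and the plan is written by hand where the described system would have an LLM-driven planner generate it from a natural language query.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# Toy corpus; illustrative data only, not taken from the paper.
CORPUS = [
    {"title": "Paper A", "year": 2021, "topic": "RAG"},
    {"title": "Paper B", "year": 2024, "topic": "RAG"},
    {"title": "Paper C", "year": 2024, "topic": "databases"},
]


@dataclass
class Node:
    """One step of a plan: an operator plus the names of the steps it consumes."""
    name: str
    op: Callable[..., Any]
    deps: List[str] = field(default_factory=list)


# Hypothetical composable operators; each factory returns a plain callable.
def retrieve(topic: str) -> Callable[[], list]:
    return lambda: [p for p in CORPUS if p["topic"] == topic]


def filter_year(min_year: int) -> Callable[[list], list]:
    return lambda papers: [p for p in papers if p["year"] >= min_year]


def summarize() -> Callable[[list], str]:
    return lambda papers: "; ".join(p["title"] for p in papers)


def execute(plan: List[Node]) -> Dict[str, Any]:
    """Run nodes in dependency order and keep every intermediate result."""
    by_name = {n.name: n for n in plan}
    # TopologicalSorter maps each node to its predecessors and raises
    # CycleError if the plan is not acyclic.
    order = TopologicalSorter({n.name: n.deps for n in plan}).static_order()
    results: Dict[str, Any] = {}
    for name in order:
        node = by_name[name]
        results[name] = node.op(*(results[d] for d in node.deps))
    return results


# Hand-written plan for "summarize RAG papers published since 2023".
plan = [
    Node("retrieve", retrieve("RAG")),
    Node("recent", filter_year(2023), deps=["retrieve"]),
    Node("answer", summarize(), deps=["recent"]),
]
results = execute(plan)
print(results["answer"])  # Paper B
```

Because `execute` returns the output of every node rather than only the final answer, each step of the plan can be inspected afterwards, which is one simple way to read the interpretability claim in the abstract.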

Problem

Research questions and friction points this paper is trying to address.

scholarly corpora

data management

query processing

knowledge representation

retrieval-augmented generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic data management

structure-aware knowledge representation

hybrid query planning