🤖 AI Summary
Existing unstructured data analysis systems rely heavily on expert-written code and manual orchestration of complex pipelines, resulting in high operational costs and low efficiency. This paper proposes an agent-based analytical system for heterogeneous data that enables users to pose queries in natural language and automatically performs end-to-end analysis across both structured and unstructured data to generate actionable insights. Our approach introduces three key innovations: (1) a feedback-driven semantic planning mechanism that iteratively and efficiently translates natural language queries into executable analytical plans; (2) a multi-agent collaborative architecture integrating data profiling, semantic cross-validation, intelligent memory, and unified execution of relational and semantic operators; and (3) a semantic optimization model enhancing generalization and robustness. Evaluated on three benchmark datasets, our system consistently outperforms state-of-the-art methods, achieving substantial improvements in accuracy and stability—particularly on complex, multi-step analytical tasks.
📝 Abstract
Existing unstructured data analytics systems rely on experts to write code and manage complex analysis workflows, making them both expensive and time-consuming. To address these challenges, we introduce AgenticData, an agentic data analytics system that lets users pose natural language (NL) questions while it autonomously analyzes data sources across multiple domains, including both unstructured and structured data. First, AgenticData employs a feedback-driven planning technique that automatically converts an NL query into a semantic plan composed of relational and semantic operators. We propose a multi-agent collaboration strategy that uses a data profiling agent to discover relevant data, a semantic cross-validation agent for feedback-based iterative optimization, and a smart memory agent to maintain short-term context and long-term knowledge. Second, we propose a semantic optimization model to refine and execute semantic plans effectively. We evaluated AgenticData on three benchmarks. Experimental results show that it achieves superior accuracy on both easy and difficult tasks, significantly outperforming state-of-the-art methods.
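To make the idea of a semantic plan concrete, the following is a minimal Python sketch of a plan that mixes a relational operator with a semantic operator and is revised in a feedback loop. It is an illustration under stated assumptions, not AgenticData's implementation: the class names, `propose`, `validate`, and `plan_with_feedback` are all hypothetical, and `keyword_judge` is a toy stand-in for the LLM judgment a real semantic operator would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class RelationalFilter:
    """Relational operator: an exact predicate over a structured column."""
    column: str
    predicate: Callable[[object], bool]

    def run(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.predicate(r[self.column])]


@dataclass
class SemanticFilter:
    """Semantic operator: judges free text against an NL condition.
    `judge` is a hypothetical stand-in for an LLM call."""
    column: str
    condition: str
    judge: Callable[[str, str], bool]

    def run(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.judge(str(r[self.column]), self.condition)]


def execute(plan, rows: List[Row]) -> List[Row]:
    # Apply the operators in order, feeding each one's output to the next.
    for op in plan:
        rows = op.run(rows)
    return rows


def plan_with_feedback(query, rows, propose, validate, max_rounds=3):
    """Propose a plan, run it, and revise it until the validator accepts."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        plan = propose(query, feedback)
        result = execute(plan, rows)
        feedback = validate(plan, result)
        if feedback is None:  # validator has no complaints
            return plan, result
    raise RuntimeError("no plan accepted within max_rounds")


def keyword_judge(text: str, condition: str) -> bool:
    # Toy judgment: true if any word of the condition appears in the text.
    return any(word in text.lower() for word in condition.lower().split())


ROWS = [
    {"id": 1, "year": 2022, "review": "Battery died after a week"},
    {"id": 2, "year": 2023, "review": "Great screen, weak battery"},
    {"id": 3, "year": 2023, "review": "Fast shipping"},
]


def propose(query, feedback):
    # Hypothetical planner: starts with the semantic filter only and adds
    # the relational filter once the validator reports it missing.
    plan = [SemanticFilter("review", "battery", keyword_judge)]
    if feedback == "missing year filter":
        plan.insert(0, RelationalFilter("year", lambda y: y == 2023))
    return plan


def validate(plan, result):
    # Hypothetical validator: rejects results that contain non-2023 rows.
    if any(r["year"] != 2023 for r in result):
        return "missing year filter"
    return None


plan, result = plan_with_feedback(
    "2023 reviews that mention battery problems", ROWS, propose, validate
)
print([r["id"] for r in result])
```

In this toy run the first proposed plan is rejected because it lets a 2022 row through, and the revised plan places the cheap relational filter ahead of the semantic one. Here the planner and validator are plain functions; a full system would presumably back both with language models.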