🤖 AI Summary
Non-expert users struggle to leverage open data warehouses for evidence-driven decision-making. Method: This paper proposes a large language model (LLM)-based multi-agent framework for end-to-end data analysis, decomposing the analytical pipeline into specialized agents—intent clarification, data discovery, statistical analysis, and report generation—which enables context-focused execution and cross-stage validation to mitigate LLM limitations in long-horizon reasoning, domain interference, and undetected error propagation. Contribution/Results: Guided by five design principles—including agent specialization decoupled from model capability, a distinction between universal and conditional agents, and failure-mode–targeted mitigation—the framework achieves 84–97.5% agent win rates across five LLMs and 50 real-world queries. It significantly reduces error propagation, and its architectural advantages remain stable as task complexity increases, demonstrating core value in analytical workflow orchestration.
📝 Abstract
Open data repositories hold potential for evidence-based decision-making, yet they remain inaccessible to non-experts who lack expertise in dataset discovery, schema mapping, and statistical analysis. Large language models show promise for individual tasks, but end-to-end analytical workflows expose fundamental limitations: attention dilutes across growing contexts, specialized reasoning patterns interfere, and errors propagate undetected. We present PublicAgent, a multi-agent framework that addresses these limitations through decomposition into specialized agents for intent clarification, dataset discovery, analysis, and reporting. This architecture maintains focused attention within agent contexts and enables validation at each stage. Evaluation across five models and 50 queries yields five design principles for multi-agent LLM systems. First, specialization provides value independent of model strength--even the strongest model shows 97.5% agent win rates, with benefits orthogonal to model scale. Second, agents divide into universal (discovery, analysis) and conditional (report, intent) categories. Universal agents show consistent effectiveness (std dev 12.4%) while conditional agents vary by model (std dev 20.5%). Third, agents mitigate distinct failure modes--removing discovery or analysis causes catastrophic failures (243-280 instances), while removing report or intent causes quality degradation. Fourth, architectural benefits persist across task complexity with stable win rates (86-92% analysis, 84-94% discovery), indicating workflow management value rather than reasoning enhancement. Fifth, wide variance in agent effectiveness across models (42-96% for analysis) requires model-aware architecture design. These principles guide when and why specialization is necessary for complex analytical workflows while enabling broader access to public data through natural language interfaces.
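The staged decomposition the abstract describes can be sketched as a minimal pipeline in which each agent receives only the context relevant to its stage and a validator gates every hand-off. This is an illustrative sketch only, not the paper's implementation: the agent functions, the `StageError` type, and the stub data are hypothetical stand-ins for LLM-backed agents and a real data warehouse.

```python
class StageError(Exception):
    """Raised when a stage's output fails cross-stage validation."""

# Each function below stands in for an LLM-backed agent; in the real
# framework these would issue model calls with stage-specific context.

def clarify_intent(query: str) -> dict:
    # Intent agent: turn a natural-language query into a structured goal.
    return {"query": query, "metric": "mean"}

def discover_dataset(intent: dict) -> dict:
    # Discovery agent: locate a dataset and map the schema to the intent.
    return {"dataset": "air_quality.csv", "column": "pm25"}

def analyze(intent: dict, discovery: dict) -> dict:
    # Analysis agent: run the statistical computation on the located data.
    data = [12.0, 15.5, 9.8]  # stand-in for rows from the warehouse
    return {"value": sum(data) / len(data)}

def report(intent: dict, analysis: dict) -> str:
    # Report agent: render the result for a non-expert reader.
    return f"Mean of requested metric: {analysis['value']:.2f}"

def validate(stage: str, output):
    # Cross-stage validation: reject empty output before it propagates,
    # so an upstream error cannot silently corrupt downstream stages.
    if not output:
        raise StageError(f"{stage} produced no output")
    return output

def run_pipeline(query: str) -> str:
    intent = validate("intent", clarify_intent(query))
    found = validate("discovery", discover_dataset(intent))
    stats = validate("analysis", analyze(intent, found))
    return validate("report", report(intent, stats))
```

Keeping each stage's context small is what the abstract means by "focused attention": no agent ever sees the full transcript, and the validation step between stages is where undetected error propagation is cut off.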