DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries

📅 2025-10-31
🤖 AI Summary
Real-world data analysis remains heavily manual, which makes it labor-intensive and inefficient. Existing automated approaches do not fully demonstrate three capabilities together: open-domain data collection, structured data transformation, and analytic reasoning. This paper proposes DRAMA, an end-to-end paradigm that unifies these three steps into a single pipeline for answering natural-language analytic queries over large-scale open-domain data. It also presents DRAMA-Bot, a multi-agent system that follows the paradigm: a data retriever coordinates sub-agents to collect and transform data, and a data analyzer performs structured reasoning over the result. On the newly constructed DRAMA-Bench (100 claim-verification and 100 question-answering tasks), DRAMA-Bot reaches 86.5% task accuracy at a cost of $0.05, which is up to 6.9 times the accuracy of the five baseline agents at less than 1/6 of their cost.

📝 Abstract
Manually conducting real-world data analyses is labor-intensive and inefficient. Despite numerous attempts to automate data science workflows, none of the existing paradigms or systems fully demonstrate all three key capabilities required to support them effectively: (1) open-domain data collection, (2) structured data transformation, and (3) analytic reasoning. To overcome these limitations, we propose DRAMA, an end-to-end paradigm that answers users' analytic queries in natural language on large-scale open-domain data. DRAMA unifies data collection, transformation, and analysis as a single pipeline. To quantitatively evaluate system performance on tasks representative of DRAMA, we construct a benchmark, DRAMA-Bench, consisting of two categories of tasks: claim verification and question answering, each comprising 100 instances. These tasks are derived from real-world applications that have gained significant public attention and require the retrieval and analysis of open-domain data. We develop DRAMA-Bot, a multi-agent system designed following DRAMA. It comprises a data retriever that collects and transforms data by coordinating the execution of sub-agents, and a data analyzer that performs structured reasoning over the retrieved data. We evaluate DRAMA-Bot on DRAMA-Bench together with five state-of-the-art baseline agents. DRAMA-Bot achieves 86.5% task accuracy at a cost of $0.05, outperforming all baselines with up to 6.9 times the accuracy and less than 1/6 of the cost. DRAMA is publicly available at https://github.com/uiuc-kang-lab/drama.
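The retriever-then-analyzer structure described in the abstract can be sketched as follows. This is a minimal illustration, not DRAMA-Bot's actual implementation: the class names `DataRetriever` and `DataAnalyzer`, the `answer` function, and the toy stub functions are all hypothetical, and the stubs stand in for the LLM-driven sub-agents that a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, Union[str, float]]


@dataclass
class DataRetriever:
    """Hypothetical retriever: coordinates a collection step and a transformation step."""
    collect: Callable[[str], List[str]]          # query -> raw documents
    transform: Callable[[List[str]], List[Row]]  # raw documents -> structured rows

    def run(self, query: str) -> List[Row]:
        return self.transform(self.collect(query))


@dataclass
class DataAnalyzer:
    """Hypothetical analyzer: reasons over the structured rows to answer the query."""
    analyze: Callable[[str, List[Row]], str]

    def run(self, query: str, table: List[Row]) -> str:
        return self.analyze(query, table)


def answer(query: str, retriever: DataRetriever, analyzer: DataAnalyzer) -> str:
    """Single pipeline: collection -> transformation -> analysis."""
    table = retriever.run(query)
    return analyzer.run(query, table)


# Toy stand-ins (assumptions for illustration only; no real retrieval happens).
def toy_collect(query: str) -> List[str]:
    return ["Springfield,120", "Shelbyville,95"]


def toy_transform(docs: List[str]) -> List[Row]:
    rows: List[Row] = []
    for line in docs:
        name, value = line.split(",")
        rows.append({"name": name, "value": float(value)})
    return rows


def toy_analyze(query: str, table: List[Row]) -> str:
    # Returns the name of the row with the largest numeric value.
    best = max(table, key=lambda row: row["value"])
    return str(best["name"])


result = answer(
    "Which entry has the largest value?",
    DataRetriever(toy_collect, toy_transform),
    DataAnalyzer(toy_analyze),
)
```

Keeping each stage behind a plain callable means any one of them can be swapped (for example, a different retrieval backend) without touching the others, which is one way to read the "single pipeline" framing.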
Problem

Research questions and friction points this paper is trying to address.

Automating open-domain data collection and analysis workflows
Unifying data retrieval, transformation, and analytic reasoning
Answering natural-language analytic queries on large-scale open-domain data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies data collection, transformation, and analysis in a single pipeline
Uses a multi-agent system in which a data retriever coordinates sub-agents to collect and transform data
Performs structured reasoning over open-domain data
Chuxuan Hu
University of Illinois at Urbana Champaign
Maxwell Yang
University of Illinois Urbana-Champaign, USA
James Weiland
University of Illinois Urbana-Champaign, USA
Yeji Lim
University of Illinois Urbana-Champaign, USA
Suhas Palawala
University of Illinois Urbana-Champaign, USA
Daniel Kang
UIUC
Computer Science