DataSTORM: Deep Research on Large-Scale Databases using Exploratory Data Analysis and Data Storytelling

📅 2026-04-07
🤖 AI Summary
This work addresses the limitations of existing large language model (LLM) agents in deep research that requires multi-step analysis, hypothesis generation, and quantitative reasoning over structured databases. The authors propose an LLM agent framework that integrates principles of exploratory data analysis and data storytelling into a hypothesis-driven research pipeline, enabling iterative validation across both structured databases and web-based information while generating coherent narratives. Evaluated on InsightBench, the system achieves a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score. On a new dataset built on ACLED, a complex real-world database, it outperforms existing systems, including ChatGPT Deep Research, according to both automated metrics and human evaluations.
📝 Abstract
Deep research with Large Language Model (LLM) agents is emerging as a powerful paradigm for multi-step information discovery, synthesis, and analysis. However, existing approaches primarily focus on unstructured web data, while the challenges of conducting deep research over large-scale structured databases remain relatively underexplored. Unlike web-based research, effective data-centric research requires more than retrieval and summarization: it demands iterative hypothesis generation, quantitative reasoning over structured schemas, and convergence toward a coherent analytical narrative. In this paper, we present DataSTORM, an LLM-based agentic system capable of autonomously conducting research across both large-scale structured databases and internet sources. Grounded in principles from Exploratory Data Analysis and Data Storytelling, DataSTORM reframes deep research over structured data as a thesis-driven analytical process: discovering candidate theses from data, validating them through iterative cross-source investigation, and developing them into coherent analytical narratives. We evaluate DataSTORM on InsightBench, where it achieves a new state-of-the-art result with a 19.4% relative improvement in insight-level recall and 7.2% in summary-level score. We further introduce a new dataset built on ACLED, a real-world complex database, and demonstrate that DataSTORM outperforms proprietary systems such as ChatGPT Deep Research across both automated metrics and human evaluations.
Problem

Research questions and friction points this paper is trying to address.

deep research
structured databases
large language models
exploratory data analysis
data storytelling
Innovation

Methods, ideas, or system contributions that make the work stand out.

DataSTORM
LLM agents
Exploratory Data Analysis
Data Storytelling
structured databases
Shicheng Liu
CS PhD Candidate, Stanford University
Natural Language Processing, Programming Languages & Systems
Yucheng Jiang
Computer Science Department, Stanford University
Sajid Farook
Computer Science Department, Stanford University
Camila Nicollier Sanchez
Computer Science Department, Stanford University
David Fernando Castro Pena
Computer Science Department, Stanford University
Monica S. Lam
Professor of Computer Science, Stanford University
Compilers, natural language processing, machine learning, architecture, computer-human interaction