Beyond Text-to-SQL: Autonomous Research-Driven Database Exploration with DAR

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing database query systems require explicit user questions and lack proactive exploratory capabilities. This paper proposes DAR, a multi-layer AI agent system that explores a database end to end without human-initiated queries, spanning intent inference, joint SQL-and-LLM query generation, iterative validation, and insight report synthesis. DAR follows a “data-agnostic, full-pipeline autonomous” exploration paradigm: all reasoning runs inside BigQuery through its native generative AI functions, so no data leaves the warehouse and data governance is preserved. It combines metadata-driven initialization, hybrid SQL/LLM query synthesis, and built-in quality control. Evaluated on a realistic asset-incident dataset, DAR completes the entire analytical workflow in 16 minutes, versus 8.5 hours for a professional analyst (roughly 32× faster), and produces interpretable pattern insights with evidence-backed recommendations.

📝 Abstract
Large language models can already query databases, yet most existing systems remain reactive: they rely on explicit user prompts and do not actively explore data. We introduce DAR (Data Agnostic Researcher), a multi-agent system that performs end-to-end database research without human-initiated queries. DAR orchestrates specialized AI agents across three layers: initialization (intent inference and metadata extraction), execution (SQL and AI-based query synthesis with iterative validation), and synthesis (report generation with built-in quality control). All reasoning is executed directly inside BigQuery using native generative AI functions, eliminating data movement and preserving data governance. On a realistic asset-incident dataset, DAR completes the full analytical task in 16 minutes, compared to 8.5 hours for a professional analyst (approximately 32 times faster), while producing useful pattern-based insights and evidence-grounded recommendations. Although human experts continue to offer deeper contextual interpretation, DAR excels at rapid exploratory analysis. Overall, this work shifts database interaction from query-driven assistance toward autonomous, research-driven exploration within cloud data warehouses.
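
The speedup figure follows directly from the two reported wall-clock times. A quick check, using only the 8.5-hour and 16-minute values stated in the abstract (the variable names are illustrative):

```python
# Speedup implied by the reported wall-clock times (values from the abstract).
analyst_minutes = 8.5 * 60   # 8.5 hours for the professional analyst
dar_minutes = 16             # 16 minutes for DAR

speedup = analyst_minutes / dar_minutes
print(f"{speedup:.3f}x")  # prints 31.875x, which rounds to the quoted 32x
```
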
Problem

Research questions and friction points this paper is trying to address.

Shifts database interaction from query-driven to autonomous research-driven exploration.
Performs end-to-end database research without human-initiated queries.
Enables rapid exploratory analysis within cloud data warehouses.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for autonomous database exploration
Native generative AI functions within BigQuery for governance
Three-layer orchestration: initialization, execution, synthesis
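
The three-layer flow can be pictured with a minimal Python sketch. This is not the authors' implementation: every name here (`ResearchState`, `initialization_layer`, `execution_layer`, `synthesis_layer`, `run_pipeline`) is hypothetical, the "intent inference" and "query synthesis" steps are trivial placeholders for what DAR delegates to AI agents, and `run_query` is an assumed callable standing in for whatever executes a query inside the warehouse.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ResearchState:
    """Shared state handed from layer to layer (hypothetical structure)."""
    table_metadata: dict[str, list[str]]  # table name -> column names
    intent: str = ""
    queries: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    report: str = ""


def initialization_layer(state: ResearchState) -> ResearchState:
    # Placeholder intent inference: uses metadata only, reads no rows.
    tables = ", ".join(sorted(state.table_metadata))
    state.intent = f"Explore patterns in: {tables}"
    return state


def execution_layer(
    state: ResearchState,
    run_query: Callable[[str], str],
    max_retries: int = 2,
) -> ResearchState:
    # Placeholder query synthesis: one simple query per table.
    for table, columns in state.table_metadata.items():
        query = f"SELECT {', '.join(columns)} FROM {table} LIMIT 10"
        for _attempt in range(max_retries + 1):
            try:
                result = run_query(query)
            except ValueError:
                # A real system would repair the query here; this stub
                # simply retries the same query.
                continue
            # Only queries that executed successfully are recorded.
            state.queries.append(query)
            state.findings.append(f"{table}: {result}")
            break
    return state


def synthesis_layer(state: ResearchState) -> ResearchState:
    # Minimal quality control: no report without validated findings.
    if not state.findings:
        state.report = "No validated findings."
    else:
        lines = [f"Intent: {state.intent}"]
        lines += [f"- {finding}" for finding in state.findings]
        state.report = "\n".join(lines)
    return state


def run_pipeline(
    metadata: dict[str, list[str]],
    run_query: Callable[[str], str],
) -> ResearchState:
    state = ResearchState(table_metadata=metadata)
    state = initialization_layer(state)
    state = execution_layer(state, run_query)
    return synthesis_layer(state)


if __name__ == "__main__":
    # Fake executor for demonstration; no database is contacted.
    demo = run_pipeline(
        {"asset_events": ["asset_id", "event_type"]},
        run_query=lambda q: "3 rows",
    )
    print(demo.report)
```

Passing a single state object through the layers keeps each stage independently testable, and the executor is injected so the same flow can run against a stub or a real warehouse client.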
Ostap Vykhopen
Viktoria Skorik
Maxim Tereschenko
Veronika Solopova
Technische Universität Berlin
Computational linguistics, Ethics of AI