🤖 AI Summary
Existing database query systems require explicit user questions and lack proactive exploratory capabilities. This paper proposes DAR (Data Agnostic Researcher), a multi-agent system organized in three layers that explores databases end to end without human-initiated queries, spanning intent inference, joint SQL-and-LLM query generation, iterative validation, and insight report synthesis. DAR introduces a “data-agnostic, full-pipeline autonomous” exploration paradigm: all reasoning runs inside BigQuery through its native generative AI functions, with no data export, which preserves data governance; the system relies on metadata-driven initialization, hybrid SQL/LLM query synthesis, and built-in quality control. Evaluated on a realistic asset-incident dataset, DAR completes the entire analytical workflow in 16 minutes, roughly 32× faster than a professional analyst (8.5 hours), and produces interpretable pattern insights with evidence-backed recommendations.
📝 Abstract
Large language models can already query databases, yet most existing systems remain reactive: they rely on explicit user prompts and do not actively explore data. We introduce DAR (Data Agnostic Researcher), a multi-agent system that performs end-to-end database research without human-initiated queries. DAR orchestrates specialized AI agents across three layers: initialization (intent inference and metadata extraction), execution (SQL and AI-based query synthesis with iterative validation), and synthesis (report generation with built-in quality control). All reasoning is executed directly inside BigQuery using native generative AI functions, eliminating data movement and preserving data governance. On a realistic asset-incident dataset, DAR completes the full analytical task in 16 minutes, compared to 8.5 hours for a professional analyst (approximately 32 times faster), while producing useful pattern-based insights and evidence-grounded recommendations. Although human experts continue to offer deeper contextual interpretation, DAR excels at rapid exploratory analysis. Overall, this work shifts database interaction from query-driven assistance toward autonomous, research-driven exploration within cloud data warehouses.
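The three-layer flow described in the abstract can be illustrated with a minimal Python sketch. This is not DAR's implementation: every name here (`ResearchContext`, `initialize`, `execute`, `synthesize`, `run_query`) is hypothetical, the intent rules and the query template are placeholders for what the paper's agents would generate with an LLM, and `run_query` stands in for whatever client actually executes SQL in the warehouse.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ResearchContext:
    """State handed from layer to layer (hypothetical structure)."""
    table_metadata: Dict[str, str]  # column name -> column type
    intents: List[str] = field(default_factory=list)
    findings: List[dict] = field(default_factory=list)


def initialize(ctx: ResearchContext) -> ResearchContext:
    """Initialization layer: derive analysis intents from metadata only."""
    for column, col_type in ctx.table_metadata.items():
        if col_type == "TIMESTAMP":
            ctx.intents.append(f"trend over {column}")
        elif col_type == "STRING":
            ctx.intents.append(f"distribution of {column}")
    return ctx


def execute(
    ctx: ResearchContext,
    run_query: Callable[[str], List[dict]],
    table: str,
    max_attempts: int = 3,
) -> ResearchContext:
    """Execution layer: build a query per intent, validate it, retry on failure."""
    for intent in ctx.intents:
        for attempt in range(1, max_attempts + 1):
            # Placeholder query; a real agent would synthesize intent-specific SQL.
            sql = f"SELECT COUNT(*) AS n FROM `{table}` -- {intent}"
            rows = run_query(sql)
            if rows:  # Validation rule assumed here: accept any non-empty result.
                ctx.findings.append(
                    {"intent": intent, "sql": sql, "rows": rows, "attempts": attempt}
                )
                break
    return ctx


def synthesize(ctx: ResearchContext) -> str:
    """Synthesis layer: report only findings that carry query evidence."""
    lines = ["Findings:"]
    for finding in ctx.findings:
        lines.append(f"- {finding['intent']}: {finding['rows']}")
    return "\n".join(lines)
```

In this sketch the query runner is injected, so the orchestration stays independent of the execution backend; a caller would pass a function that submits the SQL to the warehouse and returns rows, which keeps the data where it already lives.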