RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a self-evolving Text-to-SQL framework that requires no human intervention: a SQL generation agent and an evolution agent form a closed loop in which performance feedback drives continuous optimization. The core innovations are an ELO-based selection mechanism that handles non-transitive performance comparisons and enables iterative cross-pollination of strategies, and the integration of schema-aware generation with adaptive database analysis. Together these allow the system to autonomously evolve high-performing reasoning strategies from only 70 lines of initial code. The method achieves 73.67% execution accuracy on the BIRD test set; notably, the weaker Claude Haiku model improves by 8.9 percentage points after evolution, surpassing stronger non-evolved models and enabling "leapfrog" deployment at lower cost.

πŸ“ Abstract
We present RoboPhD, a system where AI agents autonomously conduct research to improve Text-to-SQL performance. RoboPhD implements a closed-loop evolution cycle with two coordinated components: a SQL Generation agent composed of a database analysis script and SQL generation instructions, and an Evolution agent that designs new versions based on performance feedback. Central to the framework is an ELO-based selection mechanism enabling survival-of-the-fittest dynamics while handling non-transitivity in performance. Starting from a naive 70-line baseline, RoboPhD evolves agents through iterative cross-pollination, discovering effective techniques without any external guidance on the Text-to-SQL domain. Our best agent, evolved to 1500 lines over 18 iterations, autonomously discovered strategies such as size-adaptive database analysis that adjusts depth based on schema complexity and SQL generation patterns for column selection, evidence interpretation, and aggregation. Evolution provides the largest gains on cheaper models: while we improve by 2.3 points over a strong Claude Opus 4.5 naive baseline, we show an improvement of 8.9 points over the weaker Claude Haiku model. This enables 'skip a tier' deployment: evolved Haiku exceeds naive Sonnet accuracy, and evolved Sonnet exceeds naive Opus, both at lower cost. The full system achieves 73.67% accuracy on the BIRD test set, demonstrating that AI can autonomously build a strong agentic system with only a trivial human-provided starting point.
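
The abstract names an ELO-based selection mechanism but does not give its update rule, K-factor, or pairing scheme. The following is a minimal sketch of how such a ranking could work, assuming the standard Elo formula, a K-factor of 32, and all-pairs comparison within each evaluation round. The agent names and accuracy figures are made up for illustration and are not results from the paper.

```python
from itertools import combinations

K = 32  # assumed K-factor; the paper's actual value is not stated here


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of agent A against agent B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(ratings: dict, a: str, b: str, score_a: float, k: float = K) -> None:
    """Update both ratings in place; score_a is 1.0 (win), 0.5 (tie), or 0.0 (loss)."""
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: the total rating mass is conserved


def run_round(ratings: dict, accuracies: dict) -> None:
    """Treat one evaluation round as a set of pairwise matches between agents."""
    for a, b in combinations(sorted(accuracies), 2):
        if accuracies[a] > accuracies[b]:
            score_a = 1.0
        elif accuracies[a] < accuracies[b]:
            score_a = 0.0
        else:
            score_a = 0.5
        update(ratings, a, b, score_a)


# Hypothetical agents and per-round accuracies (illustrative numbers only).
# Each round uses a different sample, so the ordering flips between rounds:
# A beats B in round 1, B beats A in round 2, C beats both in round 3.
ratings = {"agent_a": 1500.0, "agent_b": 1500.0, "agent_c": 1500.0}
rounds = [
    {"agent_a": 0.70, "agent_b": 0.66, "agent_c": 0.68},
    {"agent_a": 0.67, "agent_b": 0.71, "agent_c": 0.65},
    {"agent_a": 0.64, "agent_b": 0.66, "agent_c": 0.69},
]
for accuracies in rounds:
    run_round(ratings, accuracies)

# Rank agents by accumulated rating rather than by any single comparison.
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because ratings accumulate over many pairwise outcomes, a cycle such as A beating B, B beating C, and C beating A does not have to be resolved into a strict ordering by hand; each agent's rating simply reflects its overall record, which is one plausible reading of "handling non-transitivity" in the abstract.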

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL
autonomous agent evolution
self-improvement
AI research automation
performance optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous agent evolution
Text-to-SQL
ELO-based selection
self-improving AI
agentic system