RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution

📅 2026-01-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes RoboPhD, a self-evolving Text-to-SQL framework that runs without human intervention. It forms a closed loop between a SQL generation agent and an evolution agent, using performance feedback to drive continuous optimization. The core contributions are an ELO-based selection mechanism that handles non-transitive performance comparisons and supports iterative cross-pollination of strategies, and schema-aware SQL generation paired with size-adaptive database analysis. Starting from a 70-line baseline, the system autonomously evolves high-performing reasoning strategies. The full system reaches 73.67% execution accuracy on the BIRD test set. Gains are largest on cheaper models: Claude Haiku improves by 8.9 percentage points after evolution and exceeds the naive (non-evolved) Sonnet baseline, which enables "skip a tier" deployment at lower cost.
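To make the ELO-based selection idea concrete, here is a minimal sketch of a standard Elo update applied to pairwise agent comparisons. The summary does not give the paper's exact formula or constants, so the K-factor of 32, the 400-point scale, the starting rating of 1500, and the agent names and accuracies below are all assumptions for illustration. The demo uses a deliberately cyclic set of results (A beats B, B beats C, C beats A) to show that ratings stay well defined even when head-to-head outcomes are non-transitive.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    k = 32 is an assumed K-factor, not a value taken from the paper.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def score_from_accuracy(acc_a: float, acc_b: float) -> float:
    """Turn two accuracies measured on the same question sample into a match score."""
    if acc_a > acc_b:
        return 1.0
    if acc_a < acc_b:
        return 0.0
    return 0.5


# Hypothetical agents, all starting from the same assumed rating.
ratings = {"agent_a": 1500.0, "agent_b": 1500.0, "agent_c": 1500.0}

# Hypothetical, deliberately non-transitive results: a > b, b > c, c > a.
matches = [
    ("agent_a", "agent_b", 0.62, 0.58),
    ("agent_b", "agent_c", 0.61, 0.57),
    ("agent_c", "agent_a", 0.60, 0.59),
]

for left, right, acc_left, acc_right in matches:
    score = score_from_accuracy(acc_left, acc_right)
    ratings[left], ratings[right] = update_elo(ratings[left], ratings[right], score)

# Rank agents by rating; the top of this list would be kept for the next round.
ranking = sorted(ratings, key=ratings.get, reverse=True)
print(ranking, ratings)
```

Because each update moves the two ratings by equal and opposite amounts, the total rating mass is conserved, and a cycle of wins leaves the agents close together rather than forcing an inconsistent strict ordering.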

📝 Abstract

We present RoboPhD, a system where AI agents autonomously conduct research to improve Text-to-SQL performance. RoboPhD implements a closed-loop evolution cycle with two coordinated components: a SQL Generation agent composed of a database analysis script and SQL generation instructions, and an Evolution agent that designs new versions based on performance feedback. Central to the framework is an ELO-based selection mechanism enabling survival-of-the-fittest dynamics while handling non-transitivity in performance. Starting from a naive 70-line baseline, RoboPhD evolves agents through iterative cross-pollination, discovering effective techniques without any external guidance on the Text-to-SQL domain. Our best agent, evolved to 1500 lines over 18 iterations, autonomously discovered strategies such as size-adaptive database analysis that adjusts depth based on schema complexity and SQL generation patterns for column selection, evidence interpretation, and aggregation. Evolution provides the largest gains on cheaper models: while we improve by 2.3 points over a strong Claude Opus 4.5 naive baseline, we show an improvement of 8.9 points over the weaker Claude Haiku model. This enables 'skip a tier' deployment: evolved Haiku exceeds naive Sonnet accuracy, and evolved Sonnet exceeds naive Opus, both at lower cost. The full system achieves 73.67% accuracy on the BIRD test set, demonstrating that AI can autonomously build a strong agentic system with only a trivial human-provided starting point.
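The size-adaptive database analysis mentioned in the abstract can be pictured with a small sketch. This is not the evolved script from the paper; it is one plausible reading, assuming a SQLite database, in which the level of detail in the schema report shrinks as the schema grows. The function names, the depth labels, and the column-count thresholds (30 and 150) are hypothetical choices made for this example.

```python
import sqlite3


def list_tables(conn):
    """Names of user tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def table_columns(conn, table):
    """Column names of one table, read via PRAGMA table_info."""
    return [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]


def choose_depth(n_columns):
    """Pick an analysis depth from schema size (thresholds are assumptions)."""
    if n_columns <= 30:
        return "deep"      # small schema: row counts plus sample values
    if n_columns <= 150:
        return "medium"    # mid-size schema: row counts only
    return "shallow"       # large schema: table and column names only


def analyze(conn):
    """Build a text report whose detail depends on the chosen depth."""
    tables = list_tables(conn)
    columns = {t: table_columns(conn, t) for t in tables}
    depth = choose_depth(sum(len(cols) for cols in columns.values()))
    lines = [f"analysis depth: {depth}"]
    for t in tables:
        header = f"table {t}"
        if depth in ("deep", "medium"):
            n_rows = conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            header += f" ({n_rows} rows)"
        lines.append(header)
        for col in columns[t]:
            entry = f"  {col}"
            if depth == "deep":
                query = f'SELECT DISTINCT "{col}" FROM "{t}" LIMIT 3'
                samples = [r[0] for r in conn.execute(query)]
                entry += f" e.g. {samples}"
            lines.append(entry)
    return depth, "\n".join(lines)


# Tiny in-memory database, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 19.5), (11, 2, 7.25);
""")
depth, report = analyze(conn)
print(report)
```

The design intent is that a small schema can afford sample values for every column, while a large one would flood the SQL generation prompt, so the report falls back to names only.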

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

autonomous agent evolution

self-improvement

AI research automation

performance optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous agent evolution

Text-to-SQL

ELO-based selection