FlexSQL: Flexible Exploration and Execution Make Better Text-to-SQL Agents

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses Text-to-SQL over large analytical databases, where complex schemas, ambiguous queries, and data-dependent decisions hinder performance, and where conventional fixed-pipeline systems struggle to recover from early errors. The authors propose FlexSQL, an agent-based framework built around flexible database interaction: at any point during reasoning, the agent can explore schema structure, inspect data values, and run verification queries. FlexSQL generates multiple execution plans to cover different interpretations of a query, implements each plan in either SQL or Python, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On the Spider2-Snow benchmark with gpt-oss-120b, it achieves a 65.4% score, outperforming strong open-source baselines that use larger models, and it yields over 10% relative improvement when integrated into a general-purpose coding agent.

📝 Abstract

Text-to-SQL over large analytical databases requires navigating complex schemas, resolving ambiguous queries, and grounding decisions in actual data. Most current systems follow a fixed pipeline where schema elements are retrieved once upfront and the database is only revisited for post-hoc repair, limiting recovery from early mistakes. We present FlexSQL, a text-to-SQL agent whose core design principle is flexible database interaction: the agent can explore schema structure, inspect data values, and run verification queries at any point during reasoning. FlexSQL generates diverse execution plans to cover multiple query interpretations, implements each plan in either SQL or Python depending on the task, and uses a two-tiered repair mechanism that can backtrack from code-level errors to plan-level revisions. On Spider2-Snow, using gpt-oss-120b, FlexSQL achieves a 65.4% score, outperforming strong open-source baselines that use stronger, larger models such as gpt-o3 and DeepSeek-R1. When integrated into a general-purpose coding agent (as skills in Claude Code), our approach yields over 10% relative improvement on Spider2-Snow. Further analysis shows that flexible exploration and flexible execution jointly contribute to the effectiveness of our approach, highlighting flexibility as a key design principle. Our code is available at: https://github.com/StringNLPLAB/FlexSQL
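
The two-tiered repair idea described in the abstract can be illustrated with a small control-flow sketch. This is not the authors' implementation: the names `solve`, `Attempt`, `propose_plans`, `write_code`, `execute`, and `revise_plan`, as well as the retry budgets, are hypothetical and chosen only for illustration. The callables stand in for model calls and a database or Python runtime, and the `code` string may hold either SQL or Python.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """Outcome of implementing one plan (all names here are hypothetical)."""
    plan: str
    code: str
    result: Optional[object] = None
    error: Optional[str] = None


def _try_plan(plan, write_code, execute, max_code_repairs):
    """Tier 1: regenerate code for a fixed plan, feeding back the last error."""
    error, code = None, ""
    for _ in range(max_code_repairs + 1):
        code = write_code(plan, error)
        try:
            return Attempt(plan, code, result=execute(code))
        except Exception as exc:  # any execution failure triggers a repair
            error = str(exc)
    return Attempt(plan, code, error=error)


def solve(
    question: str,
    propose_plans: Callable[[str], List[str]],
    write_code: Callable[[str, Optional[str]], str],
    execute: Callable[[str], object],
    revise_plan: Callable[[str, str], str],
    max_code_repairs: int = 2,
    max_plan_revisions: int = 1,
) -> List[Attempt]:
    """Return one successful Attempt per plan that could be made to run."""
    successes = []
    for plan in propose_plans(question):  # several interpretations
        revisions_left = max_plan_revisions
        while True:
            attempt = _try_plan(plan, write_code, execute, max_code_repairs)
            if attempt.error is None:
                successes.append(attempt)
                break
            if revisions_left == 0:
                break  # give up on this interpretation
            # Tier 2: code-level repair is exhausted, so revise the plan itself.
            revisions_left -= 1
            plan = revise_plan(plan, attempt.error)
    return successes
```

In this sketch a plan is only revised after its code-level repair budget runs out, which mirrors the "backtrack from code-level errors to plan-level revisions" description; how FlexSQL actually decides when to escalate, and how it selects among successful plans, is not specified in the abstract.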

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

complex schema

ambiguous queries

database interaction

error recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

flexible exploration

flexible execution

text-to-SQL agent