SQL-ASTRA: Alleviating Sparse Feedback in Agentic SQL via Column-Set Matching and Trajectory Aggregation

📅 2026-03-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the credit assignment problem in multi-turn Text-to-SQL, where feedback is sparse because rewards depend solely on the final execution result. The authors propose the Agentic SQL framework, which uses a two-tiered reward mechanism. At the trajectory level, an Aggregated Trajectory Reward (ATR) aggregates per-turn scores through an asymmetric transition matrix to incentivize continuous improvement; using Lyapunov stability theory, the authors prove that ATR guarantees a cycle-free policy and monotonic convergence. At the step level, a Column-Set Matching Reward (CSMR) executes the query at each turn and converts binary (0/1) feedback into a dense signal in [0, 1] based on partial correctness, which mitigates reward sparsity. On BIRD, the method achieves a 5% gain over binary-reward GRPO, and it outperforms the state-of-the-art Arctic-Text2SQL-R1-7B on both BIRD and Spider 2.0 using identical models.

📝 Abstract

Agentic Reinforcement Learning (RL) shows promise for complex tasks, but Text-to-SQL remains mostly restricted to single-turn paradigms. A primary bottleneck is the credit assignment problem. In traditional paradigms, rewards are determined solely by the final-turn feedback, which ignores the intermediate process and leads to ambiguous credit evaluation. To address this, we propose Agentic SQL, a framework featuring a universal two-tiered reward mechanism designed to provide effective trajectory-level evaluation and dense step-level signals. First, we introduce Aggregated Trajectory Reward (ATR) to resolve multi-turn credit assignment. Using an asymmetric transition matrix, ATR aggregates process-oriented scores to incentivize continuous improvement. Leveraging Lyapunov stability theory, we prove ATR acts as an energy dissipation operator, guaranteeing a cycle-free policy and monotonic convergence. Second, Column-Set Matching Reward (CSMR) provides immediate step-level rewards to mitigate sparsity. By executing queries at each turn, CSMR converts binary (0/1) feedback into dense [0, 1] signals based on partial correctness. Evaluations on BIRD show a 5% gain over binary-reward GRPO. Notably, our approach outperforms SOTA Arctic-Text2SQL-R1-7B on BIRD and Spider 2.0 using identical models, propelling Text-to-SQL toward a robust multi-turn agent paradigm.
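
To make the step-level idea concrete, here is a minimal sketch of how a column-set matching score could turn execution feedback into a value in [0, 1]. The abstract does not give the CSMR formula, so everything below is an assumption for illustration: the function names `result_columns` and `column_set_score`, the use of SQLite, the choice to compare columns by their values rather than by name, and the normalization by the larger column count.

```python
import sqlite3
from collections import Counter


def result_columns(conn, sql):
    """Run `sql` and return its result as a multiset of columns.

    Each column becomes a sorted tuple of value reprs, so row order
    and column aliases do not affect the comparison (an assumption).
    """
    rows = conn.execute(sql).fetchall()
    columns = zip(*rows)  # transpose rows into columns
    return Counter(tuple(sorted(map(repr, col))) for col in columns)


def column_set_score(conn, predicted_sql, gold_sql):
    """Hypothetical step-level score in [0, 1]; not the paper's formula."""
    gold = result_columns(conn, gold_sql)
    try:
        pred = result_columns(conn, predicted_sql)
    except sqlite3.Error:
        return 0.0  # a query that fails to execute earns nothing
    matched = sum((gold & pred).values())  # multiset intersection
    total = max(sum(gold.values()), sum(pred.values()))
    # Empty result sets carry no column information, so they score 0.0.
    return matched / total if total else 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [("Ann", 30, "US"), ("Bo", 25, "FR")],
    )
    gold = "SELECT name, age FROM singer"
    print(column_set_score(conn, "SELECT name, age FROM singer ORDER BY age", gold))  # 1.0
    print(column_set_score(conn, "SELECT name FROM singer", gold))  # 0.5
    print(column_set_score(conn, "SELECT nam FROM singer", gold))  # 0.0
```

Under a binary reward, the second query would score 0 despite retrieving one of the two required columns; a partial score of this kind is what lets an intermediate turn receive credit for moving toward the correct answer.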

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

sparse feedback

credit assignment

multi-turn interaction

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic SQL

Credit Assignment

Trajectory Aggregation