QueryGym: Step-by-Step Interaction with Relational Databases

📅 2025-09-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing query planning frameworks often bind agents to specific SQL dialects or rely on implicit reasoning, which hinders engine-agnostic evaluation and transparent planning. To address this, the authors propose an interactive query planning environment grounded in relational algebra, in which LLM-based agents explore databases through explicit, traceable sequences of relational operations such as selection, projection, and join. The environment exposes schema information, intermediate results, and execution feedback through a Gymnasium interface, giving fine-grained control over each step. Because agents must execute relational algebra operators explicitly, evaluation is engine-agnostic and the reasoning is transparent, which supports precise error attribution and repair as well as reinforcement learning research. The accompanying demo contrasts the environment with contemporary LLMs that query databases, illustrating its interpretability, controllability, and extensibility for research.

📝 Abstract
We introduce QueryGym, an interactive environment for building, testing, and evaluating LLM-based query planning agents. Existing frameworks often tie agents to specific query language dialects or obscure their reasoning; QueryGym instead requires agents to construct explicit sequences of relational algebra operations, ensuring engine-agnostic evaluation and transparent step-by-step planning. The environment is implemented as a Gymnasium interface that supplies observations -- including schema details, intermediate results, and execution feedback -- and receives actions that represent database exploration (e.g., previewing tables, sampling column values, retrieving unique values) as well as relational algebra operations (e.g., filter, project, join). We detail the motivation and the design of the environment. In the demo, we showcase the utility of the environment by contrasting it with contemporary LLMs that query databases. QueryGym serves as a practical testbed for research in error remediation, transparency, and reinforcement learning for query generation. For the associated demo, see https://ibm.biz/QueryGym.
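The interaction loop described in the abstract can be sketched as follows. This is a minimal illustration, not QueryGym's actual API: the class `ToyQueryEnv`, the dictionary-shaped actions, the `out` key for naming intermediate results, and the 0/1 reward are all assumptions made for the example. Only the `reset`/`step` return shapes follow the Gymnasium convention, and the sketch deliberately avoids importing Gymnasium so that it runs with the standard library alone.

```python
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]
Table = list[Row]


def _same_rows(a: Table, b: Table) -> bool:
    """Compare two tables as multisets of rows, ignoring row order."""
    def canon(t: Table):
        return sorted(tuple(sorted(r.items())) for r in t)
    return canon(a) == canon(b)


@dataclass
class ToyQueryEnv:
    """Hypothetical environment that mimics Gymnasium's reset/step contract."""

    tables: dict[str, Table]
    target: Table
    max_steps: int = 10
    results: dict[str, Table] = field(default_factory=dict)
    steps: int = 0

    def _schema(self) -> dict[str, list[str]]:
        return {n: sorted(rows[0]) if rows else [] for n, rows in self.results.items()}

    def reset(self, seed=None, options=None):
        self.results = {n: list(rows) for n, rows in self.tables.items()}
        self.steps = 0
        return {"schema": self._schema(), "result": None, "feedback": "ready"}, {}

    def _apply(self, action: dict) -> Table:
        op, rows = action["op"], self.results[action["table"]]
        if op == "preview":  # exploration: first few rows
            return rows[: action.get("n", 3)]
        if op == "unique":  # exploration: distinct values of one column
            col = action["column"]
            return [{col: v} for v in sorted({r[col] for r in rows})]
        if op == "filter":  # relational selection
            return [r for r in rows if r[action["column"]] == action["value"]]
        if op == "project":  # relational projection
            return [{c: r[c] for c in action["columns"]} for r in rows]
        if op == "join":  # equi-join on a shared column
            key, right = action["on"], self.results[action["right"]]
            return [{**l, **r} for l in rows for r in right if l[key] == r[key]]
        raise ValueError(f"unknown op {op}")

    def step(self, action: dict):
        self.steps += 1
        result, feedback, terminated, reward = None, "ok", False, 0.0
        try:
            result = self._apply(action)
            if "out" in action:  # only named results become intermediate tables
                self.results[action["out"]] = result
                terminated = _same_rows(result, self.target)
                reward = 1.0 if terminated else 0.0
        except (KeyError, ValueError) as exc:
            feedback = f"error: {exc!r}"  # execution feedback for the agent
        truncated = not terminated and self.steps >= self.max_steps
        obs = {"schema": self._schema(), "result": result, "feedback": feedback}
        return obs, reward, terminated, truncated, {}


env = ToyQueryEnv(
    tables={
        "employees": [{"name": "Ada", "dept_id": 1}, {"name": "Bob", "dept_id": 2}],
        "departments": [{"dept_id": 1, "dept": "Research"}, {"dept_id": 2, "dept": "Sales"}],
    },
    target=[{"name": "Ada"}],
)
obs, info = env.reset()
plan = [
    {"op": "unique", "table": "departments", "column": "dept"},
    {"op": "join", "table": "employees", "right": "departments", "on": "dept_id", "out": "t1"},
    {"op": "filter", "table": "t1", "column": "dept", "value": "Research", "out": "t2"},
    {"op": "project", "table": "t2", "columns": ["name"], "out": "t3"},
]
trace = [env.step(action) for action in plan]
```

Each entry of `trace` records one observable step, so a faulty plan can be attributed to the specific operation whose intermediate result or feedback first went wrong.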
Problem

Research questions and friction points this paper is trying to address.

Developing an interactive environment for LLM-based query planning
Enabling transparent, step-by-step relational algebra operations
Providing a testbed for error remediation and reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive environment for LLM-based query planning agents
Explicit relational algebra sequences for transparent planning
Gymnasium interface with observations and database exploration actions
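To illustrate why an explicit relational algebra sequence is not tied to any one engine, the sketch below folds such a sequence into a nested SQL query and runs it on SQLite. It shows one possible backend under stated assumptions: the plan format, the `to_sql` helper, and the example tables are hypothetical and are not taken from QueryGym.

```python
import sqlite3


def to_sql(plan):
    """Fold a list of relational algebra steps into one nested SQL query.

    Returns the SQL text and the positional parameters for filter values.
    Table and column names are interpolated directly, so this toy compiler
    must only be fed trusted plans.
    """
    sql, params = None, []
    for step in plan:
        op = step["op"]
        if op == "scan":
            sql = f"SELECT * FROM {step['table']}"
        elif op == "join":
            sql = f"SELECT * FROM ({sql}) JOIN {step['table']} USING ({step['on']})"
        elif op == "filter":
            sql = f"SELECT * FROM ({sql}) WHERE {step['column']} = ?"
            params.append(step["value"])
        elif op == "project":
            sql = f"SELECT {', '.join(step['columns'])} FROM ({sql})"
        else:
            raise ValueError(f"unknown op {op}")
    return sql, params


# Hypothetical example data, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (name TEXT, dept_id INTEGER);
    CREATE TABLE departments (dept_id INTEGER, dept TEXT);
    INSERT INTO employees VALUES ('Ada', 1), ('Bob', 2);
    INSERT INTO departments VALUES (1, 'Research'), (2, 'Sales');
    """
)

plan = [
    {"op": "scan", "table": "employees"},
    {"op": "join", "table": "departments", "on": "dept_id"},
    {"op": "filter", "column": "dept", "value": "Research"},
    {"op": "project", "columns": ["name"]},
]
sql, params = to_sql(plan)
rows = conn.execute(sql, params).fetchall()
```

Because the plan itself carries no dialect-specific syntax, the same list of steps could be handed to a different compiler or to an in-memory executor, and the results compared row for row.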