🤖 AI Summary
This work addresses the challenge that natural language instructions from users often omit implicit knowledge, such as data schemas and domain-specific conventions, leading large language models to generate erroneous and hard-to-verify SQL queries. To bridge this gap, the authors propose Cerebra, an interactive NL-to-SQL tool that explicitly aligns implicit knowledge between users and the model during query generation. Cerebra retrieves historical SQL scripts to extract relevant implicit knowledge and presents it through an interactive tree-based visualization, enabling users to iteratively refine their queries. Experimental results demonstrate that Cerebra significantly improves both the accuracy and verifiability of generated SQL, effectively supporting customized query formulation and enhancing the overall user experience.
📝 Abstract
LLM-driven tools have significantly lowered the barriers to writing SQL queries. However, user instructions are often underspecified: they assume the model understands implicit knowledge, such as dataset schemas, domain conventions, and task-specific requirements, that is not explicitly provided. This frequently results in erroneous scripts that require users to repeatedly clarify their intent. Moreover, users struggle to validate generated scripts because they cannot verify whether the model correctly applied the implicit knowledge. We present Cerebra, an interactive NL-to-SQL tool that aligns implicit knowledge between users and LLMs during SQL authoring. Cerebra automatically retrieves implicit knowledge from historical SQL scripts based on user instructions, presents this knowledge in an interactive tree view for code review, and supports iterative refinement to improve generated scripts. To evaluate the effectiveness and usability of Cerebra, we conducted a user study with 16 participants, demonstrating its improved support for customized SQL authoring. The source code of Cerebra is available at https://github.com/zjuidg/CHI26-Cerebra.