RubikSQL: Lifelong Learning Agentic Knowledge Base as an Industrial NL2SQL System

📅 2025-08-24
🤖 AI Summary
Enterprise-level NL2SQL systems face three critical challenges: interpreting implicit user intent, adapting to domain-specific terminology, and supporting continuous knowledge evolution. Method: This paper models NL2SQL as a lifelong learning task and proposes a multi-agent collaborative framework. The framework builds and maintains its knowledge base automatically through database profiling and SQL profiling, which combine structured information extraction, rule mining, and chain-of-thought-enhanced reasoning, and it decouples knowledge evolution from SQL generation. Contribution/Results: To advance industrial evaluation, the authors release RubikBench, described as the first benchmark targeting complex enterprise queries. The approach achieves state-of-the-art performance on KaggleDBQA and BIRD Mini-Dev, with reported accuracy improvements on multi-hop, nested, and domain-specific queries, as well as better system scalability and maintainability.

📝 Abstract
We present RubikSQL, a novel NL2SQL system designed to address key challenges in real-world enterprise-level NL2SQL, such as implicit intents and domain-specific terminology. RubikSQL frames NL2SQL as a lifelong learning task that demands both Knowledge Base (KB) maintenance and SQL generation. RubikSQL systematically builds and refines its KB through techniques including database profiling, structured information extraction, agentic rule mining, and Chain-of-Thought (CoT)-enhanced SQL profiling. It then employs a multi-agent workflow that leverages this curated KB to generate accurate SQL queries. RubikSQL achieves state-of-the-art (SOTA) performance on both the KaggleDBQA and BIRD Mini-Dev datasets. Finally, we release RubikBench, a new benchmark specifically designed to capture vital traits of industrial NL2SQL scenarios, providing a valuable resource for future research.
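To make the idea of database profiling concrete, here is a minimal sketch of how per-column statistics might be collected into a KB-like dictionary. It is an illustration only and not RubikSQL's implementation: the function name `profile_database`, the `sample_size` parameter, the shape of the returned dictionary, and the `sales` demo table are all assumptions, and SQLite is used purely for convenience.

```python
import sqlite3


def profile_database(conn, sample_size=3):
    """Return {table: {column: stats}} for every user table (hypothetical helper)."""
    profile = {}
    table_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    for table in table_names:
        columns = {}
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, declared_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ).fetchall():
            distinct, nulls = conn.execute(
                f'SELECT COUNT(DISTINCT "{name}"), SUM("{name}" IS NULL) '
                f'FROM "{table}"'
            ).fetchone()
            samples = [
                row[0]
                for row in conn.execute(
                    f'SELECT DISTINCT "{name}" FROM "{table}" '
                    f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT ?',
                    (sample_size,),
                )
            ]
            columns[name] = {
                "type": declared_type,
                "distinct": distinct,
                "nulls": nulls or 0,  # SUM over an empty table is NULL
                "samples": samples,
            }
        profile[table] = columns
    return profile


# Demo on an in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 10.0), ("APAC", 20.0), ("EMEA", None)],
)
kb = profile_database(conn)
print(kb["sales"]["region"])
```

A profile like this could give a downstream SQL generator concrete value samples to ground domain terms (for example, that `region` holds codes such as `EMEA`). How RubikSQL actually stores and uses its profiles is not specified in the abstract.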
Problem

Research questions and friction points this paper is trying to address.

Addresses implicit intents in enterprise NL2SQL systems
Handles domain-specific terminology for industrial databases
Implements lifelong learning for knowledge base maintenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lifelong learning Knowledge Base maintenance
Agentic rule mining for Knowledge Base construction
Chain-of-Thought enhanced SQL profiling
Zui Chen
Huawei Company
NLP, Databases, LLMs
Han Li
Cornell University, New York, USA
Xinhao Zhang
PhD student, Portland State University
Data Mining, Reinforcement Learning
Xiaoyu Chen
Huawei Company, Shanghai, China
Chunyin Dong
Huawei Company, Shanghai, China
Yifeng Wang
Huawei Company, Shanghai, China
Xin Cai
MMLab, CUHK
Computer Vision
Su Zhang
Huawei Company, Shanghai, China
Ziqi Li
Assistant Professor, Florida State University
Spatial Data Science, GIScience, Spatial Statistics
Chi Ding
Huawei Company, Shanghai, China
Jinxu Li
Huawei Company, Shanghai, China
Shuai Wang
Huawei Company, Shanghai, China
Dousheng Zhao
Huawei Company, Shanghai, China
Sanhai Gao
Huawei Company, Shanghai, China
Guangyi Liu
Huawei Company, Shanghai, China