PSM-SQL: Progressive Schema Learning with Multi-granularity Semantics for Text-to-SQL

📅 2025-02-07
🤖 AI Summary
In text-to-SQL translation, redundant database schemas hinder semantic learning, and a semantic gap persists between natural language and SQL. Existing schema linking methods perform a single pass at the table level, ignoring multi-granularity semantics and the chainable, cyclic structure of schemas. To address these challenges, the paper proposes PSM-SQL, a progressive multi-granularity schema linking framework built around a chained iterative pruning strategy: (1) column-level embedding learning via triplet loss; (2) table-level modeling of schema interactions with classifier and similarity scores; and (3) database-level schema reasoning through fine-tuned large language models. Each pass removes redundant schema elements, which makes the next round of schema linking easier. On text-to-SQL datasets, the method is reported to score 1-3 percentage points higher than existing methods.

📝 Abstract
It is challenging to convert natural language (NL) questions into executable structured query language (SQL) queries for text-to-SQL tasks due to the vast number of database schemas with redundancy, which interferes with semantic learning, and the domain shift between NL and SQL. Existing works for schema linking focus on the table level and perform it once, ignoring the multi-granularity semantics and chainable cyclicity of schemas. In this paper, we propose a progressive schema linking with multi-granularity semantics (PSM-SQL) framework to reduce the redundant database schemas for text-to-SQL. Using the multi-granularity schema linking (MSL) module, PSM-SQL learns the schema semantics at the column, table, and database levels. More specifically, a triplet loss is used at the column level to learn embeddings, while fine-tuning LLMs is employed at the database level for schema reasoning. MSL employs classifier and similarity scores to model schema interactions for schema linking at the table level. In particular, PSM-SQL adopts a chain loop strategy to reduce the task difficulty of schema linking by continuously reducing the number of redundant schemas. Experiments conducted on text-to-SQL datasets show that the proposed PSM-SQL is 1-3 percentage points higher than the existing methods.
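
The abstract states that a triplet loss is used at the column level to learn embeddings, but gives no formula. The sketch below shows the standard margin-based triplet loss as one plausible reading. The Euclidean distance, the margin of 1.0, and the roles assigned to the three vectors (question as anchor, relevant column as positive, irrelevant column as negative) are all assumptions made for illustration, not details confirmed by the source.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss reaches zero once the positive is closer to the anchor
    than the negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-d embeddings, chosen only to make the arithmetic easy to follow.
question = [0.0, 0.0]           # anchor: the NL question
relevant_column = [0.0, 0.0]    # positive: a column the gold SQL uses
irrelevant_column = [3.0, 4.0]  # negative: a redundant column

well_separated = triplet_loss(question, relevant_column, irrelevant_column)
badly_separated = triplet_loss(question, irrelevant_column, relevant_column)
```

In this toy setup the first call yields zero loss because the relevant column is already much closer to the question than the irrelevant one, while swapping the roles yields a positive loss that training would push down.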
Problem

Research questions and friction points this paper is trying to address.

Convert natural language to SQL queries
Reduce redundant database schemas
Improve multi-granularity schema learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-granularity schema linking
Triplet loss embeddings
Chain loop strategy
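
The chain loop idea, repeatedly shrinking the candidate schema so that each later round of schema linking faces fewer redundant elements, can be sketched as a simple pipeline of pruning stages. This is a toy illustration and not the paper's MSL module: the function names, the keyword-overlap filter standing in for the learned column and table scorers, and the example schema are all hypothetical.

```python
def count_columns(schema):
    """Total number of columns across all tables in the schema."""
    return sum(len(cols) for cols in schema.values())


def keep_matching_columns(question, schema):
    """Toy column-level stage: keep columns sharing a word with the question.

    A learned scorer would replace this keyword overlap in practice.
    """
    words = set(question.lower().split())
    return {
        table: [c for c in cols if set(c.lower().split("_")) & words]
        for table, cols in schema.items()
    }


def drop_empty_tables(question, schema):
    """Toy table-level stage: drop tables left with no candidate columns."""
    return {table: cols for table, cols in schema.items() if cols}


def chain_prune(question, schema, stages):
    """Apply pruning stages in sequence, recording the schema size after each."""
    sizes = [count_columns(schema)]
    for stage in stages:
        schema = stage(question, schema)
        sizes.append(count_columns(schema))
    return schema, sizes


# Hypothetical schema and question, used only for illustration.
schema = {
    "singer": ["name", "age", "country"],
    "concert": ["concert_id", "venue", "year"],
}
pruned, sizes = chain_prune(
    "what is the name and age of each singer",
    schema,
    [keep_matching_columns, drop_empty_tables],
)
```

Here `sizes` tracks how many columns survive each stage, which makes the progressive reduction visible; in a real system the stages would be the learned column-, table-, and database-level linkers described in the abstract.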

Authors
Zhuopan Yang
Master's student, Guangdong University of Technology
Multi-modal Learning, Text-to-SQL, Zero-shot Learning
Yuanzhen Xie
Platform and Content Group, Tencent
Ruichao Zhong
Platform and Content Group, Tencent
Yunzhi Tan
Tencent
Recommendation System, Machine Learning
Zhenguo Yang
Guangdong University of Technology
Mochi Gao
Platform and Content Group, Tencent
Bo Hu
Platform and Content Group, Tencent
Zang Li
Platform and Content Group, Tencent