MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction

📅 2025-09-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of privacy leakage induced by large language models (LLMs) and the suboptimal performance of small language models (SLMs) on text-to-SQL translation, this paper proposes MaskSQL, a privacy-preserving framework built on semantic abstraction. Methodologically, MaskSQL replaces sensitive fields in prompts via tunable, semantics-preserving abstract mappings, rather than simple masking or generalization, and integrates a context-aware restoration mechanism so that the generated SQL remains faithful to the original schema. Crucially, it operates without requiring local deployment of a large model, balancing strong privacy protection with high semantic utility. Experimental results show that MaskSQL significantly outperforms state-of-the-art SLMs across multiple benchmarks, approaches the performance of top-tier LLMs, and prevents sensitive information from being exposed in prompts. By enabling accurate, privacy-compliant text-to-SQL conversion, MaskSQL establishes a practical pathway for safely using large models in high-privacy scenarios.

📝 Abstract
Large language models (LLMs) have shown promising performance on tasks that require reasoning, such as text-to-SQL, code generation, and debugging. However, regulatory frameworks with strict privacy requirements constrain their integration into sensitive systems. State-of-the-art LLMs are also proprietary, costly, and resource-intensive, making local deployment impractical. Consequently, utilizing such LLMs often requires sharing data with third-party providers, raising privacy concerns and risking noncompliance with regulations. Although fine-tuned small language models (SLMs) can outperform LLMs on certain tasks and be deployed locally to mitigate privacy concerns, they underperform on more complex tasks such as text-to-SQL translation. In this work, we introduce MaskSQL, a text-to-SQL framework that utilizes abstraction as a privacy protection mechanism to mask sensitive information in LLM prompts. Unlike redaction, which removes content entirely, or generalization, which broadens tokens, abstraction retains essential information while discarding unnecessary details, striking an effective privacy-utility balance for the text-to-SQL task. Moreover, by providing mechanisms to control the privacy-utility tradeoff, MaskSQL facilitates adoption across a broader range of use cases. Our experimental results show that MaskSQL outperforms leading SLM-based text-to-SQL models and achieves performance approaching state-of-the-art LLM-based models, while preserving privacy.
Problem

Research questions and friction points this paper is trying to address.

Protecting sensitive data in LLM text-to-SQL systems
Balancing privacy and utility via abstraction techniques
Enabling secure SQL generation without third-party data sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses abstraction to mask sensitive information in prompts
Balances privacy and utility for text-to-SQL tasks
Controls privacy-utility tradeoff for broader adoption
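The mask-then-restore idea above can be sketched as follows. This is a minimal illustration, not MaskSQL's actual method: the table, column, and placeholder names are hypothetical, and MaskSQL's tunable, semantics-preserving abstraction mappings are more sophisticated than the literal string substitution shown here.

```python
import re

# Hypothetical sensitive schema terms and their abstract placeholders.
# MaskSQL learns/tunes such mappings; this hand-written dict is only
# a stand-in for illustration.
ABSTRACTION = {
    "patient_name": "ENTITY_1",
    "diagnosis": "ATTRIBUTE_1",
    "hospital_records": "TABLE_1",
}

def mask_prompt(prompt: str, mapping: dict) -> str:
    """Replace sensitive identifiers with abstract placeholders before
    the prompt is sent to a third-party LLM."""
    for term, placeholder in mapping.items():
        prompt = re.sub(rf"\b{re.escape(term)}\b", placeholder, prompt)
    return prompt

def restore_sql(sql: str, mapping: dict) -> str:
    """Map placeholders in the generated SQL back to the real schema."""
    for term, placeholder in mapping.items():
        sql = sql.replace(placeholder, term)
    return sql

masked = mask_prompt(
    "List each diagnosis from hospital_records grouped by patient_name",
    ABSTRACTION,
)
# The provider sees only abstract identifiers:
# "List each ATTRIBUTE_1 from TABLE_1 grouped by ENTITY_1"
llm_output = "SELECT ENTITY_1, ATTRIBUTE_1 FROM TABLE_1 GROUP BY ENTITY_1"
print(restore_sql(llm_output, ABSTRACTION))
# -> SELECT patient_name, diagnosis FROM hospital_records GROUP BY patient_name
```

The key property is that the third-party model never observes the real schema or values, while the locally held mapping suffices to recover an executable query.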
Sepideh Abedini, University of Waterloo
Shubhankar Mohapatra, University of Waterloo
D. B. Emerson, Vector Institute
Masoumeh Shafieinejad, Vector Institute (Security & Privacy; Machine Learning and Data Analysis)
Jesse C. Cresswell, Layer 6 AI (Trustworthy ML; Deep Generative Modelling; Quantum Information)
Xi He, University of Waterloo