CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain

📅 2026-03-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenges of terminology variability and noise in medical-domain Text-to-SQL tasks, where conventional retrieval-augmented approaches struggle to balance coverage and accuracy using static example pools. The authors propose CBR-to-SQL, a novel framework that introduces case-based reasoning (CBR) to this task by abstracting question–SQL pairs into reusable, structured case templates. A two-stage retrieval mechanism is designed: first matching logical structures and then resolving specific entities. This approach significantly enhances sample efficiency and robustness, achieving state-of-the-art logical form accuracy and competitive execution accuracy on the MIMICSQL dataset. Notably, it demonstrates superior performance under data-scarce conditions and in the presence of retrieval perturbations.

📝 Abstract
Extracting insights from Electronic Health Record (EHR) databases often requires SQL expertise, creating a barrier for healthcare decision-making and research. While a promising approach is to use Large Language Models (LLMs) to translate natural language questions to SQL via Retrieval-Augmented Generation (RAG), adapting this approach to the medical domain is non-trivial. Standard RAG relies on single-step retrieval from a static pool of examples, which struggles with the variability and noise of medical terminology and jargon. This often leads to anti-patterns such as expanding the task demonstration pool to improve coverage, which in turn introduces noise and scalability problems. To address this, we introduce CBR-to-SQL, a framework inspired by Case-Based Reasoning (CBR). It represents question-SQL pairs as reusable, abstract case templates and utilizes a two-stage retrieval process that first captures logical structure and then resolves relevant entities. Evaluated on MIMICSQL, CBR-to-SQL achieves state-of-the-art logical form accuracy and competitive execution accuracy. More importantly, it demonstrates higher sample efficiency and robustness than standard RAG approaches, particularly under data scarcity and retrieval perturbations.
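The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the case base, the table and column names, the `DB_VALUES` vocabulary, and all function names are hypothetical. Token-overlap similarity and fuzzy string matching stand in for whatever retrievers the paper uses, and template filling stands in for LLM generation. Stage 1 matches the question's logical structure against abstract case templates with entities masked out; stage 2 maps the noisy entity mention to a canonical database value.

```python
import re
from dataclasses import dataclass
from difflib import get_close_matches


@dataclass
class Case:
    question_template: str  # question with entity mentions masked as [ENT]
    sql_template: str       # SQL with a matching {ent} slot


# Hypothetical case base and value vocabulary (illustrative, not from the paper).
CASES = [
    Case("how many patients were diagnosed with [ENT]",
         "SELECT COUNT(DISTINCT subject_id) FROM diagnoses "
         "WHERE short_title = '{ent}'"),
    Case("what is the average age of patients admitted for [ENT]",
         "SELECT AVG(age) FROM demographic WHERE diagnosis = '{ent}'"),
]
DB_VALUES = ["hypertension", "pneumonia", "sepsis"]


def mask_entities(question):
    """Lowercase the question and replace double-quoted mentions with [ENT]."""
    q = question.lower().rstrip("?").strip()
    mentions = re.findall(r'"([^"]+)"', q)
    masked = re.sub(r'"[^"]+"', "[ENT]", q)
    return masked, mentions


def jaccard(a, b):
    """Token-set overlap between two whitespace-tokenized strings."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def retrieve_case(masked_question):
    """Stage 1: pick the case whose template best matches the logical structure."""
    return max(CASES, key=lambda c: jaccard(masked_question, c.question_template))


def resolve_entity(mention):
    """Stage 2: map a noisy mention to the closest known database value."""
    matches = get_close_matches(mention, DB_VALUES, n=1, cutoff=0.6)
    return matches[0] if matches else mention


def answer(question):
    masked, mentions = mask_entities(question)
    case = retrieve_case(masked)
    entity = resolve_entity(mentions[0]) if mentions else ""
    # Plain string formatting is for illustration only; real code should
    # use parameterized queries.
    return case.sql_template.format(ent=entity)


print(answer('How many patients were diagnosed with "hypertention"?'))
```

Because the templates carry no entity values, one case can serve every question of the same shape. The misspelled mention is handled separately in stage 2 instead of requiring another example in the pool.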
Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL
Electronic Health Records
Retrieval-Augmented Generation
Medical Domain
Natural Language to SQL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Case-Based Reasoning
Text-to-SQL
Retrieval-Augmented Generation
Two-stage Retrieval
Healthcare Domain
Authors
Hung Nguyen
Department of Computer Science, Aalto University
Hans Moen
Department of Computer Science, Aalto University
Pekka Marttinen
Aalto University
Statistical machine learning, Computational biology