CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain

📅 2026-03-05

🤖 AI Summary

This work addresses terminology variability and noise in medical-domain Text-to-SQL, where conventional retrieval-augmented approaches that draw on a static example pool struggle to balance coverage against accuracy. The authors propose CBR-to-SQL, a framework that applies case-based reasoning (CBR) to the task by abstracting question–SQL pairs into reusable, structured case templates. Retrieval proceeds in two stages: the first matches logical structure, and the second resolves specific entities. On the MIMICSQL dataset, the approach achieves state-of-the-art logical form accuracy and competitive execution accuracy. It also shows higher sample efficiency and robustness than standard RAG approaches, particularly under data scarcity and retrieval perturbations.

📝 Abstract

Extracting insights from Electronic Health Record (EHR) databases often requires SQL expertise, creating a barrier for healthcare decision-making and research. While a promising approach is to use Large Language Models (LLMs) to translate natural language questions to SQL via Retrieval-Augmented Generation (RAG), adapting this approach to the medical domain is non-trivial. Standard RAG relies on single-step retrieval from a static pool of examples, which struggles with the variability and noise of medical terminology and jargon. This often leads to anti-patterns such as expanding the task demonstration pool to improve coverage, which in turn introduces noise and scalability problems. To address this, we introduce CBR-to-SQL, a framework inspired by Case-Based Reasoning (CBR). It represents question-SQL pairs as reusable, abstract case templates and utilizes a two-stage retrieval process that first captures logical structure and then resolves relevant entities. Evaluated on MIMICSQL, CBR-to-SQL achieves state-of-the-art logical form accuracy and competitive execution accuracy. More importantly, it demonstrates higher sample efficiency and robustness than standard RAG approaches, particularly under data scarcity and retrieval perturbations.
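The idea of abstract case templates with two-stage retrieval can be illustrated with a small sketch. This is not the authors' implementation: the paper uses an LLM to generate the final SQL, whereas the sketch below substitutes plain string similarity (Python's `difflib`) and direct slot filling so that it runs without any model. The table and column names, the `[VALUE_i]` slot syntax, and all function names are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Case:
    """A question-SQL pair plus its entity-masked (abstract) template."""
    template_q: str
    template_sql: str
    entities: list


def tokenize(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def abstract_case(question, sql):
    """Mask each double-quoted SQL literal in both the SQL and the question."""
    entities = re.findall(r'"([^"]*)"', sql)
    template_q, template_sql = question, sql
    for i, ent in enumerate(entities):
        slot = f"[VALUE_{i}]"
        template_q = re.sub(re.escape(ent), slot, template_q, flags=re.IGNORECASE)
        template_sql = template_sql.replace(f'"{ent}"', f'"{slot}"')
    return Case(template_q, template_sql, entities)


def retrieve_template(query, cases):
    """Stage 1: pick the case whose masked question best matches the query's structure."""
    q_tokens = tokenize(query)
    return max(
        cases,
        key=lambda c: SequenceMatcher(None, q_tokens, tokenize(c.template_q)).ratio(),
    )


def resolve_entity(query, vocabulary):
    """Stage 2: find the known value closest to any 1- to 3-word span of the query."""
    tokens = tokenize(query)
    best, best_score = None, 0.0
    for n in range(1, 4):
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            for value in vocabulary:
                score = SequenceMatcher(None, span, value.lower()).ratio()
                if score > best_score:
                    best, best_score = value, score
    return best


# Hypothetical case base (schema names are illustrative, not taken from MIMICSQL).
cases = [
    abstract_case(
        "how many patients have diagnosis hypertension?",
        'SELECT COUNT(DISTINCT subject_id) FROM diagnoses WHERE short_title = "hypertension"',
    ),
    abstract_case(
        "what is the age of patient john doe?",
        'SELECT age FROM demographic WHERE name = "john doe"',
    ),
]
vocabulary = ["hypertension", "sepsis", "john doe"]  # assumed list of known database values

query = "how many patients have diagnosis sepsis?"
best_case = retrieve_template(query, cases)
entity = resolve_entity(query, vocabulary)
predicted_sql = best_case.template_sql.replace("[VALUE_0]", entity)
print(predicted_sql)
```

Because the templates are entity-masked, the query about sepsis can reuse a case that was originally written for hypertension, which is the intuition behind the sample-efficiency claim: one stored case covers every question with the same logical structure.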

Problem

Research questions and friction points this paper is trying to address.

Text-to-SQL

Electronic Health Records

Retrieval-Augmented Generation

Medical Domain

Natural Language to SQL

Innovation

Methods, ideas, or system contributions that make the work stand out.

Case-Based Reasoning

Text-to-SQL

Retrieval-Augmented Generation