REaR: Retrieve, Expand and Refine for Effective Multitable Retrieval

📅 2025-11-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing multi-table retrieval methods for NL2SQL mostly optimize the semantic relevance between the query and each individual table, neglecting table-table structural compatibility and joinability, which degrades retrieval quality. This paper proposes REaR (Retrieve, Expand and Refine), a three-stage framework that separates semantic relevance from structural joinability. First, it retrieves tables aligned with the query. Second, it expands this set with structurally joinable tables using fast comparisons over precomputed column embeddings. Third, it refines the candidates by pruning noisy or weakly related tables. REaR uses no large language models (LLMs) during retrieval and works on top of both dense and sparse base retrievers. Evaluated on BIRD, MMQA, and Spider, it improves multi-table retrieval quality and downstream SQL execution, delivering performance competitive with LLM-augmented retrieval systems such as ARM at much lower latency and cost.

📝 Abstract

Answering natural language queries over relational data often requires retrieving and reasoning over multiple tables, yet most retrievers optimize only for query-table relevance and ignore table-table compatibility. We introduce REAR (Retrieve, Expand and Refine), a three-stage, LLM-free framework that separates semantic relevance from structural joinability for efficient, high-fidelity multi-table retrieval. REAR (i) retrieves query-aligned tables, (ii) expands these with structurally joinable tables via fast, precomputed column-embedding comparisons, and (iii) refines them by pruning noisy or weakly related candidates. Empirically, REAR is retriever-agnostic and consistently improves dense/sparse retrievers on complex table QA datasets (BIRD, MMQA, and Spider) by improving both multi-table retrieval quality and downstream SQL execution. Despite being LLM-free, it delivers performance competitive with state-of-the-art LLM-augmented retrieval systems (e.g., ARM) while achieving much lower latency and cost. Ablations confirm complementary gains from expansion and refinement, underscoring REAR as a practical, scalable building block for table-based downstream tasks (e.g., Text-to-SQL).

Problem

Research questions and friction points this paper is trying to address.

Retrieving multiple tables for natural language queries

Separating semantic relevance from structural joinability

Improving multi-table retrieval quality and SQL execution
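
To make "multi-table retrieval quality" concrete, the sketch below computes two common set-based measures for a single query: the fraction of gold tables retrieved, and whether all of them were retrieved. The summary does not say which metrics the paper reports, so the function names and the choice of measures here are illustrative assumptions. The second measure matters for Text-to-SQL because a query that joins two tables cannot be written if either one is missing.

```python
def table_recall(retrieved, gold):
    """Fraction of gold tables that appear in the retrieved set."""
    gold = set(gold)
    if not gold:
        return 1.0  # nothing to find, so nothing was missed
    return len(gold & set(retrieved)) / len(gold)


def full_recall(retrieved, gold):
    """True only if every gold table was retrieved."""
    return set(gold) <= set(retrieved)


# Hypothetical query whose SQL needs both tables.
gold_tables = ["orders", "customers"]

partial = ["orders", "weather"]              # misses one joinable table
complete = ["orders", "customers", "weather"]

partial_recall = table_recall(partial, gold_tables)
complete_recall = table_recall(complete, gold_tables)
```

Here `partial` finds one of the two gold tables, so its recall is 0.5 and `full_recall` is false, while `complete` satisfies both measures despite containing one irrelevant table.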

Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework separates relevance from joinability

Expands tables via precomputed column-embedding comparisons

Refines results by pruning noisy candidate tables
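
The three stages listed above can be sketched on toy data as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are hand-written vectors, joinability is approximated by the maximum cosine similarity between any pair of column embeddings, refinement is a simple relevance threshold that always keeps the initially retrieved tables, and all function names, table names, and thresholds are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, table_vecs, k):
    """Stage 1: top-k tables by query-table semantic similarity."""
    ranked = sorted(table_vecs,
                    key=lambda t: cosine(query_vec, table_vecs[t]),
                    reverse=True)
    return ranked[:k]


def joinability(cols_a, cols_b):
    """Assumed joinability score: best-matching pair of column embeddings."""
    return max(cosine(u, v) for u in cols_a.values() for v in cols_b.values())


def expand(seeds, column_vecs, join_threshold):
    """Stage 2: add tables that look joinable with any seed table."""
    expanded = list(seeds)
    for name, cols in column_vecs.items():
        if name in expanded:
            continue
        if any(joinability(cols, column_vecs[s]) >= join_threshold for s in seeds):
            expanded.append(name)
    return expanded


def refine(query_vec, candidates, seeds, table_vecs, min_relevance):
    """Stage 3: prune expanded tables that are weakly related to the query."""
    return [t for t in candidates
            if t in seeds or cosine(query_vec, table_vecs[t]) >= min_relevance]


# Toy table-level embeddings (assumed to be precomputed offline).
table_vecs = {
    "orders":    [1.0, 0.1, 0.0],
    "customers": [0.4, 0.9, 0.0],
    "audit_log": [0.1, 0.0, 1.0],
    "weather":   [0.0, 0.0, 1.0],
}
# Toy column-level embeddings (also assumed to be precomputed offline).
column_vecs = {
    "orders":    {"order_id": [1, 0, 0, 0], "customer_id": [0, 1, 0, 0]},
    "customers": {"customer_id": [0.1, 0.9, 0, 0], "name": [0, 0, 1, 0]},
    "audit_log": {"order_id": [1, 0, 0, 0], "logged_at": [0, 0, 0.5, 0.5]},
    "weather":   {"station_id": [0, 0, 0, 1], "temperature": [0, 0, 0.2, 1]},
}
query_vec = [1.0, 0.2, 0.0]

seeds = retrieve(query_vec, table_vecs, k=1)
expanded = expand(seeds, column_vecs, join_threshold=0.9)
final = refine(query_vec, expanded, seeds, table_vecs, min_relevance=0.3)
```

In this toy run, retrieval alone returns only `orders`. Expansion adds `customers` and `audit_log` because each shares a closely matching key column with `orders`, and refinement then drops `audit_log` because its table embedding is only weakly related to the query. No LLM call is involved at any step, which mirrors the LLM-free design described in the abstract.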

🔎 Similar Papers

Is Table Retrieval a Solved Problem? Exploring Join-Aware Multi-Table Retrieval