CRAFT: Training-Free Cascaded Retrieval for Tabular QA

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In table question answering (TQA), large-scale table retrieval faces challenges including high computational overhead, frequent model retraining, and poor cross-domain adaptability. To address these, we propose a training-free cascaded retrieval framework: first, a lightweight sparse retriever (BM25) performs coarse-grained candidate table selection; then, dense retrieval (DTR/ColBERT) followed by a neural re-ranker refines the ranking. Crucially, we leverage Gemini Flash 1.5 to automatically generate descriptive table titles and semantic summaries—enhancing table representation quality and cross-domain generalization without human annotation. This approach entirely eliminates reliance on labeled data or fine-tuning, unlike conventional dense retrieval methods. Evaluated on NQ-Tables, our method outperforms state-of-the-art sparse, dense, and hybrid baselines across all retrieval metrics. End-to-end TQA performance improves significantly, and the framework supports seamless integration with multiple large language models.
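The two-stage pipeline described above can be sketched in miniature. This is a hypothetical illustration, not the authors' code: BM25 is implemented directly from its standard formula, and the dense scorer (DTR/ColBERT in the paper) is stubbed out as simple token overlap so the sketch stays self-contained.

```python
# Hypothetical sketch of a CRAFT-style cascaded retriever (not the authors' code).
# Stage 1: sparse BM25 narrows the corpus to k_sparse candidate tables;
# Stage 2: a placeholder "dense" scorer re-ranks only those candidates.
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Standard BM25 scores for one query over a tokenized corpus."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter(t for d in docs_tokens for t in set(d))  # document frequency
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def dense_score(query_tokens, doc_tokens):
    # Stand-in for a dense model (DTR/ColBERT): token-overlap ratio.
    q, d = set(query_tokens), set(doc_tokens)
    return len(q & d) / max(len(q), 1)

def cascaded_retrieve(query, corpus, k_sparse=3, k_final=1):
    q = query.lower().split()
    docs = [doc.lower().split() for doc in corpus]
    sparse = bm25_scores(q, docs)
    # Stage 1: cheap sparse filtering of the full corpus.
    candidates = sorted(range(len(docs)), key=lambda i: -sparse[i])[:k_sparse]
    # Stage 2: expensive scoring applied only to the survivors.
    reranked = sorted(candidates, key=lambda i: -dense_score(q, docs[i]))
    return reranked[:k_final]
```

In CRAFT, the corpus entries would be the Gemini-generated titles and summaries rather than raw table text; the key cost property is that the expensive stage-2 model only ever sees `k_sparse` candidates instead of the whole corpus.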

📝 Abstract
Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. Traditional dense retrieval models, such as DTR and ColBERT, not only incur high computational costs for large-scale retrieval tasks but also require retraining or fine-tuning on new datasets, limiting their adaptability to evolving domains and knowledge. In this work, we propose CRAFT, a cascaded retrieval approach that first uses a sparse retrieval model to filter a subset of candidate tables before applying more computationally expensive dense models and neural re-rankers. Our approach achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers. We further enhance table representations by generating table descriptions and titles using Gemini Flash 1.5. End-to-end TQA results using various Large Language Models (LLMs) on NQ-Tables, a subset of the Natural Questions dataset, demonstrate CRAFT's effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs in table retrieval for QA
Eliminating retraining needs for new datasets in retrieval
Improving retrieval accuracy with cascaded sparse-dense methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascaded retrieval with sparse and dense models
Training-free approach for adaptability
Enhanced table representations using Gemini Flash 1.5