🤖 AI Summary
In table question answering (TQA), large-scale table retrieval faces challenges including high computational overhead, frequent model retraining, and poor cross-domain adaptability. To address these, we propose a training-free cascaded retrieval framework: first, a lightweight sparse retriever (BM25) performs coarse-grained candidate table selection; then, dense retrieval (DTR/ColBERT) followed by a neural re-ranker refines the ranking. Crucially, we leverage Gemini Flash 1.5 to automatically generate descriptive table titles and semantic summaries—enhancing table representation quality and cross-domain generalization without human annotation. This approach entirely eliminates reliance on labeled data or fine-tuning, unlike conventional dense retrieval methods. Evaluated on NQ-Tables, our method outperforms state-of-the-art sparse, dense, and hybrid baselines across all retrieval metrics. End-to-end TQA performance improves significantly, and the framework supports seamless integration with multiple large language models.
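The cascade described above — a cheap sparse pass that prunes the corpus, followed by an expensive scorer applied only to the survivors — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the BM25 scorer is a bare-bones version, and `dense_score` stands in for whatever dense retriever or re-ranker (e.g. DTR/ColBERT) one plugs in; `k_sparse` and `k_final` are hypothetical cutoff parameters.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Minimal BM25 over tokenized table representations (e.g. titles + summaries)."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()                      # document frequency per term
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)                 # term frequency in this document
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cascaded_retrieve(query, tables, dense_score, k_sparse=3, k_final=1):
    """Stage 1: cheap BM25 filter over all tables.
    Stage 2: expensive dense scorer applied only to the top-k_sparse candidates."""
    q = query.lower().split()
    docs = [t.lower().split() for t in tables]
    sparse = bm25_scores(q, docs)
    candidates = sorted(range(len(tables)),
                        key=lambda i: sparse[i], reverse=True)[:k_sparse]
    reranked = sorted(candidates,
                      key=lambda i: dense_score(query, tables[i]), reverse=True)
    return reranked[:k_final]           # indices of the final retrieved tables
```

A toy usage, with token overlap standing in for the dense scorer:

```python
tables = ["city population table",
          "olympic medal counts by country",
          "quarterly revenue report"]
overlap = lambda q, t: len(set(q.lower().split()) & set(t.lower().split()))
cascaded_retrieve("which country won most olympic medals", tables, overlap)
# → [1]
```

The point of the design is that the dense model, whose cost dominates, never sees more than `k_sparse` tables per query, regardless of corpus size.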
📝 Abstract
Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. Traditional dense retrieval models, such as DTR and ColBERT, not only incur high computational costs for large-scale retrieval tasks but also require retraining or fine-tuning on new datasets, limiting their adaptability to evolving domains and knowledge. In this work, we propose **CRAFT**, a cascaded retrieval approach that first uses a sparse retrieval model to filter a subset of candidate tables before applying more computationally expensive dense models and neural re-rankers. Our approach achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers. We further enhance table representations by generating table descriptions and titles using Gemini Flash 1.5. End-to-end TQA results using various Large Language Models (LLMs) on NQ-Tables, a subset of the Natural Questions dataset, demonstrate **CRAFT**'s effectiveness.