TARGET: Benchmarking Table Retrieval for Generative Tasks

📅 2025-05-14

📈 Citations: 1

✨ Influential: 0

career value

159K/year

🤖 AI Summary

Prior work on generative tasks (e.g., text-to-SQL, question answering) largely overlooks table retrieval—a critical prerequisite for leveraging structured data. Method: We introduce TARGET, the first table-level retrieval benchmark explicitly designed for generative tasks, featuring a structured evaluation framework with multi-source real-world tabular datasets, fine-grained human annotations, and an integrated retrieval-generation evaluation pipeline. Contribution/Results: Experiments show dense retrieval using BERT-based table encoders substantially outperforms BM25 (mAP gain >40%). Metadata absence—especially table titles—degrades performance by up to 35%. Significant performance disparities exist across datasets and tasks. Effective table retrieval boosts SQL generation accuracy by up to 22%. This work is the first to systematically quantify the impact of table retrieval on downstream generative performance and establishes a standardized evaluation paradigm for structured-data retrieval.

Technology Category

Application Category

📝 Abstract

The data landscape is rich with structured data, often of high value to organizations, driving important applications in data analysis and machine learning. Recent progress in representation learning and generative models for such data has led to the development of natural language interfaces to structured data, including those leveraging text-to-SQL. Contextualizing interactions, either through conversational interfaces or agentic components, in structured data through retrieval-augmented generation can provide substantial benefits in the form of freshness, accuracy, and comprehensiveness of answers. The key question is: how do we retrieve the right table(s) for the analytical query or task at hand? To this end, we introduce TARGET: a benchmark for evaluating TAble Retrieval for GEnerative Tasks. With TARGET we analyze the retrieval performance of different retrievers in isolation, as well as their impact on downstream tasks. We find that dense embedding-based retrievers far outperform a BM25 baseline which is less effective than it is for retrieval over unstructured text. We also surface the sensitivity of retrievers across various metadata (e.g., missing table titles), and demonstrate a stark variation of retrieval performance across datasets and tasks. TARGET is available at https://target-benchmark.github.io.

Problem

Research questions and friction points this paper is trying to address.

How to retrieve the right tables for analytical queries

Evaluating table retrieval performance for generative tasks

Impact of retrievers on downstream task performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark for table retrieval in generative tasks

Dense embedding retrievers outperform BM25 baseline

Analyzes retriever sensitivity to metadata variations

🔎 Similar Papers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering