FollowTable: A Benchmark for Instruction-Following Table Retrieval

📅 2026-05-01

🤖 AI Summary
Traditional table retrieval methods rely primarily on topical semantic similarity. This is insufficient for large language model agents, which must adhere to explicit content and schema constraints specified in natural language instructions. This work introduces Instruction-Following Table Retrieval (IFTR), a new task that requires satisfying topical relevance and fine-grained instruction constraints at the same time. The authors formally define the IFTR task, construct FollowTable, a large-scale benchmark, and devise a taxonomy-driven annotation pipeline together with a new metric, the Instruction Responsiveness Score. Their evaluation shows that current retrieval models perform poorly on fine-grained instructions and schema constraints, exhibiting a systematic bias toward surface-level semantic cues rather than deeper structural or instruction alignment.
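To make the task definition concrete, here is a minimal Python sketch under a toy setting: a table counts as relevant under IFTR only if it is topically relevant and also satisfies every instruction constraint. The `Table` and `Instruction` classes, the token-overlap `topical_overlap` stand-in for a real retriever, the substring-based constraint checks, and the 0.5 threshold are all hypothetical illustrations, not the FollowTable annotation scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    title: str
    columns: list[str]
    rows: list[list[str]]


@dataclass
class Instruction:
    # Hypothetical constraint types loosely mirroring the two IFTR challenges:
    # content scope (include/exclude terms) and schema (required columns).
    include: list[str] = field(default_factory=list)
    exclude: list[str] = field(default_factory=list)
    required_columns: list[str] = field(default_factory=list)


def table_text(t: Table) -> str:
    # Flatten title, header, and cells into one lowercase string.
    cells = " ".join(" ".join(row) for row in t.rows)
    return f"{t.title} {' '.join(t.columns)} {cells}".lower()


def topical_overlap(query: str, t: Table) -> float:
    # Toy stand-in for a semantic retriever: fraction of query tokens found in the table.
    q = set(query.lower().split())
    words = set(table_text(t).split())
    return len(q & words) / len(q) if q else 0.0


def satisfies(instr: Instruction, t: Table) -> bool:
    # All constraints must hold; content checks use simple substring matching.
    text = table_text(t)
    cols = {c.lower() for c in t.columns}
    return (
        all(w.lower() in text for w in instr.include)
        and not any(w.lower() in text for w in instr.exclude)
        and all(c.lower() in cols for c in instr.required_columns)
    )


def iftr_relevant(query: str, instr: Instruction, t: Table, threshold: float = 0.5) -> bool:
    # Relevance is the conjunction of topical match and instruction compliance.
    return topical_overlap(query, t) >= threshold and satisfies(instr, t)


if __name__ == "__main__":
    t1 = Table("2020 olympic medal table", ["country", "gold", "silver"], [["japan", "27", "14"]])
    t2 = Table("2020 olympic medal table", ["country", "total"], [["japan", "58"]])
    need_gold = Instruction(required_columns=["gold"])
    for t in (t1, t2):
        print(t.columns, iftr_relevant("olympic medal table", need_gold, t))
```

In this toy example both tables match the topic equally well, so a purely topical retriever cannot separate them; only the schema constraint (a required `gold` column) does.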
📝 Abstract
Table Retrieval (TR) has traditionally been formulated as an ad-hoc retrieval problem, where relevance is primarily determined by topical semantic similarity. With the growing adoption of LLM-based agentic systems, access to structured data is increasingly instruction-driven: relevance is conditioned on explicit content and schema constraints rather than on topical similarity alone. We therefore formalize Instruction-Following Table Retrieval (IFTR), a new task that requires models to jointly satisfy topical relevance and fine-grained instruction constraints. We identify two core challenges in IFTR: (i) sensitivity to content scope, such as inclusion and exclusion constraints, and (ii) awareness of schema-grounded requirements, including column semantics and representation granularity. Both capabilities are largely absent in existing retrievers. To support systematic evaluation, we introduce FollowTable, the first large-scale benchmark for IFTR, constructed via a taxonomy-driven annotation pipeline. We further propose a new metric, termed the Instruction Responsiveness Score, to evaluate whether retrieval rankings consistently adapt to user instructions relative to a topic-only baseline. Our results indicate that existing retrieval models struggle to follow fine-grained instructions over tabular data. In particular, they exhibit systematic biases toward surface-level semantic cues and remain limited in handling schema-grounded constraints, highlighting substantial room for future improvement.
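The abstract does not spell out the Instruction Responsiveness Score formula, so the following is only a hypothetical sketch of the underlying idea: compare where the target table lands in a topic-only ranking and in an instruction-aware ranking, and credit the retriever when the instruction moves the target up. The reciprocal-rank comparison, the +1/0/-1 scoring, and the function names are assumptions for illustration, not the paper's definition.

```python
from typing import Iterable


def reciprocal_rank(ranking: list[str], gold: str) -> float:
    # 1 / position of the gold table (1-indexed); 0.0 if it was not retrieved.
    return 1.0 / (ranking.index(gold) + 1) if gold in ranking else 0.0


def responsiveness(pairs: Iterable[tuple[list[str], list[str], str]]) -> float:
    """Hypothetical responsiveness measure, not the paper's formula.

    pairs: (topic_only_ranking, instructed_ranking, gold_table_id) triples.
    Each triple scores +1 if the instruction moved the gold table up,
    -1 if it moved it down, and 0 if its rank did not change.
    """
    scores = []
    for base, instructed, gold in pairs:
        delta = reciprocal_rank(instructed, gold) - reciprocal_rank(base, gold)
        scores.append((delta > 0) - (delta < 0))
    # Mean over all triples, in [-1, 1]; 0.0 for empty input.
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    examples = [
        (["a", "b", "c"], ["b", "a", "c"], "b"),  # instruction promotes the target
        (["a", "b"], ["a", "b"], "a"),            # ranking unchanged
    ]
    print(responsiveness(examples))
```

Under this sketch, a score near +1 would mean rankings consistently shift toward instruction-satisfying tables, a score near 0 that instructions are largely ignored, and a negative score that instructions actively hurt ranking.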
Problem

Research questions and friction points this paper is trying to address.

Instruction-Following Table Retrieval
Table Retrieval
Structured Data Access
Schema-Grounded Constraints
Content Scope Sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Instruction-Following Table Retrieval
FollowTable
schema-aware retrieval
instruction responsiveness
table retrieval benchmark