🤖 AI Summary
This work addresses the challenge of selecting a relevant set of data sources that is semantically precise, sufficient, and minimal when large language model (LLM)-driven agents perform complex multi-step tasks. The authors propose the Metadata Reasoner, described as the first approach to integrate an agent-based architecture into metadata reasoning. It retrieves candidate tables via a table-search engine and then uses an LLM to autonomously consult multiple aspects of the available metadata, selecting a minimal sufficient set of data sources for the task. On the real-world KramaBench datasets it achieves an average F1 score of 83.16%, outperforming state-of-the-art baselines by 32 percentage points. On a noisy synthetic benchmark built on the BIRD data lake it remains robust to redundant and low-quality tables, maintaining an average F1 score of 85.5% with a 99% success rate in avoiding low-quality data.
📝 Abstract
As LLM-driven autonomous agents evolve to perform complex, multi-step tasks that require integrating multiple datasets, the problem of discovering relevant data sources becomes a key bottleneck. Beyond the challenge posed by the sheer volume of available data sources, data-source selection is difficult because the semantics of data are extremely nuanced and require considering many aspects of the data. To address this, we introduce the Metadata Reasoner, an agentic approach to metadata reasoning, designed to identify a small set of data sources that are both sufficient and minimal for a given analytical task. The Metadata Reasoner leverages a table-search engine to retrieve candidate tables, and then autonomously consults various aspects of the available metadata to determine whether the candidates fit the requirements of the task. We demonstrate the effectiveness of the Metadata Reasoner through a series of empirical studies. Evaluated on the real-world KramaBench datasets for data selection, our approach achieves an average F1 score of 83.16%, outperforming state-of-the-art baselines by a substantial margin of 32 percentage points. Furthermore, evaluations on a newly created synthetic benchmark based on the BIRD data lake reveal that the Metadata Reasoner is highly robust against redundant and low-quality tables that may be present in the data lake. In this noisy environment, it maintains an average F1 score of 85.5% for selecting the right datasets and demonstrates a 99% success rate in avoiding low-quality data.
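To make the reported metric concrete, the following is a minimal sketch of how an F1 score for data-source selection can be computed for a single task by comparing the selected set of tables against a gold set. The abstract does not specify the exact averaging scheme behind the reported numbers, so the per-task set-based definition here is an assumption, and the function name `selection_f1` and the table names are hypothetical.

```python
def selection_f1(selected, relevant):
    """Return (precision, recall, F1) for one task.

    Assumption: selection quality is scored as set overlap between the
    tables an agent selected and the gold set of relevant tables.
    """
    selected, relevant = set(selected), set(relevant)
    if not selected and not relevant:
        # Nothing was needed and nothing was chosen: treat as a perfect match.
        return 1.0, 1.0, 1.0
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: the selection is sufficient (both gold tables are
# included) but not minimal (a redundant copy was also selected).
precision, recall, f1 = selection_f1(
    selected={"orders", "customers", "orders_backup"},
    relevant={"orders", "customers"},
)
```

In this example recall is 1.0 while precision falls to 2/3, which illustrates why a set-based F1 rewards selections that are both sufficient and minimal: redundant tables lower precision even when nothing required is missing.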