🤖 AI Summary
Existing tabular NER datasets are overly simplified and fail to reflect the structural complexity and evaluation requirements of real-world Wikipedia tables. To address this, we introduce Wiki-TabNER—the first fine-grained NER benchmark specifically designed for complex Wikipedia tables. It features systematic annotation of nested and multi-entity cells, with entities mapped to fine-grained DBpedia categories. We propose a structure-aware table prompting framework enabling large language models to perform zero-shot or few-shot NER and entity linking on non-flat tabular text. The high-quality dataset is constructed via ontology alignment and rigorous human verification, covering thousands of tables and over 100,000 entity annotations. Empirical evaluation reveals critical bottlenecks in current LLMs: poor modeling of cross-cell contextual dependencies and insufficient discrimination among fine-grained entity types. Wiki-TabNER establishes a new standard for tabular understanding research, offering both a challenging benchmark and a principled methodological foundation.
📝 Abstract
Interest in solving table interpretation tasks has grown over the years, yet research still relies on existing datasets that may be overly simplified, potentially reducing their effectiveness for thorough evaluation and failing to represent tables as they appear in the real world. To enrich the existing benchmark datasets, we extract and annotate a new, more challenging dataset. The proposed Wiki-TabNER dataset features complex tables containing several entities per cell, with named entities labeled using DBpedia classes. This dataset is specifically designed to address the named entity recognition (NER) task within tables, but it can also serve as a more challenging benchmark for evaluating the entity linking task. In this paper we describe the distinguishing features of the Wiki-TabNER dataset and the labeling process. In addition, we propose a prompting framework for evaluating new large language models on the within-table NER task. Finally, we perform a qualitative analysis to gain insights into the challenges encountered by the models and to understand the limitations of the proposed dataset.
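The abstract describes a prompting framework for within-table NER, where cells may contain several entities each. A minimal sketch of what such structure-aware prompting could look like is shown below; the serialization scheme, function names, and label set are illustrative assumptions, not the authors' actual framework (Wiki-TabNER uses fine-grained DBpedia classes):

```python
# Hedged sketch: serialize a Wikipedia-style table into a prompt that asks an
# LLM to tag entity spans per cell with DBpedia classes. All names below
# (table_to_prompt, the label list) are illustrative, not from the paper.

from typing import List

DBPEDIA_CLASSES = ["Person", "Organisation", "Place", "Work", "Event"]  # assumed label set

def table_to_prompt(caption: str, header: List[str], rows: List[List[str]]) -> str:
    """Flatten a table row by row, keeping (row, column) coordinates so the
    model can localize entities inside multi-entity cells."""
    lines = [f"Table: {caption}", "Columns: " + " | ".join(header)]
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            lines.append(f"cell({r},{c}): {cell}")
    lines.append(
        "Task: list every named entity as (row, col, surface form, type), "
        "where type is one of: " + ", ".join(DBPEDIA_CLASSES) + "."
    )
    return "\n".join(lines)

prompt = table_to_prompt(
    "1994 FIFA World Cup squads",
    ["Player", "Club"],
    [["Romário", "Barcelona"], ["Bebeto", "Deportivo La Coruña"]],
)
print(prompt)
```

Keeping explicit `(row, col)` coordinates in the serialization is one way to preserve table structure in a flat prompt, so that predictions can be mapped back to specific cells during evaluation.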