An LLM Agent-Based Complex Semantic Table Annotation Approach

📅 2025-08-18
🤖 AI Summary
In Semantic Table Annotation (STA), joint Column Type Annotation (CTA) and Cell Entity Annotation (CEA) face several challenges: semantic loss in column names or cell values, strict ontological hierarchy constraints, homonyms, and lexical noise such as misspellings and abbreviations. To address these, we propose an LLM-based agent framework that integrates five specialized external tools with tailored prompts via the ReAct paradigm, allowing the agent to select an annotation strategy dynamically based on table characteristics, and that uses Levenshtein distance to avoid redundant annotations. Evaluated on the SemTab Tough Tables and BiodivTab datasets, the method outperforms existing approaches across various metrics while reducing time costs by 70% and LLM token usage by 60%. The approach thus combines annotation accuracy with computational efficiency, offering a low-overhead solution for complex semantic table annotation.

๐Ÿ“ Abstract
The Semantic Table Annotation (STA) task, which includes Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays an important role in various semantic applications. However, complex tables often pose challenges such as semantic loss of column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, all of which hinder annotation accuracy. To address these issues, this paper proposes an LLM-based agent approach for CTA and CEA. We design and implement five external tools with tailored prompts based on the ReAct framework, enabling the STA agent to dynamically select suitable annotation strategies depending on table characteristics. Experiments are conducted on the Tough Tables and BiodivTab datasets from the SemTab challenge, whose tables exhibit these challenges. Our method outperforms existing approaches across various metrics. Furthermore, by leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage, providing an efficient and cost-effective solution for STA.
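The abstract says Levenshtein distance is used to cut redundant annotations but does not spell out the procedure. The following is a minimal sketch under one assumption: a new cell value that lies within a small edit distance of an already annotated value reuses that earlier annotation instead of triggering another agent call. The names `annotate_with_reuse` and `annotate`, the threshold `max_distance=1`, and the strip/lowercase normalization are all hypothetical illustration choices, not the authors' settings; `annotate` stands in for the expensive LLM agent call.

```python
from typing import Callable, Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def annotate_with_reuse(cells: List[str],
                        annotate: Callable[[str], str],
                        max_distance: int = 1) -> Tuple[List[str], int]:
    """Hypothetical reuse policy: call `annotate` only for cell values that are
    not within `max_distance` edits of a value annotated earlier.
    Returns the per-cell labels and the number of `annotate` calls made."""
    cache: Dict[str, str] = {}  # normalized cell value -> annotation
    labels: List[str] = []
    calls = 0
    for cell in cells:
        key = cell.strip().lower()  # assumed normalization step
        label = next((cache[k] for k in cache
                      if levenshtein(key, k) <= max_distance), None)
        if label is None:
            label = annotate(cell)  # stands in for the costly LLM agent call
            cache[key] = label
            calls += 1
        labels.append(label)
    return labels, calls


if __name__ == "__main__":
    # Toy stand-in for the agent: maps a cell to a fake entity ID.
    fake_agent = lambda c: "Q_" + c.strip().lower()
    labels, calls = annotate_with_reuse(["Wuhan", "wuhan ", "Wuhn", "Beijing"],
                                        fake_agent)
    print(labels, calls)
```

Here the misspelled "Wuhn" and the padded "wuhan " reuse the annotation of "Wuhan", so only two agent calls are made for four cells. A fixed absolute threshold is a blunt instrument: on short strings it can merge genuinely different entities, so a real system would likely combine it with column context or a length-relative threshold.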
Problem

Addressing semantic loss in complex table annotation tasks
Overcoming challenges like homonyms and spelling errors in STA
Reducing time and token costs in semantic table processing
Innovation

LLM-based agent for semantic table annotation
Five external tools with tailored prompts
Levenshtein distance reduces redundant annotations
Yilin Geng
College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China, 430070
Shujing Wang
College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China, 430070
Chuan Wang
College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China, 430070
Keqing He
Unknown affiliation
LLM
Yanfei Lv
Military Science Information Research Center, Academy of Military Sciences, Beijing, 100080
Ying Wang
College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China, 430070
Zaiwen Feng
College of Informatics, Huazhong Agricultural University, Wuhan, Hubei, China, 430070; Hubei Key Laboratory of Agricultural Bioinformatics; Engineering Research Center of Agricultural Intelligent Technology, Ministry of Education
Xiaoying Bai
Tsinghua University
Software engineering, software testing, service-oriented computing, cloud computing