An LLM Agent-Based Complex Semantic Table Annotation Approach

📅 2025-08-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Semantic Table Annotation (STA) comprises Column Type Annotation (CTA) and Cell Entity Annotation (CEA). On complex tables, both tasks are hampered by semantic loss in column names or cell values, strict ontological hierarchy requirements, and lexical noise such as homonyms, misspellings, and abbreviations. To address these, the paper proposes an LLM-based agent built on the ReAct framework that uses five external tools with tailored prompts and selects an annotation strategy dynamically according to table characteristics. Levenshtein distance is used to cut redundant annotations. On the SemTab Tough Tables and BiodivTab datasets, the method outperforms existing approaches across various metrics while reducing time cost by 70% and LLM token usage by 60%.

📝 Abstract

The Semantic Table Annotation (STA) task, which includes Column Type Annotation (CTA) and Cell Entity Annotation (CEA), maps table contents to ontology entities and plays an important role in various semantic applications. However, complex tables often pose challenges such as semantic loss of column names or cell values, strict ontological hierarchy requirements, homonyms, spelling errors, and abbreviations, all of which hinder annotation accuracy. To address these issues, this paper proposes an LLM-based agent approach for CTA and CEA. We design and implement five external tools with tailored prompts based on the ReAct framework, enabling the STA agent to dynamically select suitable annotation strategies depending on table characteristics. Experiments are conducted on the Tough Tables and BiodivTab datasets from the SemTab challenge, both of which exhibit these challenges. Our method outperforms existing approaches across various metrics. Furthermore, by leveraging Levenshtein distance to reduce redundant annotations, we achieve a 70% reduction in time costs and a 60% reduction in LLM token usage, providing an efficient and cost-effective solution for STA.
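The abstract does not spell out how Levenshtein distance is applied, so the following is only a minimal sketch of one plausible reading: cell values that are within a small edit distance of an already annotated value reuse that annotation instead of triggering another expensive call. The names `annotate_cells`, `annotate`, and `max_distance` are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def annotate_cells(cells: List[str],
                   annotate: Callable[[str], str],
                   max_distance: int = 1) -> List[str]:
    """Annotate each cell, reusing results for near-duplicate mentions.

    `annotate` stands in for the expensive step (for example an LLM call);
    it is an assumption of this sketch, not the paper's interface.
    """
    cache: Dict[str, str] = {}  # normalized mention -> annotation
    results: List[str] = []
    for cell in cells:
        key = cell.strip().lower()
        # Linear scan over cached mentions; fine for a sketch, slow for large tables.
        match = next((k for k in cache if levenshtein(key, k) <= max_distance), None)
        if match is None:
            cache[key] = annotate(cell)
            match = key
        results.append(cache[match])
    return results
```

A fixed threshold is a blunt instrument: distinct entities such as "Iran" and "Iraq" are also one edit apart, so a real system would need an additional check before merging mentions.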

Problem

Research questions and friction points this paper is trying to address.

Addressing semantic loss in complex table annotation tasks

Overcoming challenges like homonyms and spelling errors in STA

Reducing time and token costs in semantic table processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agent for semantic table annotation

Five external tools with tailored prompts

Levenshtein distance reduces redundant annotations
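To make the agent-plus-tools idea concrete, here is a rough sketch of a ReAct-style loop (thought, action, observation, repeated until an answer is produced). The summary does not name the five tools or show the prompts, so the two tools below (`fix_spelling`, `lookup_entity`), the `ex:` identifiers, and the scripted policy that stands in for the LLM are all hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
# A policy reads the transcript so far and returns (thought, action, action_input).
# In a real agent this would be an LLM call; here it is a plain function.
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(task: str, tools: Dict[str, Tool], policy: Policy,
               max_steps: int = 5) -> Tuple[str, List[str]]:
    """Run thought/action/observation steps until the policy chooses 'finish'."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        thought, action, action_input = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return action_input, transcript
        if action in tools:
            observation = tools[action](action_input)
        else:
            observation = f"unknown tool {action!r}"
        transcript.append(f"Action: {action}[{action_input}]")
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("no answer within max_steps")


# Hypothetical tools backed by tiny lookup tables, for illustration only.
TOOLS: Dict[str, Tool] = {
    "fix_spelling": lambda s: {"Berlim": "Berlin"}.get(s, s),
    "lookup_entity": lambda s: {"Berlin": "ex:Berlin"}.get(s, "NIL"),
}


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the LLM: fix spelling first, then look up, then finish."""
    observations = [line.split(": ", 1)[1] for line in transcript
                    if line.startswith("Observation: ")]
    if not observations:
        mention = transcript[0].split(": ", 1)[1]
        return ("The mention may be misspelled.", "fix_spelling", mention)
    if len(observations) == 1:
        return ("Look up the corrected mention.", "lookup_entity", observations[0])
    return ("An entity was found.", "finish", observations[-1])
```

Calling `react_loop("Berlim", TOOLS, scripted_policy)` walks through two tool calls before finishing. The point of the structure is that the choice of the next tool depends on earlier observations, which is what lets an agent adapt its strategy to the table at hand.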
