TabEmb: Joint Semantic-Structure Embedding for Table Annotation

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing table annotation methods linearize two-dimensional tables into one-dimensional token sequences, which limits semantic quality, degrades structural modeling, and weakens generalization to unseen or rare values. This work proposes TabEmb, which decouples semantic encoding from structural modeling: a large language model generates a semantic embedding for each column, and a graph neural network over the columns explicitly captures inter-column relationships, yielding joint semantic-structural representations. Evaluated on several table annotation tasks, TabEmb consistently outperforms strong existing baselines.

📝 Abstract
Table annotation is crucial for making web and enterprise tables usable in downstream NLP applications. Unlike textual data, where learning semantically rich token or sentence embeddings often suffices, tables are structured combinations of columns in which useful representations must jointly capture each column's semantics and the inter-column relationships. Existing models learn by linearizing the 2D table into a 1D token sequence and encoding it with pretrained language models (PLMs) such as BERT. However, this leads to limited semantic quality and weaker generalization to unseen or rare values compared to modern LLMs, and to degraded structural modeling due to 2D-to-1D flattening and context-length constraints. We propose TabEmb, which directly targets these limitations by decoupling semantic encoding from structural modeling. An LLM first produces semantically rich embeddings for each column, and a graph-based module over columns then injects relationships into the embeddings, yielding joint semantic-structural representations for table annotation. Experiments show that TabEmb consistently outperforms strong baselines on different table annotation tasks. Source code and datasets are available at https://github.com/hoseinzadeehsan/TabEmb
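
The two-stage idea in the abstract (embed each column on its own, then let columns exchange information over a graph) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_column_embedding` is a hypothetical hash-based stand-in for an LLM encoder, the column graph is assumed to be fully connected, and `mix_with_neighbors` is a single untrained mean-aggregation step rather than a learned graph module.

```python
import hashlib
import math

DIM = 8  # toy embedding size; a real LLM embedding would be far larger


def toy_column_embedding(header, values, dim=DIM):
    """Hypothetical stand-in for an LLM column encoder.

    Hashes the column text into a deterministic unit vector.
    It carries no real semantics and only fixes the data shapes.
    """
    text = header + " | " + ", ".join(str(v) for v in values)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [digest[i] / 255.0 - 0.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def mix_with_neighbors(embeddings, alpha=0.5):
    """One mean-aggregation step over a fully connected column graph.

    Each column keeps `alpha` of its own vector and takes (1 - alpha)
    of the mean of all other columns' vectors. Assumed for illustration;
    a trained graph module would use learned weights instead.
    """
    n = len(embeddings)
    if n < 2:
        return [list(e) for e in embeddings]
    dim = len(embeddings[0])
    totals = [sum(e[d] for e in embeddings) for d in range(dim)]
    mixed = []
    for e in embeddings:
        neighbor_mean = [(totals[d] - e[d]) / (n - 1) for d in range(dim)]
        mixed.append(
            [alpha * e[d] + (1 - alpha) * neighbor_mean[d] for d in range(dim)]
        )
    return mixed


# A small example table, stored column-wise (made-up values).
table = {
    "city": ["Paris", "Lima", "Oslo"],
    "country": ["France", "Peru", "Norway"],
    "population": [2100000, 10000000, 700000],
}

# Stage 1: one semantic vector per column, computed independently.
semantic = [toy_column_embedding(h, v) for h, v in table.items()]

# Stage 2: inject inter-column context into each column vector.
joint = mix_with_neighbors(semantic)
```

Because each column is embedded separately, no single input has to hold the whole flattened table, which is the context-length issue the abstract raises. A downstream annotation model would then classify each vector in `joint`, for example to predict a column type.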
Problem

Research questions and friction points this paper is trying to address.

table annotation
semantic embedding
structural modeling
2D-to-1D flattening
context-length constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

table annotation
semantic-structure embedding
large language models
graph-based modeling
column relationships