ZTab: Domain-based Zero-shot Annotation for Table Columns

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches struggle to identify the semantic types of table columns efficiently and without privacy risk, owing to the scarcity of labeled data, the large number of possible semantic types, and reliance on proprietary large language models. To address these challenges, this work proposes ZTab, a domain-aware zero-shot framework for column type recognition. Given a domain configuration, ZTab generates synthetic tables and fine-tunes an open-source large language model on them. It supports three zero-shot settings and infers column types for other tables in the same domain without requiring user-provided annotations. According to the authors, the method remains broadly applicable while improving accuracy under large semantic type vocabularies and reducing dependence on closed-source models and the associated privacy risks.

📝 Abstract
This study addresses the challenge of automatically detecting semantic column types in relational tables, a key task in many real-world applications. Zero-shot modeling eliminates the need for user-provided labeled training data, making it ideal for scenarios where data collection is costly or restricted due to privacy concerns. However, existing zero-shot models suffer from poor performance when the number of semantic column types is large, limited understanding of tabular structure, and privacy risks arising from dependence on high-performance closed-source LLMs. We introduce ZTab, a domain-based zero-shot framework that addresses both performance and zero-shot requirements. Given a domain configuration consisting of a set of predefined semantic types and sample table schemas, ZTab generates pseudo-tables for the sample schemas and fine-tunes an annotation LLM on them. ZTab is domain-based zero-shot in that it does not depend on user-specific labeled training data; therefore, no retraining is needed for a test table from a similar domain. We describe three cases of domain-based zero-shot. The domain configuration of ZTab provides a trade-off between the extent of zero-shot and annotation performance: a "universal domain" that contains all semantic types approaches "pure" zero-shot, while a "specialized domain" that contains semantic types for a specific application enables better zero-shot performance within that domain. Source code and datasets are available at https://github.com/hoseinzadeehsan/ZTab
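
The pipeline described in the abstract (domain configuration, then pseudo-tables, then fine-tuning data) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `DomainConfig` class, the function names, the prompt wording, and the hard-coded sample values are hypothetical and are not taken from the ZTab source code. The abstract does not say how ZTab produces cell values, so random sampling from a small fixed list stands in for that step, and the fine-tuning of the annotation LLM itself is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class DomainConfig:
    # Hypothetical structure: semantic type -> example cell values (stand-in data).
    type_values: dict
    # Sample table schemas, each an ordered list of semantic types.
    schemas: list


def generate_pseudo_table(schema, type_values, n_rows, rng):
    """Return a list of (semantic_type, cell_values) columns for one schema."""
    return [
        (sem_type, [rng.choice(type_values[sem_type]) for _ in range(n_rows)])
        for sem_type in schema
    ]


def to_training_examples(table):
    """Turn each pseudo-column into a prompt/label pair for fine-tuning."""
    examples = []
    for sem_type, values in table:
        prompt = "Column values: " + ", ".join(values) + "\nSemantic type:"
        examples.append({"prompt": prompt, "label": sem_type})
    return examples


def build_training_set(config, n_rows=3, seed=0):
    """Generate one pseudo-table per sample schema and collect its examples."""
    rng = random.Random(seed)
    examples = []
    for schema in config.schemas:
        table = generate_pseudo_table(schema, config.type_values, n_rows, rng)
        examples.extend(to_training_examples(table))
    return examples


# Toy "specialized domain" with three semantic types and two sample schemas.
config = DomainConfig(
    type_values={
        "city": ["Burnaby", "Vancouver", "Toronto"],
        "country": ["Canada", "Japan", "Brazil"],
        "year": ["1999", "2010", "2024"],
    },
    schemas=[["city", "country"], ["country", "year"]],
)
training_set = build_training_set(config)
```

In this sketch no user-labeled table is involved: the labels come entirely from the domain configuration, which is the sense in which the abstract calls the approach domain-based zero-shot. Enlarging `type_values` toward all semantic types would correspond to the "universal domain" end of the trade-off, while a small application-specific set corresponds to a "specialized domain".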
Problem

Research questions and friction points this paper is trying to address.

zero-shot
semantic column type
tabular data
privacy
domain-based
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot learning
table annotation
domain-based modeling
semantic column typing
pseudo-table generation
Ehsan Hoseinzade
School of Computing Science, Simon Fraser University, Burnaby, Canada
Ke Wang
Professor of Computing Science, Simon Fraser University
data mining, database