An Efficient Proximity Graph-based Approach to Table Union Search

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-vector models make table union search expensive because every unionability score requires a maximum bipartite matching. To reduce that cost, this paper proposes a proximity-graph-based multi-stage retrieval framework. The method replaces exhaustive bipartite matching with a lightweight filtering strategy based on many-to-one matching, and pairs it with a novel refinement mechanism and an enhanced pruning scheme that together shrink the candidate set. By combining multi-vector embeddings, proximity graph indexing, and hierarchical filtering, the approach achieves a 3.6–6.0× speedup on six benchmark datasets while keeping recall comparable to the best baseline, improving both the efficiency and the scalability of semantics-driven table discovery.

📝 Abstract
Neural embedding models are extensively employed in the table union search problem, which aims to find semantically compatible tables that can be merged with a given query table. In particular, multi-vector models, which represent a table as a vector set (typically one vector per column), have been demonstrated to achieve superior retrieval quality by capturing fine-grained semantic alignments. However, multi-vector table union search faces more severe efficiency challenges than its single-vector counterpart because computing unionability scores inherently depends on bipartite graph maximum matching. Therefore, this paper proposes an efficient Proximity Graph-based Table Union Search (PGTUS) approach. PGTUS employs a multi-stage pipeline that combines a novel refinement strategy with a filtering strategy based on many-to-one bipartite matching. In addition, we propose an enhanced pruning strategy that prunes the candidate set and further improves search efficiency. Extensive experiments on six benchmark datasets demonstrate that our approach achieves a 3.6-6.0× speedup over existing approaches while maintaining comparable recall rates.
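To make the cost gap concrete, here is a minimal sketch of the two scoring modes the abstract contrasts. It is not the paper's implementation. It assumes column embeddings are plain lists of floats, cosine similarity clamped at zero as the column-level score, and a brute-force search over permutations standing in for a real maximum-weight matching solver such as the Hungarian algorithm. All function names are hypothetical.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(query_cols, cand_cols):
    """Rows are query columns, columns are candidate columns.
    Assumption: negative similarities are clamped to 0 so all weights are nonnegative."""
    return [[max(0.0, cosine(q, c)) for c in cand_cols] for q in query_cols]


def one_to_one_score(sim):
    """Exact maximum-weight one-to-one matching, by brute force.
    Factorial cost: only meant to illustrate why this step is expensive."""
    rows, cols = len(sim), len(sim[0])
    if rows > cols:
        # Transpose so the smaller side is the one being assigned.
        sim = [list(col) for col in zip(*sim)]
        rows, cols = cols, rows
    return max(
        sum(sim[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )


def many_to_one_score(sim):
    """Relaxed score: each query column takes its best candidate column,
    and several query columns may share the same candidate column.
    With nonnegative weights this never falls below the one-to-one score."""
    return sum(max(row) for row in sim)


# Two identical query columns compete for one matching candidate column.
query = [[1.0, 0.0], [1.0, 0.0]]
candidate = [[1.0, 0.0], [0.0, 1.0]]
sim = similarity_matrix(query, candidate)
exact = one_to_one_score(sim)
relaxed = many_to_one_score(sim)
```

Because the relaxed score is an upper bound on the exact one under these assumptions, a candidate table whose relaxed score is already too low can be discarded without running the exact matching at all.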
Problem

Research questions and friction points this paper is trying to address.

Improving efficiency of multi-vector table union search
Reducing computational cost of semantic table matching
Accelerating proximity graph-based union search while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proximity graph-based table union search approach
Multi-stage pipeline with novel refinement strategy
Enhanced pruning strategy for candidate set
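How a cheap bound and an expensive exact score can cooperate in a multi-stage pipeline can be sketched as a generic filter-and-refine top-k loop. This is an illustration under stated assumptions, not PGTUS itself: the proximity-graph stage that produces candidates is omitted, each candidate table arrives with a precomputed nonnegative similarity matrix against the query, the many-to-one score serves as the upper bound, and the exact score is computed by brute force. The names `upper_bound`, `exact_score`, and `top_k_union_search` are hypothetical.

```python
import heapq
from itertools import permutations


def upper_bound(sim):
    """Many-to-one relaxation: sum of the best entry in each row."""
    return sum(max(row) for row in sim)


def exact_score(sim):
    """Maximum-weight one-to-one matching by brute force (illustration only)."""
    rows, cols = len(sim), len(sim[0])
    if rows > cols:
        sim = [list(col) for col in zip(*sim)]
        rows, cols = cols, rows
    return max(
        sum(sim[i][j] for i, j in enumerate(p))
        for p in permutations(range(cols), rows)
    )


def top_k_union_search(candidates, k):
    """candidates maps a table id to its similarity matrix against the query.
    Returns (top-k list of (score, table_id), best first; number of exact evaluations)."""
    # Filtering stage: order candidates by the cheap bound, most promising first.
    bounded = sorted(
        ((upper_bound(sim), tid) for tid, sim in candidates.items()),
        reverse=True,
    )
    heap = []  # min-heap holding the k best exact scores seen so far
    exact_calls = 0
    for bound, tid in bounded:
        # Pruning: once the bound cannot beat the current k-th best score,
        # no later candidate can either, because bounds are sorted descending.
        if len(heap) == k and bound <= heap[0][0]:
            break
        # Refinement stage: pay for the exact matching only when needed.
        score = exact_score(candidates[tid])
        exact_calls += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, tid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, tid))
    return sorted(heap, reverse=True), exact_calls


candidates = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [1.0, 0.0]],
    "c": [[0.2, 0.1], [0.1, 0.2]],
}
results, exact_calls = top_k_union_search(candidates, k=1)
```

In this toy run table "c" is never scored exactly, since its bound is below the best exact score already found. The saving grows with the number of candidates whose bound falls under the running k-th best score.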
Yiming Xie
PhD Student, Northeastern University
Computer Vision, 3D Vision
Hua Dai
Martin V Smith School of Business and Economics, California State University Channel Islands
Mingfeng Jiang
Nanjing University of Posts and Telecommunications, Nanjing, China
Pengyue Li
Wuhan University
dataset search
Zhengkai Zhang
Nanjing University of Posts and Telecommunications, Nanjing, China
Bohan Li
Nanjing University of Aeronautics and Astronautics, Nanjing, China