An Efficient Proximity Graph-based Approach to Table Union Search

📅 2025-11-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Table union search with multi-vector models carries high computational overhead because unionability scores rely on maximum bipartite matching. To address this, the paper proposes a proximity-graph-based multi-stage retrieval framework. The method avoids running exact bipartite matching on every candidate by first applying a lightweight filter based on many-to-one matching, and it combines a novel refinement mechanism with an enhanced pruning scheme to reduce the candidate set size. By combining multi-vector embeddings, proximity graph indexing, and multi-stage filtering, the approach achieves a 3.6–6.0× speedup across six benchmark datasets while keeping recall comparable to existing approaches. This improves both the efficiency and the scalability of semantics-driven table discovery.

📝 Abstract

Neural embedding models are widely used for the table union search problem, which aims to find semantically compatible tables that can be merged with a given query table. In particular, multi-vector models, which represent a table as a set of vectors (typically one vector per column), have been shown to achieve superior retrieval quality by capturing fine-grained semantic alignments. However, multi-vector search faces more severe efficiency challenges than single-vector search because computing unionability scores inherently depends on maximum bipartite matching. This paper therefore proposes an efficient Proximity Graph-based Table Union Search (PGTUS) approach. PGTUS employs a multi-stage pipeline that combines a novel refinement strategy with a filtering strategy based on many-to-one bipartite matching. In addition, we propose an enhanced pruning strategy that shrinks the candidate set and further improves search efficiency. Extensive experiments on six benchmark datasets demonstrate that our approach achieves a 3.6–6.0× speedup over existing approaches while maintaining comparable recall.
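
To make the cost argument concrete, the following is a minimal sketch of why a many-to-one relaxation can serve as a cheap filter ahead of exact matching. It is not the PGTUS implementation. The function names, the clamped cosine similarity, the threshold, and the brute-force matcher are all assumptions chosen for illustration. The only property it relies on is that, with non-negative similarities, letting several query columns share the same candidate column can never score lower than the best one-to-one matching, so a candidate whose relaxed score falls below the threshold can be skipped safely.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(query_cols, cand_cols):
    """Pairwise column similarities, clamped at 0 so the bound below holds."""
    return [[max(0.0, cosine(q, c)) for c in cand_cols] for q in query_cols]


def matching_score(sim):
    """Exact maximum-weight one-to-one matching by brute force.

    Illustrative only: enumeration is exponential in the number of columns.
    A real system would use a polynomial-time assignment solver instead.
    """
    rows, cols = len(sim), len(sim[0])
    if rows > cols:  # iterate over the smaller side
        sim = [list(col) for col in zip(*sim)]
        rows, cols = cols, rows
    return max(
        sum(sim[i][p[i]] for i in range(rows))
        for p in permutations(range(cols), rows)
    )


def many_to_one_bound(sim):
    """Relaxed score: every query column independently takes its best match."""
    return sum(max(row) for row in sim)


def filter_then_verify(query_cols, candidates, threshold):
    """Skip candidates whose relaxed score is too low, verify the rest exactly."""
    results = []
    for name, cand_cols in candidates.items():
        sim = similarity_matrix(query_cols, cand_cols)
        if many_to_one_bound(sim) < threshold:
            continue  # the exact score cannot reach the threshold either
        score = matching_score(sim)
        if score >= threshold:
            results.append((name, score))
    return sorted(results, key=lambda item: -item[1])


# Toy data: each table is a list of column vectors (hypothetical embeddings).
query = [[1.0, 0.0], [0.0, 1.0]]
candidates = {
    "a": [[1.0, 0.0], [0.0, 1.0]],  # both columns align one-to-one
    "b": [[1.0, 0.0], [1.0, 0.0]],  # rejected by the relaxed bound alone
    "c": [[1.0, 1.0]],              # passes the filter, fails exact matching
}
results = filter_then_verify(query, candidates, threshold=1.2)
print(results)
```

In this toy run, table "b" never reaches the exact matcher, while table "c" shows that the relaxed score is only an upper bound: it passes the filter but is rejected once the one-to-one constraint is enforced.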

Problem

Research questions and friction points this paper is trying to address.

Improving efficiency of multi-vector table union search

Reducing computational cost of semantic table matching

Accelerating proximity graph-based union search while maintaining accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proximity graph-based table union search approach

Multi-stage pipeline with novel refinement strategy

Enhanced pruning strategy for candidate set

🔎 Similar Papers

TabSketchFM: Sketch-Based Tabular Representation Learning for Data Discovery Over Data Lakes