HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery

📅 2026-01-03
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work addresses two limitations of existing joinable table discovery methods: they struggle to model both intra-table structure and complex inter-table relationships, and their online ranking strategies ignore interactions among candidate columns, which often yields incoherent result sets. To overcome these challenges, the paper introduces a hypergraph-based formulation that captures table structure through intra-table hyperedges and enriches cross-table connections with large language model (LLM)-augmented hyperedges, reframing the task as hypergraph link prediction. A Hierarchical Interaction Network (HIN) learns column representations, complemented by a coherence-aware top-k column selection strategy and a maximum spanning tree-based reranking mechanism. Experiments show that the approach outperforms the best baseline by an average of 21.4% in Precision@15 and 17.2% in Recall@15.
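The reported metrics can be illustrated with a minimal sketch of Precision@k and Recall@k for a single query. The function name, the column identifiers, and the choice to divide precision by k are assumptions made for illustration; the paper's exact evaluation protocol is not given here.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Return (Precision@k, Recall@k) for one query column.

    ranked   -- candidate column ids, best first
    relevant -- set of ground-truth joinable column ids
    Assumption: precision is divided by k, not by the number returned.
    """
    top_k = ranked[:k]
    hits = sum(1 for col in top_k if col in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy ranking and ground truth.
ranked = ["orders.cust_id", "users.id", "logs.ts", "accounts.uid"]
relevant = {"orders.cust_id", "accounts.uid", "billing.customer"}

p, r = precision_recall_at_k(ranked, relevant, k=3)
print(f"Precision@3={p:.3f} Recall@3={r:.3f}")
```

Averaging these two values over all query columns with k=15 would give figures comparable in form to the Precision@15 and Recall@15 numbers quoted above.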

📝 Abstract
As a pivotal task in data lake management, joinable table discovery has attracted widespread interest. While existing language model-based methods achieve remarkable performance by combining offline column representation learning with online ranking, their design insufficiently accounts for the underlying structural interactions: (1) offline, they directly model tables into isolated or pairwise columns, thereby struggling to capture the rich inter-table and intra-table structural information; and (2) online, they rank candidate columns based solely on query-candidate similarity, ignoring the mutual interactions among the candidates, leading to incoherent result sets. To address these limitations, we propose HyperJoin, a large language model (LLM)-augmented Hypergraph framework for Joinable table discovery. Specifically, we first construct a hypergraph to model tables using both the intra-table hyperedges and the LLM-augmented inter-table hyperedges. Consequently, the task of joinable table discovery is formulated as link prediction on this constructed hypergraph. We then design HIN, a Hierarchical Interaction Network that learns expressive column representations through bidirectional message passing over columns and hyperedges. To strengthen coherence and internal consistency in the result columns, we cast online ranking as a coherence-aware top-k column selection problem. We then introduce a reranking module that leverages a maximum spanning tree algorithm to prune noisy connections and maximize coherence. Experiments demonstrate the superiority of HyperJoin, achieving average improvements of 21.4% (Precision@15) and 17.2% (Recall@15) over the best baseline.
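To make the hypergraph formulation concrete, the following is a minimal sketch rather than the authors' HIN. It builds intra-table hyperedges from a toy data lake, adds one inter-table hyperedge, runs a single column-to-hyperedge-to-column round of mean aggregation, and scores candidate links with a dot product. The table names, the features, the hard-coded inter-table group (standing in for the LLM-augmented step), and the mean and dot-product choices are all assumptions.

```python
# Hypothetical toy data lake: table name -> column ids.
tables = {
    "users": ["users.id", "users.name"],
    "orders": ["orders.id", "orders.user_id"],
}

# Intra-table hyperedges: one per table, joining all of its columns.
hyperedges = {f"table:{name}": set(cols) for name, cols in tables.items()}

# Inter-table hyperedge. In the paper these are LLM-augmented; this
# hard-coded group is only a stand-in for that step.
hyperedges["inter:user_key"] = {"users.id", "orders.user_id"}


def mean(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def message_pass(col_feats, hyperedges):
    """One column -> hyperedge -> column round of mean aggregation."""
    edge_feats = {
        e: mean([col_feats[c] for c in sorted(members)])
        for e, members in hyperedges.items()
    }
    updated = {}
    for c, feat in col_feats.items():
        incident = [edge_feats[e] for e, m in hyperedges.items() if c in m]
        updated[c] = mean(incident) if incident else feat
    return updated


def link_score(u, v):
    """Dot-product score for a candidate link between two columns."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 2-d column features (a real system would learn these).
col_feats = {
    "users.id": [1.0, 0.0],
    "users.name": [0.0, 1.0],
    "orders.id": [0.0, 0.0],
    "orders.user_id": [1.0, 1.0],
}

updated = message_pass(col_feats, hyperedges)
join_score = link_score(updated["users.id"], updated["orders.user_id"])
other_score = link_score(updated["users.id"], updated["orders.id"])
print(join_score, other_score)
```

In this toy setup, the two columns that share the inter-table hyperedge end up with identical representations and a higher link score than the unrelated pair, which is the intuition behind treating joinability as link prediction on the hypergraph.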
Problem

Research questions and friction points this paper is trying to address.

joinable table discovery
hypergraph
link prediction
structural interaction
coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypergraph
Large Language Model (LLM)
Link Prediction
Hierarchical Interaction Network
Coherence-aware Ranking
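The coherence-aware ranking idea can be sketched as follows. This is an illustrative approximation, not the paper's reranking module: it builds a maximum spanning tree over candidate columns with Kruskal's algorithm (which drops weak pairwise edges), then greedily grows a top-k set along tree edges starting from the best query match. The scoring rule (query similarity plus tree-edge weight), the function names, and all similarity values are assumptions.

```python
def maximum_spanning_tree(nodes, pair_sim):
    """Kruskal's algorithm over edges sorted by descending weight.

    pair_sim maps (u, v) tuples to a similarity; returns kept edges.
    """
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for (u, v), w in sorted(pair_sim.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def rerank(query_sim, pair_sim, k):
    """Greedily pick k candidates that stay connected in the tree."""
    nodes = list(query_sim)
    neighbors = {n: {} for n in nodes}
    for u, v, w in maximum_spanning_tree(nodes, pair_sim):
        neighbors[u][v] = w
        neighbors[v][u] = w

    selected = [max(nodes, key=lambda n: query_sim[n])]
    while len(selected) < min(k, len(nodes)):
        frontier = {}
        for s in selected:
            for n, w in neighbors[s].items():
                if n not in selected:
                    score = query_sim[n] + w  # assumed scoring rule
                    frontier[n] = max(frontier.get(n, float("-inf")), score)
        if not frontier:
            break
        selected.append(max(frontier, key=frontier.get))
    return selected


# Hypothetical similarities between a query column and four candidates.
query_sim = {"crm.customer_id": 0.9, "sales.cust_id": 0.8,
             "web.session_id": 0.85, "billing.customer": 0.7}
pair_sim = {
    ("crm.customer_id", "sales.cust_id"): 0.9,
    ("crm.customer_id", "billing.customer"): 0.7,
    ("sales.cust_id", "billing.customer"): 0.6,
    ("web.session_id", "billing.customer"): 0.1,
    ("crm.customer_id", "web.session_id"): 0.05,
    ("sales.cust_id", "web.session_id"): 0.05,
}

result = rerank(query_sim, pair_sim, k=3)
print(result)
```

Ranking by query similarity alone would place `web.session_id` in the top 3, whereas the tree-guided selection prefers `billing.customer` because it is strongly connected to the columns already chosen.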