Unlocking the Power of Large Language Models for Multi-table Entity Matching

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three challenges in multi-table entity matching: semantic inconsistency caused by numerical attribute discrepancies, low matching efficiency, and interference from noisy entities. To tackle these issues, the paper proposes LLM4MEM, a framework that uses large language models (LLMs) to align unlabeled entities across multiple sources, which the summary presents as a first in the field. The approach uses multi-style prompts to harmonize attribute semantics, a transitive consensus embedding matching module to accelerate embedding and pre-matching, and a density-aware pruning mechanism to remove noisy entities from the alignment results. Evaluated on six multi-table entity matching datasets, LLM4MEM improves F1 by an average of 5.1% over the baselines it is compared against.

📝 Abstract

Multi-table entity matching (MEM) addresses the limitations of dual-table approaches by identifying equivalent entities across multiple data sources simultaneously, without relying on unique identifiers. However, existing methods based on pre-trained language models struggle with semantic inconsistencies caused by variations in numerical attributes. Inspired by the language understanding capabilities of large language models (LLMs), we propose LLM4MEM, a novel LLM-based framework for multi-table entity matching. First, we propose a multi-style prompt-enhanced LLM attribute coordination module to address semantic inconsistencies. Second, because multiple data sources sharply increase the number of entities and thus reduce matching efficiency, we develop a transitive consensus embedding matching module that handles entity embedding and pre-matching. Finally, to handle noisy entities in the matching process, we introduce a density-aware pruning module that improves the quality of the matching results. Extensive experiments on six MEM datasets show that our model improves F1 by an average of 5.1% over the baseline model. Our code is available at https://github.com/Ymeki/LLM4MEM.
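
To make the multi-table setting concrete, the sketch below shows one generic way pairwise matches between tables can be merged into multi-table entity groups by transitivity, using a union-find structure. It is an illustration of the problem setting only and is not the LLM4MEM implementation: the function name `group_matches`, the `(table, row_id)` record format, and the sample pairs are all assumptions made for this example.

```python
from collections import defaultdict


def group_matches(pairs):
    """Merge pairwise matches into entity groups via union-find.

    Each record is a hypothetical (table_name, row_id) tuple.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups

    groups = defaultdict(set)
    for record in list(parent):
        groups[find(record)].add(record)
    # Sort for a deterministic result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up pairwise matches across three tables A, B, and C.
pairs = [
    (("A", 1), ("B", 7)),
    (("B", 7), ("C", 3)),
    (("A", 2), ("C", 9)),
]
groups = group_matches(pairs)
print(groups)
```

Here the match between `("A", 1)` and `("C", 3)` is never stated directly; it follows from the two pairs that share `("B", 7)`. The same property means a single wrong pairwise match can merge unrelated groups, which is one plausible reason a pruning step such as the one the abstract describes is useful.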

Problem

Research questions and friction points this paper is trying to address.

multi-table entity matching

semantic inconsistency

numerical attribute variation

large language models

entity resolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Multi-table Entity Matching

Prompt Engineering