FastER: Fast On-Demand Entity Resolution in Property Graphs

📅 2025-04-02
🤖 AI Summary
Entity resolution (ER) on large-scale property graphs suffers from poor real-time performance and high computational overhead, especially in “on-demand ER,” where users require resolution only over local subgraphs. This paper proposes what it describes as the first on-demand ER framework for property graphs. First, it introduces Graph Differential Dependencies (GDDs), a unified model that jointly captures structural and attribute semantics. Second, it constructs a blocking graph and incorporates Progressive Profile Scheduling (PPS) to deliver results as a stream. Third, it integrates subgraph filtering, attributed graph embedding, and similarity-based pruning for efficient pairwise matching. Evaluated on multiple benchmark datasets, the method reduces response latency by one to two orders of magnitude over state-of-the-art approaches, delivers the first result within milliseconds, and maintains high accuracy.

📝 Abstract
Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods use batch processing, which becomes impractical as data volumes grow because of its high computational cost and lack of real-time capability. In many applications, users need to resolve entities for only a small portion of their data, which makes processing the full dataset unnecessary; this scenario is known as "ER-on-demand". This paper proposes FastER, an efficient ER-on-demand framework for property graphs. Our approach uses graph differential dependencies (GDDs) as a knowledge encoding language to design effective filtering mechanisms that leverage both the structural and the attribute semantics of graphs. We construct a blocking graph from the filtered subgraphs to reduce the number of candidate entity pairs that require comparison. Additionally, FastER incorporates Progressive Profile Scheduling (PPS), which allows the system to produce results incrementally throughout the resolution process. Extensive evaluations on multiple benchmark datasets demonstrate that FastER significantly outperforms state-of-the-art ER methods in computational efficiency and real-time processing for on-demand tasks while ensuring reliability. We make FastER publicly available at: https://anonymous.4open.science/r/On_Demand_Entity_Resolution-9DFB
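As a rough illustration of how blocking shrinks the comparison space, the sketch below uses plain token blocking: two records become a candidate pair only if they share at least one token in the chosen attributes. This is not FastER's GDD-based filtering or its actual blocking-graph construction; the function name `build_blocking_graph`, the sample records, and the shared-token edge weight are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_blocking_graph(records, key_attrs):
    """Return {(id_a, id_b): shared_token_count} for records that share a token.

    Hypothetical helper: records maps an id to a dict of attribute values.
    """
    # Inverted index: token -> ids of the records that contain it.
    blocks = defaultdict(set)
    for rid, attrs in records.items():
        for attr in key_attrs:
            for token in str(attrs.get(attr, "")).lower().split():
                blocks[token].add(rid)

    # Every pair inside a block becomes an edge; the weight counts shared tokens.
    edges = defaultdict(int)
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Illustrative records (made up for this sketch).
records = {
    1: {"name": "John Smith", "city": "Wuhan"},
    2: {"name": "J. Smith", "city": "Wuhan"},
    3: {"name": "Alice Wong", "city": "Accra"},
    4: {"name": "Alice Wang", "city": "Accra"},
}
edges = build_blocking_graph(records, ["name", "city"])
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(edges)} candidate pairs instead of {all_pairs}")
```

In this toy input only the pairs (1, 2) and (3, 4) survive, so four of the six possible comparisons are never performed.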
Problem

Research questions and friction points this paper is trying to address.

Identifies and links database records that refer to the same real-world entity
Addresses the inefficiency of batch processing on large datasets
Enables real-time entity resolution for selected portions of the data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses graph differential dependencies for filtering
Constructs blocking graph to reduce comparisons
Incorporates Progressive Profile Scheduling
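The idea of producing results incrementally can be sketched with a generic priority queue: candidate pairs are compared in descending order of a priority weight, and each match is yielded as soon as it is confirmed rather than after the whole run. This is a minimal sketch of progressive emission in general, not the paper's PPS algorithm; the Jaccard similarity, the threshold of 0.5, and the names `progressive_matches`, `profiles`, and `candidates` are assumptions.

```python
import heapq


def jaccard(a, b):
    """Token-level Jaccard similarity between two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def progressive_matches(candidates, profiles, threshold=0.5):
    """Yield (id_a, id_b, score) for matching pairs, highest-priority pairs first.

    candidates maps (id_a, id_b) to a priority weight (hypothetical input).
    """
    # heapq is a min-heap, so weights are negated to pop the largest first.
    heap = [(-weight, pair) for pair, weight in candidates.items()]
    heapq.heapify(heap)
    while heap:
        _, (a, b) = heapq.heappop(heap)
        score = jaccard(profiles[a], profiles[b])
        if score >= threshold:
            yield a, b, score  # emitted immediately, before the queue is drained


# Illustrative data (made up for this sketch).
profiles = {
    1: "john smith wuhan",
    2: "j smith wuhan",
    3: "alice wong accra",
    4: "bob wong accra lab",
}
candidates = {(1, 2): 2, (3, 4): 2, (2, 3): 1}

matches = list(progressive_matches(candidates, profiles))
for a, b, score in matches:
    print(f"match: {a} ~ {b} (score {score:.2f})")
```

Because `progressive_matches` is a generator, a caller can stop after the first confirmed match with `next(...)` instead of waiting for every candidate pair to be compared.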
Authors

Shujing Wang
Huazhong Agricultural University, Wuhan, China
Selasi Kwashie
AI & Cyber Futures Institute, Charles Sturt University
Database Theory, Data Mining, Cyber Security
Michael Bewong
Senior Lecturer, Charles Sturt University
Data Science, Applied Machine Learning, Cyber Security
Junwei Hu
Undergraduate Student of Software Engineering, Tongji University
Software Engineering, AI4SE, SE4AI, NLP
V. Nofong
Department of Computer Science and Engineering, University of Mines and Technology, Ghana
Zaiwen Feng
Huazhong Agricultural University, Wuhan, China