🤖 AI Summary
Entity resolution (ER) on large-scale attributed graphs suffers from poor real-time performance and high computational overhead, especially in “on-demand ER,” where users require resolution only over local subgraphs. This paper proposes the first on-demand ER framework for attributed graphs. First, it introduces Graph Differential Dependencies (GDDs), a unified model jointly capturing structural and attribute semantics. Second, it constructs a blocking graph and incorporates Progressive Profile Scheduling (PPS) to enable streaming result delivery. Third, it integrates subgraph filtering, attributed graph embedding, and similarity-based pruning to achieve efficient pairwise matching. Evaluated on multiple benchmark datasets, the method reduces response latency by one to two orders of magnitude over state-of-the-art approaches, delivers the first result in milliseconds, and maintains high accuracy.
📝 Abstract
Entity resolution (ER) is the problem of identifying and linking database records that refer to the same real-world entity. Traditional ER methods use batch processing, which becomes impractical with growing data volumes due to high computational costs and lack of real-time capabilities. In many applications, users need to resolve entities for only a small portion of their data, making full data processing unnecessary, a scenario known as "ER-on-demand". This paper proposes FastER, an efficient ER-on-demand framework for property graphs. Our approach uses graph differential dependencies (GDDs) as a knowledge encoding language to design effective filtering mechanisms that leverage both structural and attribute semantics of graphs. We construct a blocking graph from filtered subgraphs to reduce the number of candidate entity pairs requiring comparison. Additionally, FastER incorporates Progressive Profile Scheduling (PPS), allowing the system to incrementally produce results throughout the resolution process. Extensive evaluations on multiple benchmark datasets demonstrate that FastER significantly outperforms state-of-the-art ER methods in computational efficiency and real-time processing for on-demand tasks while ensuring reliability. We make FastER publicly available at: https://anonymous.4open.science/r/On_Demand_Entity_Resolution-9DFB
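To make the blocking-graph and progressive-delivery ideas concrete, the following is a minimal sketch and not FastER's actual implementation. It assumes plain attribute records instead of a property graph, substitutes simple token blocking for GDD-based filtering, and approximates progressive scheduling with a priority queue ordered by the number of shared blocks. All names (`tokens`, `build_blocking_graph`, `resolve_progressively`, the sample `records`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import heapq


def tokens(record):
    """Lowercased word tokens drawn from all attribute values of a record."""
    return {tok for value in record.values() for tok in str(value).lower().split()}


def build_blocking_graph(records, targets):
    """Return {(a, b): weight} for record pairs that share at least one token.

    Only pairs involving a record in `targets` are kept, which mimics the
    on-demand restriction to a user-selected subset (an assumption here;
    the paper filters with GDDs instead).
    """
    blocks = defaultdict(set)
    for rid, rec in records.items():
        for tok in tokens(rec):
            blocks[tok].add(rid)

    edges = defaultdict(int)
    for members in blocks.values():
        for a, b in combinations(sorted(members), 2):
            if a in targets or b in targets:
                edges[(a, b)] += 1  # weight = number of shared blocks
    return edges


def jaccard(x, y):
    """Jaccard similarity of two token sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def resolve_progressively(records, targets, threshold=0.5):
    """Yield (a, b, score) matches, most promising candidate pairs first."""
    edges = build_blocking_graph(records, targets)
    heap = [(-weight, pair) for pair, weight in edges.items()]
    heapq.heapify(heap)
    while heap:
        _, (a, b) = heapq.heappop(heap)
        score = jaccard(tokens(records[a]), tokens(records[b]))
        if score >= threshold:
            yield a, b, score  # emitted as soon as it is confirmed


# Hypothetical sample data for illustration only.
records = {
    "p1": {"name": "Alice Smith", "city": "Sydney"},
    "p2": {"name": "alice smith", "city": "Sydney"},
    "p3": {"name": "Bob Jones", "city": "Perth"},
    "p4": {"name": "Alice Jones", "city": "Perth"},
}

for match in resolve_progressively(records, targets={"p1"}):
    print(match)
```

Because `resolve_progressively` is a generator, a caller can consume the first confirmed match without waiting for the remaining candidate pairs to be compared, which is the property that progressive scheduling is meant to provide.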