🤖 AI Summary
Existing GNN-based GED approximation methods suffer from two key limitations: (i) difficulty in modeling global structural correspondences, and (ii) spurious signals introduced by node-level matching, which lead to inaccurate edit-cost estimation. This paper proposes GCGSim, a decoupled graph similarity learning framework and the first to jointly model graph-level alignment and substructure-level edit-cost estimation. It explicitly distinguishes aligned from unaligned substructures, thereby avoiding structural mismatches and cost confounding, and an end-to-end GNN architecture enables globally alignment-aware similarity estimation. The method achieves state-of-the-art performance on four benchmark datasets. Ablation studies and visualization analyses confirm that the model learns semantically coherent, well-decoupled substructure representations, improving both GED approximation accuracy and interpretability.
📝 Abstract
Graph Similarity Computation (GSC) is a fundamental graph-related task for which Graph Edit Distance (GED) serves as a prevalent metric. GED is determined by an optimal alignment between a pair of graphs that partitions each into aligned (zero-cost) and unaligned (cost-incurring) substructures. Due to the NP-hard nature of exact GED computation, GED approximations based on Graph Neural Networks (GNNs) have emerged. Existing GNN-based GED approaches typically learn node embeddings for each graph and then aggregate pairwise node similarities to estimate the final similarity. Despite their effectiveness, we identify a mismatch between this prevalent node-centric matching paradigm and the core principles of GED. This discrepancy leads to two critical limitations: (1) a failure to capture the global structural correspondence required for optimal alignment, and (2) a misattribution of edit costs driven by spurious node-level signals. To address these limitations, we propose GCGSim, a GED-consistent graph similarity learning framework centered on graph-level matching and substructure-level edit costs. Specifically, we make three core technical contributions. Extensive experiments on four benchmark datasets show that GCGSim achieves state-of-the-art performance. Our comprehensive analyses further validate that the framework effectively learns disentangled and semantically meaningful substructure representations.
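To make the abstract's notion concrete, the sketch below brute-forces uniform-cost GED for tiny unlabeled graphs: every node mapping is tried, the aligned part of the graphs contributes zero cost, and unaligned nodes and edges each incur one edit. This is an illustrative toy (the adjacency-dict representation and function names are my own, not from the paper), and its factorial complexity is exactly why GNN-based approximations exist.

```python
from itertools import permutations

def edges(g):
    """Undirected edge set of a simple graph given as {node: set_of_neighbours}."""
    return {frozenset((u, v)) for u in g for v in g[u]}

def ged(g1, g2):
    """Brute-force uniform-cost GED between two small, simple, unlabeled
    undirected graphs (no self-loops). Each node/edge insertion or deletion
    costs 1; aligned substructures cost 0. Exponential -- toy sizes only."""
    n1, n2 = sorted(g1), sorted(g2)
    e1, e2 = edges(g1), edges(g2)
    best = float("inf")
    # Each g1 node maps to a distinct g2 node or to None (i.e. it is deleted).
    slots = list(n2) + [None] * len(n1)
    for perm in set(permutations(slots, len(n1))):
        m = dict(zip(n1, perm))
        used = {v for v in perm if v is not None}
        cost = (len(n1) - len(used)) + (len(n2) - len(used))  # node dels + ins
        surviving = set()
        for e in e1:
            u, v = tuple(e)
            if m[u] is None or m[v] is None or frozenset((m[u], m[v])) not in e2:
                cost += 1                       # edge deleted by the alignment
            else:
                surviving.add(frozenset((m[u], m[v])))
        cost += len(e2 - surviving)             # edges of g2 left uncovered
        best = min(best, cost)
    return best

# A triangle vs. a 3-node path differ by exactly one edge deletion:
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
path3 = {"x": {"y"}, "y": {"x", "z"}, "z": {"y"}}
print(ged(triangle, path3))  # -> 1
```

Under the optimal alignment (a→x, b→y, c→z) the whole path is the aligned, zero-cost part, and only the edge a–c is unaligned, which mirrors the aligned/unaligned decomposition the paper builds its framework around.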