🤖 AI Summary
Graph Edit Distance (GED) approximation faces three core challenges: lack of interpretable edit paths, reliance on scarce ground-truth GED annotations, and poor generalization across datasets. This paper proposes the first algebraic, unsupervised GED approximation framework, which requires no training and no ground-truth labels and therefore generalizes across datasets. The method integrates spectral graph theory and matrix algebra to construct structure-aware embeddings, employs an optimal-transport-inspired distance metric, and explicitly decouples node/edge insertion, deletion, and substitution operations. It outputs both a GED estimate and a human-interpretable edit path. Evaluated on diverse benchmarks spanning chemistry, vision, and social networks, the approach achieves state-of-the-art accuracy while accelerating computation by one to two orders of magnitude. It supports arbitrary edit cost configurations and scales to large graphs.
📝 Abstract
The need to identify graphs with small structural distances from a query arises in various domains such as biology, chemistry, recommender systems, and social network analysis. Among several methods for measuring inter-graph distance, Graph Edit Distance (GED) is preferred for its comprehensibility, though computing it is NP-hard. Unsupervised methods often struggle to provide accurate approximations. State-of-the-art GED approximations are predominantly neural, but these methods have several limitations: they (i) lack an explanatory edit path corresponding to the approximated GED; (ii) require the NP-hard generation of ground-truth GEDs for training; and (iii) necessitate separate training on each dataset. In this paper, we propose EUGENE, an efficient algebraic unsupervised method that approximates GED while providing edit paths corresponding to the approximated cost. Extensive experimental evaluation demonstrates that EUGENE achieves state-of-the-art performance in GED estimation and exhibits superior scalability across diverse datasets and generalized cost settings.
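To make the notions of GED and an edit path concrete, the following is a minimal sketch of exact GED by brute force. It is not EUGENE: it assumes tiny, simple, undirected, unlabeled graphs and unit edit costs, and the function name `brute_force_ged` is hypothetical. It enumerates every node alignment, which is what makes exact computation infeasible beyond a handful of nodes.

```python
from itertools import permutations


def brute_force_ged(n1, edges1, n2, edges2):
    """Exact GED between two small simple undirected unlabeled graphs.

    Illustrative assumptions (not from the paper): nodes are 0..n-1,
    every insertion or deletion costs 1, and there are no self-loops.
    Returns (cost, edit_path); the operations are listed in no
    particular order.
    """
    n = max(n1, n2)  # pad the smaller graph with dummy nodes
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best_cost, best_path = None, None
    # Try every alignment of (padded) graph 1 onto (padded) graph 2.
    for perm in permutations(range(n)):
        path = []
        for u in range(n):
            v = perm[u]
            if u < n1 and v >= n2:    # real node mapped to a dummy
                path.append(("delete_node", u))
            elif u >= n1 and v < n2:  # dummy mapped to a real node
                path.append(("insert_node", v))
        matched = set()
        for e in e1:
            a, b = sorted(e)
            image = frozenset((perm[a], perm[b]))
            if image in e2:
                matched.add(image)    # edge preserved by the alignment
            else:
                path.append(("delete_edge", (a, b)))
        for e in e2 - matched:
            path.append(("insert_edge", tuple(sorted(e))))
        # With unit costs, the cost is the number of operations.
        if best_cost is None or len(path) < best_cost:
            best_cost, best_path = len(path), path
    return best_cost, best_path


# A 3-node path versus a triangle: one edge insertion suffices.
cost, path = brute_force_ged(3, [(0, 1), (1, 2)], 3, [(0, 1), (1, 2), (0, 2)])
print(cost, path)  # 1 [('insert_edge', (0, 2))]
```

The loop visits n! alignments, so this sketch is only usable for graphs with a few nodes. That cost is the motivation for approximations that still return an edit path alongside the distance estimate.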