🤖 AI Summary
This paper addresses the optimal color enumeration problem for colored graphs: choosing an order of the colors that minimizes the number of connected components (clusters) arising while the graph is colored step by step in that order. Building on the notion of locality defined for patterns in Angluin-style pattern matching, we generalize locality to graph-structured data and introduce *k*-locality, a measure of how compactly the colors are distributed over the graph. To compute it, we propose a priority search algorithm that is optimal in the number of marking prefix expansions and is orders of magnitude faster than exhaustive search; the overall approach is inspired by hierarchical clustering. First theoretical results cover specific graph classes. A case study on a DBLP subgraph demonstrates the potential of *k*-locality for knowledge discovery tasks such as identifying topic evolution. Our core contributions are: (i) a formal model of locality for colored graphs, (ii) the *k*-locality measure, and (iii) an efficient priority search algorithm for computing it.
📝 Abstract
In 2017, Day et al. introduced the notion of locality as a structural complexity measure for patterns in the field of pattern matching established by Angluin in 1980. In 2019, Casel et al. showed that determining the locality of an arbitrary pattern is NP-complete. Inspired by hierarchical clustering, we extend the notion to coloured graphs: given a coloured graph, determine an enumeration of the colours such that colouring the graph stepwise according to the enumeration leads to as few clusters as possible. In addition to first theoretical results on graph classes, we propose a priority search algorithm to compute the $k$-locality of a graph. The algorithm is optimal in the number of marking prefix expansions and is faster by orders of magnitude than an exhaustive search. Finally, we perform a case study on a DBLP subgraph to demonstrate the potential of $k$-locality for knowledge discovery.
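To make the problem statement concrete, the following is a minimal Python sketch of the exhaustive-search baseline, not the priority search algorithm proposed in the paper. It assumes that the quality of an enumeration is the maximum number of clusters (connected components of the subgraph induced by the vertices coloured so far) observed over all steps, mirroring the pattern-locality definition of Day et al.; the abstract itself only says "as few clusters as possible". The function names and the adjacency-dictionary representation are hypothetical choices made for illustration.

```python
from itertools import permutations


def count_clusters(adj, coloured):
    """Count connected components of the subgraph induced by `coloured`."""
    seen = set()
    clusters = 0
    for start in coloured:
        if start in seen:
            continue
        clusters += 1
        seen.add(start)
        stack = [start]
        while stack:  # depth-first search restricted to coloured vertices
            v = stack.pop()
            for w in adj[v]:
                if w in coloured and w not in seen:
                    seen.add(w)
                    stack.append(w)
    return clusters


def locality_of_order(adj, colour, order):
    """Assumed objective: the largest cluster count seen while colouring
    the graph colour by colour in the given order."""
    coloured = set()
    worst = 0
    for c in order:
        coloured |= {v for v in adj if colour[v] == c}
        worst = max(worst, count_clusters(adj, coloured))
    return worst


def brute_force_locality(adj, colour):
    """Try every enumeration of the colours (factorial time) and return
    the best value together with one enumeration that achieves it."""
    colours = sorted(set(colour.values()))
    best = min(permutations(colours),
               key=lambda order: locality_of_order(adj, colour, order))
    return locality_of_order(adj, colour, best), best


# Toy instance: the path 0-1-2-3-4 with colours a, b, a, b, c.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
colour = {0: "a", 1: "b", 2: "a", 3: "b", 4: "c"}
value, order = brute_force_locality(path, colour)
```

On this toy path, starting with colour `c` and then `a` produces three isolated coloured vertices, whereas starting with `a` never produces more than two clusters, so the order of the colours changes the outcome. The factorial number of enumerations is what makes a smarter search, such as the priority search described in the abstract, necessary for graphs with many colours.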