🤖 AI Summary
Graph neural networks (GNNs) struggle to model featureless graphs—such as social and biological networks—while existing attribute encoders exhibit poor adaptability to scale-free topologies and heterogeneous numerical graph metrics.
Method: We propose PropEnc, a universal property encoder that introduces a novel differentiable encoding paradigm based on histogram binning and inverted indexing. It generates low-dimensional, dense, order-preserving embeddings for arbitrary continuous or discrete graph metrics (e.g., degree, centrality), eliminating high-dimensional sparse representations and achieving over 90% dimensionality reduction. PropEnc initializes node representations via graph metric-driven encoding and refines them through end-to-end learnable mappings.
Contribution/Results: Evaluated on multiple featureless social network graph classification benchmarks, PropEnc consistently outperforms baselines by 3–8% in accuracy, demonstrating superior efficiency, generalizability, and robustness across diverse scale-free and heterogeneous graph structures.
📝 Abstract
Graph machine learning, particularly using graph neural networks, fundamentally relies on node features. Nevertheless, numerous real-world systems, such as social and biological networks, often lack node features due to various reasons, including privacy concerns, incomplete or missing data, and limitations in data collection. In such scenarios, researchers typically resort to methods like structural and positional encoding to construct node features. However, the length of such features is contingent on the maximum value within the property being encoded, for example, the highest node degree, which can be exceedingly large in applications like scale-free networks. Furthermore, these encoding schemes are limited to categorical data and might not be able to encode metrics returning other type of values. In this paper, we introduce a novel, universally applicable encoder, termed emph{PropEnc}, which constructs expressive node embedding from any given graph metric. emph{PropEnc} leverages histogram construction combined with reversed index encoding, offering a flexible method for node features initialization. It supports flexible encoding in terms of both dimensionality and type of input, demonstrating its effectiveness across diverse applications. emph{PropEnc} allows encoding metrics in low-dimensional space which effectively address the sparsity challenge and enhances the efficiency of the models. We show that emph{PropEnc} can construct node features that either exactly replicate one-hot encoding or closely approximate indices under various settings. Our extensive evaluations in graph classification setting across multiple social networks that lack node features support our hypothesis. The empirical results conclusively demonstrate that emph{PropEnc} is both an efficient and effective mechanism for constructing node features from diverse set of graph metrics.