🤖 AI Summary
This study investigates the instability of neural network-based node embeddings caused by training randomness, with a focus on how embedding dimensionality affects both stability and downstream task performance. The authors systematically evaluate five representative methods—ASNE, DGI, GraphSAGE, node2vec, and VERSE—across multiple datasets and dimensionalities. They reveal a method-dependent relationship between embedding dimension and stability: while some methods (e.g., node2vec and ASNE) become more stable as dimensionality increases, others show no such trend. Crucially, peak stability often does not coincide with optimal task performance. These findings underscore the need for a balanced trade-off among stability, predictive performance, and computational efficiency, offering both empirical insights and practical guidance for selecting embedding dimensions.
📝 Abstract
Previous work has established that neural network-based node embeddings return different outcomes when trained with identical parameters on the same dataset, solely due to different training seeds. Yet, it has not been thoroughly analyzed how key hyperparameters such as embedding dimension impact this instability. In this work, we investigate how varying the dimensionality of node embeddings influences both their stability and downstream performance. We systematically evaluate five widely used methods -- ASNE, DGI, GraphSAGE, node2vec, and VERSE -- across multiple datasets and embedding dimensions. We assess stability from both representational and functional perspectives, alongside performance evaluation. Our results show that embedding stability varies significantly with dimensionality, but we observe different patterns across the methods we consider: while some approaches, such as node2vec and ASNE, tend to become more stable with higher dimensionality, other methods do not exhibit the same trend. Moreover, we find that maximum stability does not necessarily align with optimal task performance. These findings highlight the importance of carefully selecting the embedding dimension, and provide new insights into the trade-offs between stability, performance, and computational efficiency in graph representation learning.
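To make the notion of representational stability concrete, the sketch below illustrates one common way such stability is quantified: the average overlap of each node's k-nearest-neighbor set between two embedding runs trained with different seeds. This is a hedged illustration, not the paper's exact protocol; the function names and the synthetic stand-in embeddings are hypothetical.

```python
import numpy as np

def knn_sets(emb, k):
    # Cosine-similarity k-NN sets for each node (self excluded).
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)
    return [set(np.argsort(-row)[:k]) for row in sim]

def knn_overlap(emb_a, emb_b, k=10):
    """Mean Jaccard overlap of per-node k-NN sets across two runs.
    1.0 = identical local neighborhoods, 0.0 = fully disjoint."""
    sets_a, sets_b = knn_sets(emb_a, k), knn_sets(emb_b, k)
    return float(np.mean([len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]))

# Stand-in for two runs of the same method with different seeds (synthetic data):
rng1, rng2 = np.random.default_rng(0), np.random.default_rng(1)
n, d = 100, 32
emb_run1 = rng1.normal(size=(n, d))
emb_run2 = emb_run1 + 0.1 * rng2.normal(size=(n, d))  # small seed-induced drift

print(knn_overlap(emb_run1, emb_run1))  # identical runs -> 1.0
print(knn_overlap(emb_run1, emb_run2))  # < 1.0: neighborhoods shift across seeds
```

A functional-stability check would instead compare downstream predictions (e.g., agreement of node classifiers trained on each run's embeddings), since neighborhood-based measures are invariant to rotations but can still miss task-level effects.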