🤖 AI Summary
This paper addresses the challenge of merging graph neural network (GNN) models in the absence of the original training data. We systematically identify and characterize the failure mechanisms of existing model merging techniques when applied to GNNs, the first such analysis. To overcome these limitations, we propose GNNMerge, a task-agnostic embedding alignment framework. Its core innovation is an analytical solution for the merged model's weights, derived under a mild relaxation by aligning node embedding spaces, which eliminates the need for shared initialization, data replay, or auxiliary optimization and makes the method both scalable and theoretically interpretable. Extensive experiments across multiple datasets, downstream tasks, and GNN architectures demonstrate that our approach is up to 24% more accurate than state-of-the-art methods while merging models over two orders of magnitude faster than training from scratch.
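To make the analytical solution concrete, here is one plausible reading under illustrative notation (the symbols below are our assumptions, not taken from the paper): for a GCN-style layer with propagation matrix A, probe features X, and source-model embeddings H_1, ..., H_K, relaxing away the nonlinearity turns embedding alignment into a least-squares problem with a closed-form solution:

$$
W^{\star} \;=\; \arg\min_{W} \sum_{k=1}^{K} \left\lVert A X W - H_k \right\rVert_F^2 \;=\; (A X)^{+}\,\bar{H}, \qquad \bar{H} = \frac{1}{K}\sum_{k=1}^{K} H_k,
$$

where $(\cdot)^{+}$ denotes the Moore–Penrose pseudoinverse; the averaged target appears because the quadratic objective decouples across source models.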
📝 Abstract
Model merging has gained prominence in machine learning as a method to integrate multiple trained models into a single model without accessing the original training data. While existing approaches have demonstrated success in domains such as computer vision and NLP, their application to Graph Neural Networks (GNNs) remains unexplored. Moreover, these methods often rely on the assumption of shared initialization, which is seldom applicable to GNNs. In this work, we undertake the first benchmarking study of model merging algorithms for GNNs and reveal their limited effectiveness in this context. To address these challenges, we propose GNNMerge, which uses a task-agnostic node embedding alignment strategy to merge GNNs. Furthermore, we establish that under a mild relaxation, the proposed optimization objective admits direct analytical solutions for widely used GNN architectures, significantly enhancing computational efficiency. Empirical evaluations across diverse datasets, tasks, and architectures show GNNMerge to be up to 24% more accurate than existing methods while delivering a speed-up of over two orders of magnitude compared to training from scratch.
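As a minimal sketch of how such a closed-form merge could be implemented, the snippet below solves the relaxed alignment objective for a single linear layer with ordinary least squares. The function name `merge_layer` and the random tensors standing in for propagated node features are illustrative assumptions, not the paper's actual API:

```python
import torch

def merge_layer(AX: torch.Tensor, H_targets: list[torch.Tensor]) -> torch.Tensor:
    """Hypothetical one-layer merge via node embedding alignment.

    AX        : (n, d_in)  propagated node features (e.g., A @ X for a GCN-style layer)
    H_targets : embeddings of shape (n, d_out) from each source model's layer

    With the nonlinearity relaxed away, minimizing
        sum_k || AX @ W - H_k ||_F^2
    over W is ordinary least squares against the averaged targets,
    giving the analytical solution W* = pinv(AX) @ mean_k(H_k).
    """
    H_mean = torch.stack(H_targets).mean(dim=0)   # quadratic objective decouples: average the targets
    W = torch.linalg.lstsq(AX, H_mean).solution   # closed-form least-squares solve, no gradient steps
    return W

# Toy usage: two source GNNs, with random tensors in place of real graph data.
n, d_in, d_out = 1000, 64, 32
AX = torch.randn(n, d_in)
H_targets = [torch.randn(n, d_out) for _ in range(2)]
W_merged = merge_layer(AX, H_targets)
print(W_merged.shape)  # torch.Size([64, 32])
```

Because the solve is a single linear-algebra call rather than an optimization loop, a closed form of this kind would be consistent with the reported orders-of-magnitude speed-up over training from scratch.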