🤖 AI Summary
Deploying graph neural networks (GNNs) on edge devices faces challenges including high parameter overhead, excessive computational cost, and structural information loss caused by conventional quantization methods that ignore graph topology. To address these issues, this paper proposes a node-aware dynamic quantization framework. It adaptively adjusts quantization scales per node based on the user-item interaction graph structure, dynamically refines quantization ranges via message passing, and introduces a graph-relational gradient estimation mechanism to enhance training stability. Under 2-bit low-precision quantization, the method achieves 8-12× model compression and 2× faster training speed. Empirical results demonstrate average improvements of 27.8% and 17.6% in Recall@10 and NDCG@10, respectively, over state-of-the-art methods, matching the performance of full-precision models while significantly reducing resource requirements for edge deployment.
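The core idea of per-node quantization scales can be illustrated with a minimal sketch. This is not the paper's exact GNAQ algorithm; it only shows the contrast with a global scale: each node's quantization interval is initialized from that node's own feature distribution (here, its maximum magnitude), so dense and sparse embeddings are not forced onto one shared grid. The function names and the max-magnitude scale rule are illustrative assumptions.

```python
import numpy as np

def node_aware_quantize(emb, bits=2):
    """Quantize each node embedding row with its own scale.

    Sketch only: the per-node scale is initialized from the node's
    feature distribution (its max magnitude), rather than a single
    global scale shared by all nodes.
    """
    qmax = 2 ** (bits - 1) - 1                   # e.g. 1 for signed 2-bit
    # One scale per node (row), not one scale for the whole table.
    scales = np.abs(emb).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero rows
    q = np.clip(np.round(emb / scales), -qmax - 1, qmax)
    return q.astype(np.int8), scales

def dequantize(q, scales):
    # Recover approximate full-precision values for message passing.
    return q * scales
```

Because every row gets its own scale, the reconstruction error for each node is bounded by that node's step size, which a single global scale cannot guarantee when embedding magnitudes vary widely across users and items.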
📄 Abstract
In the realm of collaborative filtering recommendation systems, Graph Neural Networks (GNNs) have demonstrated remarkable performance but face significant challenges in deployment on resource-constrained edge devices due to their high embedding parameter requirements and computational costs. Applying common quantization methods directly to node embeddings overlooks their graph-based structure, causing error accumulation during message passing and degrading the quality of quantized embeddings. To address this, we propose Graph-based Node-Aware Dynamic Quantization training for collaborative filtering (GNAQ), a novel quantization approach that leverages graph structural information to enhance the balance between efficiency and accuracy of GNNs for Top-K recommendation. GNAQ introduces a node-aware dynamic quantization strategy that adapts quantization scales to individual node embeddings by incorporating graph interaction relationships. Specifically, it initializes quantization intervals based on node-wise feature distributions and dynamically refines them through message passing in GNN layers. This approach mitigates information loss caused by fixed quantization scales and captures hierarchical semantic features in user-item interaction graphs. Additionally, GNAQ employs graph relation-aware gradient estimation in place of the traditional straight-through estimator, ensuring more accurate gradient propagation during training. Extensive experiments on four real-world datasets demonstrate that GNAQ outperforms state-of-the-art quantization methods, including BiGeaR and N2UQ, achieving average improvements of 27.8% in Recall@10 and 17.6% in NDCG@10 under 2-bit quantization. In particular, GNAQ maintains the performance of full-precision models while reducing model size by 8 to 12 times; in addition, training is twice as fast as the quantization baseline methods.