Efficient Mixed Precision Quantization in Graph Neural Networks

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the trade-off between high computational overhead and accuracy degradation in GNN inference, this paper proposes MixQ-GNN, a mixed-precision quantization framework. Methodologically, it establishes, for the first time, a numerical equivalence theorem for quantized message aggregation—ensuring mathematical consistency of message passing in the integer domain. It supports configurable bit-width quantization across all GNN components (message passing, aggregation, and update), and introduces a graph-structure-aware adaptive search strategy to optimize multi-component bit-width combinations. The framework is compatible with mainstream GNN quantization approaches and requires no retraining. Experiments on node and graph classification tasks demonstrate that, compared to FP32, MixQ-GNN reduces bit operations by 5.5× and 5.1×, respectively, while preserving predictive accuracy with an average degradation of less than 0.3%. This yields substantial efficiency gains for large-scale graph inference.

📝 Abstract
Graph Neural Networks (GNNs) have become essential for handling large-scale graph applications. However, the computational demands of GNNs necessitate the development of efficient methods to accelerate inference. Mixed precision quantization emerges as a promising solution to enhance the efficiency of GNN architectures without compromising prediction performance. Compared to conventional deep learning architectures, GNN layers contain a wider set of components that can be quantized, including message passing functions, aggregation functions, update functions, and the inputs, learnable parameters, and outputs of these functions. In this paper, we introduce a theorem for efficient quantized message passing to aggregate integer messages. It guarantees numerical equality between messages aggregated in the integer domain and those obtained at full (FP32) precision. Based on this theorem, we introduce the Mixed Precision Quantization for GNN (MixQ-GNN) framework, which flexibly selects effective integer bit-widths for all components within GNN layers. Our approach systematically navigates the wide set of possible bit-width combinations, addressing the challenge of optimizing efficiency while maintaining comparable prediction performance. MixQ-GNN integrates with existing GNN quantization methods, utilizing their graph structure advantages to achieve higher prediction performance. On average, MixQ-GNN achieved reductions in bit operations of 5.5x for node classification and 5.1x for graph classification compared to architectures represented in FP32 precision.
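The equivalence claimed by the theorem can be illustrated with a toy sketch: if a node's incoming messages are quantized with a shared scale, summing them in the integer domain and dequantizing once recovers the FP32 sum-aggregation. This is an illustrative assumption-laden example, not the paper's actual construction; the scale here is hand-picked so every message is exactly representable.

```python
import numpy as np

def quantize(x, scale, bits=8):
    # Symmetric uniform quantization to signed integers of the given bit-width.
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Messages from four neighbors of some node, all quantized with one shared scale.
messages = np.array([0.5, -1.25, 0.75, 2.0], dtype=np.float32)
scale = 0.25  # chosen so each message is exactly representable (hypothetical)

q_msgs = quantize(messages, scale)
agg_int = q_msgs.sum()               # sum-aggregation entirely in the integer domain
agg_fp = dequantize(agg_int, scale)  # dequantize once, after aggregation

# Matches FP32 sum-aggregation exactly for these representable values.
assert agg_fp == messages.sum()
```

The key design point the sketch illustrates: because dequantization is a single multiplication by the scale, it distributes over the integer sum, so aggregation never needs to leave the integer domain.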
Problem

Research questions and friction points this paper is trying to address.

Efficient mixed precision quantization for GNNs
Optimizing bit-widths for GNN components
Maintaining performance while reducing computational costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient quantized message passing with integer aggregation
Flexible bit-width selection for GNN components
Integration with existing GNN quantization methods
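The bit-width selection idea above can be sketched as a search over per-component bit-width combinations that minimizes an estimated bit-operation (BitOps) cost subject to an accuracy constraint. Everything below is hypothetical scaffolding: the MAC counts, the `bits^2` cost model, and the toy `accuracy_proxy` stand in for the paper's graph-structure-aware adaptive search.

```python
from itertools import product

# Hypothetical candidate bit-widths for the three GNN components
# (message passing, aggregation, update).
CANDIDATES = (2, 4, 8)

def bitops_cost(bw, macs=(1e6, 5e5, 1e6)):
    # Rough BitOps estimate per component: MAC count times bits^2
    # (a common simplification of weight_bits * activation_bits).
    return sum(m * b * b for m, b in zip(macs, bw))

def accuracy_proxy(bw):
    # Toy stand-in for a validation-accuracy estimate: each bit dropped
    # below 8 costs a small fixed penalty. The real framework estimates
    # this with a graph-structure-aware strategy, not a linear proxy.
    return 1.0 - 0.02 * sum(8 - b for b in bw) / len(bw)

def search(min_accuracy=0.95):
    # Exhaustive search over all bit-width combinations that meet the
    # accuracy budget, keeping the cheapest one in BitOps.
    feasible = (bw for bw in product(CANDIDATES, repeat=3)
                if accuracy_proxy(bw) >= min_accuracy)
    return min(feasible, key=bitops_cost)

best = search()
```

With only three components and three candidate widths the exhaustive sweep is trivial; the paper's adaptive strategy matters precisely because the combination space grows exponentially with deeper architectures and more quantizable components.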