GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation

📅 2024-12-15

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address the high inference latency and poor scalability of Graph Neural Networks (GNNs), this paper proposes TINED, a layer-wise knowledge distillation framework that distills GNNs into Multi-Layer Perceptrons (MLPs). Its key contributions are: (1) a teacher-injection mechanism that transfers the parameters of GNN feature transformation (FT) operations directly into the fully connected (FC) layers of the MLP, followed by fine-tuning; (2) Dirichlet Energy Distillation, which uses a Dirichlet energy (DE) ratio to quantify the opposing smoothing effects of FT and graph propagation (GP) and distills them into the MLP layers; and (3) a student architecture whose FC layers mirror the order of the teacher's FTs and GPs, with a theoretical bound on the approximation of GPs. Across seven benchmark datasets, the abstract reports that TINED outperforms both the teacher GNNs and state-of-the-art distillation methods under various settings, while the MLP student avoids the multi-hop data access that slows GNN inference.


📝 Abstract

Graph Neural Networks (GNNs) are fundamental to graph-based learning and excel in node classification tasks. However, GNNs suffer from scalability issues due to the need for multi-hop data during inference, limiting their use in latency-sensitive applications. Recent studies attempt to distill GNNs into multi-layer perceptrons (MLPs) for faster inference. They typically treat GNN and MLP models as single units for distillation, insufficiently utilizing the fine-grained knowledge within GNN layers. In this paper, we propose TINED, a novel method that distills GNNs to MLPs layer-wise through Teacher Injection with fine-tuning and Dirichlet Energy Distillation techniques. We analyze key operations in GNN layers, feature transformation (FT) and graph propagation (GP), and identify that an FT performs the same computation as a fully-connected (FC) layer in MLPs. Thus, we propose directly injecting valuable teacher parameters of an FT in a GNN into an FC layer of the student MLP, assisted by fine-tuning. In TINED, FC layers in an MLP mirror the order of the corresponding FTs and GPs in GNN. We provide a theoretical bound on the approximation of GPs. Moreover, we observe that in a GNN layer, FT and GP operations often have opposing smoothing effects: GP is aggressive, while FT is conservative, in smoothing. Using Dirichlet energy, we design a DE ratio to quantify these smoothing effects and propose Dirichlet Energy Distillation to distill these characteristics from GNN layers to MLP layers. Extensive experiments demonstrate that TINED achieves superior performance over GNNs and state-of-the-art distillation methods under various settings across seven datasets. The code is in supplementary material.
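The smoothing contrast between GP and FT described in the abstract can be illustrated with a small sketch. The definitions here are assumptions rather than the paper's exact formulation: Dirichlet energy is taken as the sum of squared feature differences over edges, the DE ratio as output energy divided by input energy, GP as mean aggregation with a self-loop, and FT as a bias-free linear map. The helper names `propagate`, `transform`, and `de_ratio` are hypothetical.

```python
# Sketch only: the edge-sum energy and the output/input ratio are assumed
# definitions, not taken from the TINED paper or its code.

def dirichlet_energy(features, edges):
    """Sum of squared feature differences over undirected edges."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
        for i, j in edges
    )

def propagate(features, edges):
    """Hypothetical GP step: average each node with its neighbors (self-loop included)."""
    n = len(features)
    neighbors = {i: [i] for i in range(n)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[k][d] for k in neighbors[i]) / len(neighbors[i])
         for d in range(dim)]
        for i in range(n)
    ]

def transform(features, weight):
    """Hypothetical FT step: X @ W, the same computation as a bias-free FC layer."""
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
        for row in features
    ]

def de_ratio(before, after, edges):
    """Assumed DE ratio: energy after the operation over energy before it."""
    return dirichlet_energy(after, edges) / dirichlet_energy(before, edges)

edges = [(0, 1), (1, 2), (2, 3)]          # a 4-node path graph
features = [[0.0], [1.0], [2.0], [3.0]]   # one feature per node
weight = [[2.0]]                          # hand-picked 1x1 FT weight

gp_ratio = de_ratio(features, propagate(features, edges), edges)
ft_ratio = de_ratio(features, transform(features, weight), edges)
print(gp_ratio, ft_ratio)
```

A ratio below 1 means the operation smoothed the features, and a ratio above 1 means it made them less smooth. On this toy graph the GP step lowers the energy, while the hand-picked FT weight raises it; in a trained GNN the FT ratio depends on the learned weights.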

Problem

Research questions and friction points this paper is trying to address.

Distilling GNNs into MLPs for faster inference

Addressing underutilization of GNN layer-level insights

Balancing feature transformation and graph propagation effects

Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-by-layer GNN to MLP distillation

Teacher Injection for parameter transfer

Dirichlet Energy Distillation for smoothing effects
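The teacher-injection idea listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Linear` class and `inject_teacher` function are hypothetical helpers, the weights are made up, and fine-tuning is stood in for by a manual nudge to one weight. The point it shows is that an FT and an FC layer compute the same thing, so the teacher's parameters can be copied into the student and then adjusted independently.

```python
import copy

class Linear:
    """Minimal FC layer on plain lists: y = x @ W + b (hypothetical helper)."""

    def __init__(self, weight, bias):
        self.weight = weight  # shape: in_dim x out_dim
        self.bias = bias      # shape: out_dim

    def __call__(self, rows):
        return [
            [sum(x * w for x, w in zip(row, col)) + b
             for col, b in zip(zip(*self.weight), self.bias)]
            for row in rows
        ]

def inject_teacher(teacher_ft):
    """Copy the teacher's FT parameters into a fresh student FC layer."""
    return Linear(copy.deepcopy(teacher_ft.weight), list(teacher_ft.bias))

# Made-up teacher FT parameters, for illustration only.
teacher_ft = Linear(weight=[[1.0, -1.0], [0.5, 2.0]], bias=[0.1, 0.0])
student_fc = inject_teacher(teacher_ft)

x = [[1.0, 2.0], [3.0, 4.0]]
same_output = student_fc(x) == teacher_ft(x)  # identical right after injection

# Stand-in for fine-tuning: only the student's copy changes.
student_fc.weight[0][0] += 0.01
teacher_unchanged = teacher_ft.weight[0][0] == 1.0
print(same_output, teacher_unchanged)
```

Copying rather than sharing the parameters keeps the teacher intact while the student is fine-tuned. The GP operations have no counterpart to copy, which is why the abstract describes approximating them with FC layers and bounding that approximation.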
