🤖 AI Summary
Graph neural networks (GNNs) achieve strong performance in molecular property prediction, yet accuracy gains often incur substantial computational and memory overhead. To address this trade-off, the authors propose Layer-to-Layer Knowledge Mixing (LKM), a self-knowledge distillation framework that minimizes the mean absolute distance between the hidden representations of different GNN layers, efficiently aggregating multi-hop and multi-scale structural information without increasing inference cost. LKM is architecture-agnostic (evaluated with DimeNet++, MXMNet, and PAMNet) and operates on the pre-existing layer embeddings, so it adds negligible training overhead and no extra parameters. On QM9, MD17, and Chignolin, LKM reduces mean absolute error by up to 9.8%, 45.3%, and 22.9%, respectively, improving accuracy without sacrificing efficiency.
📝 Abstract
Graph Neural Networks (GNNs) are currently the most effective methods for predicting molecular properties, but there remains a need for more accurate models. GNN accuracy can be improved by increasing model complexity, but this also increases the computational cost and memory requirements during training and inference. In this study, we develop Layer-to-Layer Knowledge Mixing (LKM), a novel self-knowledge distillation method that increases the accuracy of state-of-the-art GNNs while adding negligible computational complexity during training and inference. By minimizing the mean absolute distance between pre-existing hidden embeddings of GNN layers, LKM efficiently aggregates multi-hop and multi-scale information, enabling improved representation of both local and global molecular features. We evaluated LKM using three diverse GNN architectures (DimeNet++, MXMNet, and PAMNet) on datasets of quantum chemical and biophysical properties (QM9, MD17, and Chignolin). We found that LKM reduces the mean absolute error of property predictions by up to 9.8% (QM9), 45.3% (MD17 Energy), and 22.9% (Chignolin). This work demonstrates the potential of LKM to significantly improve the accuracy of GNNs for chemical property prediction without any substantial increase in training and inference cost.
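The core mechanism described above — minimizing the mean absolute distance between the hidden embeddings that a GNN's layers already produce — can be sketched as an auxiliary loss. This is a minimal illustration, not the authors' implementation: the abstract does not specify which layer pairs are compared or how the auxiliary term is weighted against the task loss, so the all-pairs averaging and the `lkm_loss` name below are assumptions.

```python
import numpy as np

def lkm_loss(layer_embeddings):
    """Sketch of an LKM-style self-distillation loss (assumed form).

    Takes the list of per-layer hidden embeddings a GNN already
    computes during its forward pass (each an array of shape
    [num_nodes, hidden_dim]) and returns the mean absolute distance
    averaged over all layer pairs. Because it reuses pre-existing
    embeddings, it adds no parameters and no inference-time cost.
    """
    total, pairs = 0.0, 0
    for i in range(len(layer_embeddings)):
        for j in range(i + 1, len(layer_embeddings)):
            # Mean absolute distance between the two layers' embeddings.
            total += np.mean(np.abs(layer_embeddings[i] - layer_embeddings[j]))
            pairs += 1
    return total / pairs
```

In training, this term would be added to the usual property-prediction loss (e.g. `loss = task_loss + lam * lkm_loss(embeddings)` for some small weight `lam`, a hypothetical hyperparameter), pulling the layers' representations toward a shared, mixed view of local and global structure.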