🤖 AI Summary
Existing molecular relational learning (MRL) methods rely solely on 2D topological graphs and thus fail to capture critical 3D interaction geometries. Method: We propose a geometry-aware pretraining paradigm grounded in a virtual 3D molecular interaction environment. Our approach constructs a differentiable 3D interaction simulator and jointly optimizes contrastive learning with physics-inspired force prediction, enabling 2D graph neural networks to implicitly learn 3D interaction geometry. Contribution/Results: We integrate force prediction into the contrastive framework so that the model captures both coarse-grained geometric awareness and fine-grained physical constraints. Furthermore, we introduce geometric knowledge distillation to transfer 3D-aware representations into efficient 2D models. Evaluated across 40 real-world tasks, including out-of-distribution and extrapolation scenarios, our method achieves improvements of up to 24.93% and generalizes better to unseen molecular interaction patterns.
📝 Abstract
Molecular Relational Learning (MRL) is a rapidly growing field that focuses on understanding the interaction dynamics between molecules, which is crucial for applications ranging from catalyst engineering to drug discovery. Despite recent progress, earlier MRL approaches are limited to using only the 2D topological structure of molecules, as obtaining the 3D interaction geometry remains prohibitively expensive. This paper introduces a novel 3D geometric pre-training strategy for MRL (3DMRL) that incorporates a 3D virtual interaction environment, avoiding the cost of traditional quantum mechanical calculations. With the constructed 3D virtual interaction environment, 3DMRL trains a 2D MRL model to learn the overall 3D geometric information of molecular interactions through contrastive learning. Moreover, fine-grained interactions between molecules are learned through a force prediction loss, which is crucial for understanding the wide range of molecular interaction processes. Extensive experiments on various tasks using real-world datasets, including out-of-distribution and extrapolation scenarios, demonstrate the effectiveness of 3DMRL, showing up to a 24.93% improvement in performance across 40 tasks.
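As a rough illustration of how a contrastive objective and a force prediction loss could be combined during pre-training, the following is a minimal sketch, not the 3DMRL implementation. The abstract does not specify the loss forms, so the InfoNCE-style contrastive term, the squared-error force term, the weighting `alpha`, the `temperature`, and all function names here are assumptions made for illustration; random arrays stand in for the 2D encoder outputs, the 3D environment embeddings, and the per-atom forces.

```python
import numpy as np

def info_nce(z2d, z3d, temperature=0.1):
    """Assumed InfoNCE-style loss: row i of z2d should match row i of z3d."""
    z2d = z2d / np.linalg.norm(z2d, axis=1, keepdims=True)
    z3d = z3d / np.linalg.norm(z3d, axis=1, keepdims=True)
    logits = z2d @ z3d.T / temperature                 # pairwise cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # positives lie on the diagonal

def force_loss(pred_forces, target_forces):
    """Assumed force term: mean squared error over per-atom 3D force vectors."""
    return np.mean(np.sum((pred_forces - target_forces) ** 2, axis=-1))

def pretraining_loss(z2d, z3d, pred_forces, target_forces,
                     alpha=1.0, temperature=0.1):
    """Hypothetical joint objective: contrastive term plus weighted force term."""
    return (info_nce(z2d, z3d, temperature)
            + alpha * force_loss(pred_forces, target_forces))

# Placeholder data: 4 molecule pairs, 8-dim embeddings, 5 atoms with 3D forces.
rng = np.random.default_rng(0)
z3d = rng.normal(size=(4, 8))                   # stand-in for 3D environment embeddings
z2d = z3d + 0.01 * rng.normal(size=(4, 8))      # stand-in for 2D encoder embeddings
forces = rng.normal(size=(4, 5, 3))             # stand-in for target forces
loss_total = pretraining_loss(z2d, z3d, forces, forces)
```

In this sketch the contrastive term is small when each 2D embedding is closest to its own 3D counterpart, and the force term vanishes when predicted and target forces agree; how the two terms are actually balanced in 3DMRL is not stated in the abstract.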