GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining

📅 2025-06-16
🤖 AI Summary
Existing molecular pretraining methods primarily focus on node-level denoising, failing to effectively capture global geometric structure and thus limiting performance on graph-level property prediction (e.g., energy and force-field regression). To address this, we propose the first geometry-reconstruction-based self-supervised pretraining framework tailored for 3D molecular graph-level representation learning. Our approach treats each molecule as a holistic entity and jointly optimizes 3D atomic coordinate reconstruction and graph representation decoding within a GNN architecture, enabling end-to-end learning of long-range spatial dependencies. Crucially, it requires no external annotations or auxiliary data and achieves molecular-level generative geometric modeling for the first time. On standard benchmarks—including QM9 and MD17—our method significantly outperforms node-level denoising baselines, reducing mean absolute error by 12–19% on energy prediction and force-field regression tasks.

📝 Abstract
The pretraining-and-finetuning paradigm has driven significant advances across domains such as natural language processing and computer vision, with representative pretraining tasks including masked language modeling and next-token prediction. However, in molecular representation learning, task design remains largely limited to node-level denoising, which is effective at modeling local atomic environments yet may be insufficient for capturing the global molecular structure required by graph-level property prediction tasks such as energy estimation and molecular regression. In this work, we present GeoRecon, a novel graph-level pretraining framework that shifts the focus from individual atoms to the molecule as an integrated whole. GeoRecon introduces a graph-level reconstruction task: during pretraining, the model is trained to generate an informative graph representation capable of accurately guiding reconstruction of the molecular geometry. This encourages the model to learn coherent, global structural features rather than isolated atomic details. Without relying on additional supervision or external data, GeoRecon outperforms node-centric baselines on multiple molecular benchmarks (e.g., QM9, MD17), demonstrating the benefit of incorporating graph-level reconstruction for learning more holistic and geometry-aware molecular embeddings.
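
The graph-level reconstruction idea described in the abstract can be illustrated with a toy sketch. This is not GeoRecon's architecture: the hand-written invariant features, the mean pooling, the linear decoder, the choice of pairwise distances as the reconstruction target, and every function name below are assumptions made purely for illustration. The sketch only shows the shape of the objective: pool per-atom features into one graph vector, decode the geometry from that vector, and score the result against the true geometry.

```python
import math

def node_features(coords):
    """Toy invariant per-atom features (assumed): mean and max distance to the other atoms."""
    feats = []
    for i, a in enumerate(coords):
        d = [math.dist(a, b) for j, b in enumerate(coords) if j != i]
        feats.append([sum(d) / len(d), max(d)])
    return feats

def graph_embedding(feats):
    """Mean-pool the per-atom features into a single graph-level vector."""
    n = len(feats)
    return [sum(f[k] for f in feats) / n for k in range(len(feats[0]))]

def decode_distances(graph_vec, atomic_numbers, weights):
    """Hypothetical linear decoder: predict each pairwise distance from the
    graph vector plus symmetric functions of the two atomic numbers, so the
    graph vector is the only carrier of geometric information."""
    n = len(atomic_numbers)
    pred = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            zi, zj = atomic_numbers[i], atomic_numbers[j]
            inputs = graph_vec + [zi + zj, zi * zj]
            pred[i][j] = pred[j][i] = sum(w * x for w, x in zip(weights, inputs))
    return pred

def reconstruction_loss(pred, coords):
    """Mean squared error between predicted and true pairwise distances (pairs i < j)."""
    n = len(coords)
    errs = [(pred[i][j] - math.dist(coords[i], coords[j])) ** 2
            for i in range(n) for j in range(i + 1, n)]
    return sum(errs) / len(errs)

# Water-like geometry in angstroms (illustrative values, not benchmark data).
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
atomic_numbers = [8, 1, 1]
weights = [0.5, 0.5, 0.0, 0.0]  # untrained, hypothetical decoder parameters

g = graph_embedding(node_features(coords))
loss = reconstruction_loss(decode_distances(g, atomic_numbers, weights), coords)
print(f"graph embedding: {g}, reconstruction loss: {loss:.4f}")
```

In an actual pretraining setup the encoder and decoder would be learned networks and the loss would be minimized by gradient descent; here the weights are fixed so that only the data flow through the graph-level bottleneck is visible.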
Problem

Research questions and friction points this paper is trying to address.

Develops graph-level pretraining for 3D molecular representation learning
Shifts focus from atomic-level to global molecular structure modeling
Improves performance on geometry-aware tasks without extra supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-level reconstruction task for molecular geometry
Global structural feature learning without local atomic focus
Geometry-aware embeddings outperform node-centric baselines
Shaoheng Yan
Institute for Artificial Intelligence, Peking University; Yuanpei College, Peking University
Zian Li
Peking University
Graph Neural Networks
Muhan Zhang
Peking University
Machine Learning, Graph Neural Network, Large Language Models