🤖 AI Summary
This work addresses the modeling gap between three-dimensional molecular conformations and two-dimensional statistical representations by introducing the Suiren-1.0 family of molecular foundation models. Built on an SE(3)-equivariant architecture, Suiren-1.0 combines spatial self-supervised learning with large-scale pretraining on density functional theory data. It also introduces the Conformation Compression Distillation (CCD) framework, which uses diffusion models for the first time to compress 3D structures into efficient 2D representations. The model family covers monomers, dimers, and conformational ensembles, forming a multimodal molecular representation system that achieves state-of-the-art performance across multiple molecular property prediction tasks. All models and evaluation benchmarks are publicly released.
📝 Abstract
We introduce Suiren-1.0, a family of molecular foundation models for the accurate modeling of diverse organic systems. Suiren-1.0 comprises three specialized variants (Suiren-Base, Suiren-Dimer, and Suiren-ConfAvg), integrated within an algorithmic framework that bridges the gap between 3D conformational geometry and 2D statistical ensemble spaces. We first pre-train Suiren-Base (1.8B parameters) on a 70M-sample Density Functional Theory dataset using spatial self-supervision and SE(3)-equivariant architectures, achieving robust performance in quantum property prediction. Suiren-Dimer extends this capability through continued pre-training on 13.5M intermolecular interaction samples. To enable efficient downstream application, we propose Conformation Compression Distillation (CCD), a diffusion-based framework that distills complex 3D structural representations into 2D conformation-averaged representations. This yields the lightweight Suiren-ConfAvg, which generates high-fidelity representations from SMILES or molecular graphs. Our extensive evaluations demonstrate that Suiren-1.0 establishes state-of-the-art results across a range of tasks. All models and benchmarks are open-sourced.
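To make the phrase "conformation-averaged representation" concrete, the following is a minimal sketch of one way a set of per-conformer embeddings could be collapsed into a single vector per molecule. It is not the CCD method itself, which the abstract describes only as diffusion-based. The function name `conformer_average`, the Boltzmann weighting, and the default `kT` value are assumptions introduced purely for illustration.

```python
import numpy as np


def conformer_average(embeddings, energies=None, kT=0.593):
    """Collapse per-conformer embeddings of shape (K, D) into one (D,) vector.

    Hypothetical helper, not taken from the Suiren-1.0 release.
    With energies=None it returns the plain mean over conformers.
    Otherwise it uses Boltzmann weights exp(-E / kT), where kT defaults to
    roughly 0.593 kcal/mol (about 298 K). The weighting scheme is an
    assumption; the abstract does not state how conformers are averaged.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    if energies is None:
        return embeddings.mean(axis=0)

    energies = np.asarray(energies, dtype=float)
    # Subtracting the minimum energy avoids overflow and leaves the
    # normalized weights unchanged.
    weights = np.exp(-(energies - energies.min()) / kT)
    weights /= weights.sum()
    return weights @ embeddings


# Two conformers with 2-dimensional embeddings (toy values).
toy = [[1.0, 0.0], [3.0, 4.0]]
uniform = conformer_average(toy)
weighted = conformer_average(toy, energies=[0.0, 1000.0])
```

In this toy setting, a model that takes a SMILES string or molecular graph as input would be trained to reproduce such an averaged vector without generating any 3D conformers at inference time, which is the kind of 3D-to-2D compression the abstract attributes to Suiren-ConfAvg.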