Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal molecular modeling faces two key challenges: unreliable 3D conformations and modality collapse, both of which undermine model robustness and generalization. To address these, we propose MuMo, a framework that combines a structured fusion pipeline with a progressive cross-modal injection mechanism. MuMo preserves the independence of the 2D topological and 3D geometric modalities while enabling their efficient synergy. It employs a state-space model backbone to establish a unified 2D-3D joint prior and adopts an asymmetric fusion strategy that dynamically injects 3D geometric information into the sequence stream. Evaluated on 29 molecular property prediction benchmarks, MuMo achieves an average 2.7% improvement over state-of-the-art baselines, ranking first on 22 tasks. Notably, it delivers a 27% performance gain on the noise-sensitive LD50 task, demonstrating robustness to 3D conformational perturbations and validating the efficacy of its multimodal fusion design.

📝 Abstract
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion framework that addresses these challenges in molecular representation through two key strategies. To reduce the instability of conformer-dependent fusion, we design a Structured Fusion Pipeline (SFP) that combines 2D topology and 3D geometry into a unified and stable structural prior. To mitigate modality collapse caused by naive fusion, we introduce a Progressive Injection (PI) mechanism that asymmetrically integrates this prior into the sequence stream, preserving modality-specific modeling while enabling cross-modal enrichment. Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation. Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. These results validate its robustness to 3D conformer noise and the effectiveness of multimodal fusion in molecular representation. The code is available at: github.com/selmiss/MuMo.
Problem

Research questions and friction points this paper is trying to address.

Addressing 3D conformer unreliability in multimodal molecular learning
Mitigating modality collapse through structured fusion strategies
Improving molecular representation robustness and generalization capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured fusion pipeline combining 2D and 3D molecular data
Progressive injection mechanism for asymmetric multimodal integration
State space backbone enabling long-range dependency modeling
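The summary above does not include an implementation, but the core idea of Progressive Injection, asymmetrically feeding a fused structural prior into the sequence stream a little more at each layer, can be illustrated with a minimal sketch. All names, shapes, and the depth-dependent gating scheme here are assumptions for illustration, not the authors' actual design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d = 8          # hidden width of the sequence stream
seq_len = 5    # tokens in the (e.g., SMILES-derived) sequence
n_layers = 4   # depth of the backbone

# Hypothetical inputs: a per-token sequence state and a single
# structural prior vector (standing in for the fused 2D/3D prior).
h = rng.normal(size=(seq_len, d))   # sequence stream
g = rng.normal(size=(d,))           # structural prior

# One projection per layer; gates grow with depth so structural
# information is injected progressively rather than all at once.
W = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_layers)]
gates = sigmoid(np.linspace(-2.0, 2.0, n_layers))

for l in range(n_layers):
    # Asymmetric fusion: the prior is pushed into the sequence
    # stream, but the sequence state never overwrites the prior.
    h = h + gates[l] * (g @ W[l])

print(h.shape)  # the sequence stream keeps its shape: (5, 8)
```

The one-directional update is what makes the fusion asymmetric: the prior `g` stays fixed while the sequence state `h` is enriched, which is one plausible way to avoid the modality collapse that naive symmetric fusion can cause.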
Zihao Jing
Department of Computer Science, Western University, London, ON, Canada
Yan Sun
Department of Computer Science, Western University, London, ON, Canada
Yan Yi Li
Department of Biochemistry, Western University, London, ON, Canada
Sugitha Janarthanan
Department of Biochemistry, Western University, London, ON, Canada
Alana Deng
Department of Computer Science, Western University, London, ON, Canada
Pingzhao Hu
Canada Research Chair and Associate Professor, Western University; Associate Professor, University of Toronto
Bioinformatics · Statistical Genetics · Deep Learning · Health Data Science · Medical Imaging