DMMRL: Disentangled Multi-Modal Representation Learning via Variational Autoencoders for Molecular Property Prediction

📅 2026-03-22
🤖 AI Summary
This work addresses the limitations of existing molecular property prediction methods, which often produce entangled representations that conflate structural, chemical, and functional factors, and which inadequately integrate multimodal information. To overcome these challenges, the authors propose a variational autoencoder-based disentangled representation learning framework that decomposes the molecular latent space into shared (structure-related) and private (modality-specific) subspaces. Orthogonality and alignment regularizations are introduced to enhance disentanglement, while a gated attention mechanism enables effective fusion of graph, sequence, and geometric modalities. Evaluated on seven benchmark datasets, the proposed method significantly outperforms current state-of-the-art models, achieving both improved predictive performance and enhanced interpretability of learned representations.
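The summary mentions two regularizers: an orthogonality term that keeps each modality's shared and private codes from overlapping, and an alignment term that pulls the shared codes of the different modalities together. The paper does not give exact formulas here, so the following is only a minimal NumPy sketch of one common way to realize such penalties (cross-correlation Frobenius norm for orthogonality; MSE to the consensus for alignment); the function names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def orthogonality_loss(shared, private):
    """Soft-orthogonality penalty between shared and private latent codes.

    shared, private: (batch, dim) codes from one modality's encoder.
    Returns the squared Frobenius norm of their cross-correlation matrix,
    normalized by batch size (a common disentanglement surrogate).
    """
    cross = shared.T @ private                     # (dim, dim) cross-correlation
    return float(np.sum(cross ** 2)) / shared.shape[0]

def alignment_loss(shared_codes):
    """Pull each modality's shared code toward the cross-modal consensus (MSE)."""
    mean = np.mean(shared_codes, axis=0)           # (batch, dim) consensus
    return float(np.mean([(z - mean) ** 2 for z in shared_codes]))

# Toy batch: 4 molecules, 8-dim latents, 3 modalities (graph/sequence/geometry).
rng = np.random.default_rng(0)
shared = [rng.normal(size=(4, 8)) for _ in range(3)]
private = [rng.normal(size=(4, 8)) for _ in range(3)]

reg = (sum(orthogonality_loss(s, p) for s, p in zip(shared, private))
       + alignment_loss(np.stack(shared)))
```

In a training loop, `reg` would be weighted and added to the VAE reconstruction and KL objectives; the weights and exact functional forms would need to be taken from the released code.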

📝 Abstract
Molecular property prediction constitutes a cornerstone of drug discovery and materials science, necessitating models capable of disentangling complex structure-property relationships across diverse molecular modalities. Existing approaches frequently exhibit entangled representations (conflating structural, chemical, and functional factors), thereby limiting interpretability and transferability. Furthermore, conventional methods inadequately exploit complementary information from graphs, sequences, and geometries, often relying on naive concatenation that neglects inter-modal dependencies. In this work, we propose DMMRL, which employs variational autoencoders to disentangle molecular representations into shared (structure-relevant) and private (modality-specific) latent spaces, enhancing both interpretability and predictive performance. The proposed variational disentanglement mechanism effectively isolates the most informative features for property prediction, while orthogonality and alignment regularizations promote statistical independence and cross-modal consistency. Additionally, a gated attention fusion module adaptively integrates shared representations, capturing complex inter-modal relationships. Experimental validation across seven benchmark datasets demonstrates DMMRL's superior performance relative to state-of-the-art approaches. The code and data underlying this article are freely available at https://github.com/xulong0826/DMMRL.
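The abstract's gated attention fusion module adaptively weights the shared representations of the three modalities instead of concatenating them. The sketch below shows one plausible minimal form of such a gate (per-sample softmax scores over modalities, then a gate-weighted sum); the scoring parameterization `W_gate` is a hypothetical stand-in, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def gated_attention_fusion(modality_codes, W_gate):
    """Fuse per-modality shared codes with learned, per-sample gates.

    modality_codes: (M, batch, dim) shared latents from M modalities.
    W_gate:         (dim, 1) scoring vector (hypothetical parameterization).
    Returns the gate-weighted sum over modalities, shape (batch, dim).
    """
    scores = np.stack([z @ W_gate for z in modality_codes])   # (M, batch, 1)
    gates = softmax(scores, axis=0)                           # sum to 1 over modalities
    return np.sum(gates * modality_codes, axis=0)             # (batch, dim)

# Toy input: 3 modalities (graph, sequence, geometry), 4 molecules, 8-dim codes.
rng = np.random.default_rng(1)
codes = rng.normal(size=(3, 4, 8))
fused = gated_attention_fusion(codes, rng.normal(size=(8, 1)))
```

Because the gates are normalized per molecule, each fused vector is a convex combination of that molecule's modality codes, which is what lets the model downweight an uninformative modality per sample.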
Problem

Research questions and friction points this paper is trying to address.

molecular property prediction
disentangled representation
multi-modal learning
interpretability
transferability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled Representation
Variational Autoencoder
Multi-Modal Fusion
Molecular Property Prediction
Gated Attention
Long Xu
Ningbo University, Peng Cheng Laboratory
image/signal processing; video coding, especially rate control of video coding
Junping Guo
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Jianbo Zhao
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Jianbo Lu
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Yuzhong Peng
College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, China