DMMRL: Disentangled Multi-Modal Representation Learning via Variational Autoencoders for Molecular Property Prediction

📅 2026-03-22
🤖 AI Summary
This work addresses the limitations of existing molecular property prediction methods, which often produce entangled representations that conflate structural, chemical, and functional factors, and which inadequately integrate multimodal information. To overcome these challenges, the authors propose a variational autoencoder-based disentangled representation learning framework that decomposes the molecular latent space into shared (structure-related) and private (modality-specific) subspaces. Orthogonality and alignment regularizations are introduced to enhance disentanglement, while a gated attention mechanism enables effective fusion of graph, sequence, and geometric modalities. Evaluated on seven benchmark datasets, the proposed method significantly outperforms current state-of-the-art models, achieving both improved predictive performance and enhanced interpretability of learned representations.
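The summary mentions two regularizers: an orthogonality term that keeps each modality's shared and private codes from overlapping, and an alignment term that pulls the shared codes of the different modalities together. The paper does not give exact formulas here, so the following is only a minimal NumPy sketch of one common way to realize such penalties (cross-correlation Frobenius norm for orthogonality; MSE to the consensus for alignment); the function names and shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def orthogonality_loss(shared, private):
    """Soft-orthogonality penalty between shared and private latent codes.

    shared, private: (batch, dim) codes from one modality's encoder.
    Returns the squared Frobenius norm of their cross-correlation matrix,
    normalized by batch size (a common disentanglement surrogate).
    """
    cross = shared.T @ private                     # (dim, dim) cross-correlation
    return float(np.sum(cross ** 2)) / shared.shape[0]

def alignment_loss(shared_codes):
    """Pull each modality's shared code toward the cross-modal consensus (MSE)."""
    mean = np.mean(shared_codes, axis=0)           # (batch, dim) consensus
    return float(np.mean([(z - mean) ** 2 for z in shared_codes]))

# Toy batch: 4 molecules, 8-dim latents, 3 modalities (graph/sequence/geometry).
rng = np.random.default_rng(0)
shared = [rng.normal(size=(4, 8)) for _ in range(3)]
private = [rng.normal(size=(4, 8)) for _ in range(3)]

reg = (sum(orthogonality_loss(s, p) for s, p in zip(shared, private))
       + alignment_loss(np.stack(shared)))
```

In a training loop, `reg` would be weighted and added to the VAE reconstruction and KL objectives; the weights and exact functional forms would need to be taken from the released code.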

📝 Abstract
Molecular property prediction constitutes a cornerstone of drug discovery and materials science, necessitating models capable of disentangling complex structure-property relationships across diverse molecular modalities. Existing approaches frequently exhibit entangled representations (conflating structural, chemical, and functional factors), thereby limiting interpretability and transferability. Furthermore, conventional methods inadequately exploit complementary information from graphs, sequences, and geometries, often relying on naive concatenation that neglects inter-modal dependencies. In this work, we propose DMMRL, which employs variational autoencoders to disentangle molecular representations into shared (structure-relevant) and private (modality-specific) latent spaces, enhancing both interpretability and predictive performance. The proposed variational disentanglement mechanism effectively isolates the most informative features for property prediction, while orthogonality and alignment regularizations promote statistical independence and cross-modal consistency. Additionally, a gated attention fusion module adaptively integrates shared representations, capturing complex inter-modal relationships. Experimental validation across seven benchmark datasets demonstrates DMMRL's superior performance relative to state-of-the-art approaches. The code and data underlying this article are freely available at https://github.com/xulong0826/DMMRL.
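The abstract's gated attention fusion module adaptively weights the shared representations of the three modalities instead of concatenating them. The sketch below shows one plausible minimal form of such a gate (per-sample softmax scores over modalities, then a gate-weighted sum); the scoring parameterization `W_gate` is a hypothetical stand-in, not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def gated_attention_fusion(modality_codes, W_gate):
    """Fuse per-modality shared codes with learned, per-sample gates.

    modality_codes: (M, batch, dim) shared latents from M modalities.
    W_gate:         (dim, 1) scoring vector (hypothetical parameterization).
    Returns the gate-weighted sum over modalities, shape (batch, dim).
    """
    scores = np.stack([z @ W_gate for z in modality_codes])   # (M, batch, 1)
    gates = softmax(scores, axis=0)                           # sum to 1 over modalities
    return np.sum(gates * modality_codes, axis=0)             # (batch, dim)

# Toy input: 3 modalities (graph, sequence, geometry), 4 molecules, 8-dim codes.
rng = np.random.default_rng(1)
codes = rng.normal(size=(3, 4, 8))
fused = gated_attention_fusion(codes, rng.normal(size=(8, 1)))
```

Because the gates are normalized per molecule, each fused vector is a convex combination of that molecule's modality codes, which is what lets the model downweight an uninformative modality per sample.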
Problem

Research questions and friction points this paper is trying to address.

molecular property prediction
disentangled representation
multi-modal learning
interpretability
transferability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disentangled Representation
Variational Autoencoder
Multi-Modal Fusion
Molecular Property Prediction
Gated Attention
Long Xu
Ningbo University, Peng Cheng Laboratory
image/signal processing; video coding, especially rate control of video coding
Junping Guo
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Jianbo Zhao
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Jianbo Lu
Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Nanning Normal University, Nanning, China
Yuzhong Peng
College of Big Data and Software Engineering, Zhejiang Wanli University, Ningbo, China