Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

📅 2024-04-23

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Existing molecule-text cross-modal methods rely on global alignment, which fails to capture fine-grained correspondences between molecular substructures (e.g., atoms or functional groups) and descriptive textual phrases; fine-grained alignment is in turn hindered by the scarcity of localized pairwise annotations. To address this, we propose Atomas, whose Hierarchical Adaptive Alignment (HAA) model jointly aligns SMILES strings and text at three granularities: atomic, functional-group, and molecular. Atomas trains understanding and generation end to end, integrating a multimodal Transformer encoder, hierarchical attention mechanisms, contrastive learning, and generative pretraining (molecule captioning and SMILES generation). On retrieval tasks, our method improves Recall@1 over the baseline by 30.8% on average; it also achieves state-of-the-art performance on both captioning and SMILES generation tasks. Visualization analyses support its chemical interpretability and consistency with domain knowledge.

📝 Abstract

Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representations, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn knowledge from the different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual descriptions, which is crucial for downstream tasks. Furthermore, such information cannot be modeled with a similar global alignment strategy, because existing datasets contain few paired annotations of local parts. In this paper, we propose Atomas, a multi-modal molecular representation learning framework that jointly learns representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between the two modalities and align these fragment representations at three levels. Additionally, Atomas's end-to-end training framework incorporates the tasks of understanding and generating molecules, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning task and the molecule generation task. Moreover, visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our code can be found at https://anonymous.4open.science/r/Atomas-03C3.
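
The recall@1 figure quoted in the abstract is a standard retrieval metric: the fraction of queries whose correct match is ranked first among all candidates. The sketch below shows one common way to compute it from a query-by-candidate similarity matrix. The function name `recall_at_k`, the convention that the correct candidate for query i sits at index i, and the toy scores are all assumptions made for illustration; they are not taken from the Atomas code or results.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose true match (same index) ranks in the top k.

    similarity[i][j] is the score between query i (e.g. a text description)
    and candidate j (e.g. a molecule). Assumption: the correct candidate
    for query i is stored at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidates by descending score; ties are broken by index.
        ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, not results from the paper).
toy_scores = [
    [0.9, 0.1, 0.2],  # query 0: correct candidate is ranked first
    [0.3, 0.4, 0.8],  # query 1: candidate 2 outranks the correct one
    [0.2, 0.1, 0.7],  # query 2: correct candidate is ranked first
]

print(recall_at_k(toy_scores, k=1))
print(recall_at_k(toy_scores, k=2))
```

In this toy case two of the three queries rank their true match first, so recall@1 is 2/3, and every true match is within the top two, so recall@2 is 1.0.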

Problem

Research questions and friction points this paper is trying to address.

Enhances molecular representation quality for drug discovery and materials science.

Addresses limitations of global alignment in capturing fine-grained molecular-text information.

Learns joint representations from SMILES strings and text despite the scarcity of locally annotated molecule-text pairs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Adaptive Alignment model for fine-grained fragment correspondence

End-to-end training framework for molecule understanding and generation

State-of-the-art results in molecule captioning and generation tasks
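
To make the idea of aligning two modalities at several levels concrete, here is a minimal sketch that applies a generic symmetric contrastive (InfoNCE-style) loss at each level and sums the results. This is a simplification, not the Hierarchical Adaptive Alignment model itself: it assumes each molecule-text pair has already been pooled to one vector per modality per level, and the names `info_nce`, `symmetric_contrastive`, `hierarchical_loss`, the level names, and the temperature value are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float) -> float:
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over the logits.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(queries)


def symmetric_contrastive(mol: List[Vector], txt: List[Vector],
                          temperature: float = 0.1) -> float:
    """Average of the molecule-to-text and text-to-molecule losses."""
    return 0.5 * (info_nce(mol, txt, temperature) + info_nce(txt, mol, temperature))


def hierarchical_loss(levels: Dict[str, Tuple[List[Vector], List[Vector]]],
                      temperature: float = 0.1) -> float:
    """Sum the symmetric loss over all levels (equal weights assumed)."""
    return sum(symmetric_contrastive(m, t, temperature) for m, t in levels.values())


# Toy batch of two pairs with 2-d embeddings (illustrative values only).
mol_vecs = [[1.0, 0.0], [0.0, 1.0]]
txt_aligned = [[1.0, 0.0], [0.0, 1.0]]
txt_swapped = [[0.0, 1.0], [1.0, 0.0]]

levels = {
    "atom": (mol_vecs, txt_aligned),
    "fragment": (mol_vecs, txt_aligned),
    "molecule": (mol_vecs, txt_aligned),
}
aligned_total = hierarchical_loss(levels)
swapped_total = hierarchical_loss({"molecule": (mol_vecs, txt_swapped)})
print(aligned_total < swapped_total)
```

The loss is small when each molecule vector is closest to its own text vector and large when the pairing is swapped, which is the behaviour a contrastive alignment objective is meant to reward. A method that learns fragment correspondences adaptively would replace the fixed pooled vectors assumed here.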

🔎 Similar Papers

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization