TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence

📅 2025-06-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing molecular representation learning methods, including multimodal ones, have largely overlooked textual descriptions and taxonomic annotations of molecules, which limits the functional semantics their representations can capture. TRIDENT is a tri-modal framework that jointly models SMILES, natural-language descriptions, and taxonomic functional annotations, trained on a curated dataset of molecule-text pairs with structured, multi-level functional annotations. It introduces two alignment objectives: (i) a global volume-based objective that aligns the three modalities jointly, in place of a conventional contrastive loss, and (ii) a local objective that links molecular substructures to their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances the global and local objectives during training. The architecture uses a dedicated encoder for each modality and learns hierarchical functional semantics end to end. TRIDENT reports state-of-the-art performance on 11 downstream tasks, supporting the value of combining SMILES, textual, and taxonomic information for molecular property prediction.

📝 Abstract

Molecular property prediction aims to learn representations that map chemical structures to functional properties. While multimodal learning has emerged as a powerful paradigm to learn molecular representations, prior works have largely overlooked textual and taxonomic information of molecules for representation learning. We introduce TRIDENT, a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. To achieve this, we curate a comprehensive dataset of molecule-text pairs with structured, multi-level functional annotations. Instead of relying on conventional contrastive loss, TRIDENT employs a volume-based alignment objective to jointly align tri-modal features at the global level, enabling soft, geometry-aware alignment across modalities. Additionally, TRIDENT introduces a novel local alignment objective that captures detailed relationships between molecular substructures and their corresponding sub-textual descriptions. A momentum-based mechanism dynamically balances global and local alignment, enabling the model to learn both broad functional semantics and fine-grained structure-function mappings. TRIDENT achieves state-of-the-art performance on 11 downstream tasks, demonstrating the value of combining SMILES, textual, and taxonomic functional annotations for molecular property prediction.
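
The abstract does not give the formula for the volume-based objective. As a rough illustration only, one common way to measure how well three embeddings agree is the volume of the parallelotope they span, computed from the determinant of their Gram matrix: the volume shrinks toward 0 as the three unit vectors point in the same direction and reaches 1 when they are mutually orthogonal. The sketch below assumes this Gram-determinant formulation, which may differ from what TRIDENT actually uses; the function names (`normalize`, `dot`, `gram_volume`) are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gram_volume(smiles_emb, text_emb, taxonomy_emb):
    """Volume spanned by three unit-normalized embeddings.

    ASSUMPTION: uses sqrt(det(G)), where G is the 3x3 Gram matrix of the
    normalized embeddings. This is an illustrative stand-in, not the
    objective confirmed by the paper.
    """
    vecs = [normalize(v) for v in (smiles_emb, text_emb, taxonomy_emb)]
    g = [[dot(a, b) for b in vecs] for a in vecs]
    # Cofactor expansion of the 3x3 determinant along the first row.
    det = (
        g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
        - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
        + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0])
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(det, 0.0))
```

Under this assumption, a training loss would push the volume down for matching SMILES, text, and taxonomy embeddings of the same molecule and up for mismatched triples. A single scalar then scores all three modalities at once instead of combining three pairwise similarities.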

Problem

Research questions and friction points this paper is trying to address.

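How to bring textual descriptions and taxonomic annotations, largely overlooked by prior work, into molecular representation learning alongside SMILES

How to align three modalities jointly without relying on conventional contrastive loss, while also capturing substructure-to-text correspondence

How to balance global and local alignment so that the learned representations improve molecular property prediction across 11 downstream tasks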

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates SMILES, text, and taxonomic annotations

Uses volume-based alignment for tri-modal features

Introduces local alignment for substructure-text relations
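
The abstract also says a momentum-based mechanism dynamically balances the global and local alignment objectives, but this page does not describe how. A minimal sketch of one possible scheme follows, assuming an exponential moving average (EMA) of each loss and weights proportional to the smoothed values, so the objective that is currently larger receives more weight. The class `MomentumBalancer`, its methods, and the weighting rule are all hypothetical and are not taken from the paper.

```python
class MomentumBalancer:
    """Hypothetical EMA-based balancer for a global and a local loss.

    ASSUMPTION: both losses are non-negative, and weights are set
    proportional to each loss's exponential moving average. The paper's
    actual balancing rule may differ.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.ema_global = None
        self.ema_local = None

    def update(self, global_loss, local_loss):
        """Update the running averages and return (w_global, w_local)."""
        if self.ema_global is None:
            # First step: initialize the averages with the observed losses.
            self.ema_global, self.ema_local = global_loss, local_loss
        else:
            m = self.momentum
            self.ema_global = m * self.ema_global + (1 - m) * global_loss
            self.ema_local = m * self.ema_local + (1 - m) * local_loss
        total = self.ema_global + self.ema_local
        if total == 0:
            return 0.5, 0.5
        return self.ema_global / total, self.ema_local / total

    def combine(self, global_loss, local_loss):
        """Weighted sum of the two losses using the current weights."""
        w_global, w_local = self.update(global_loss, local_loss)
        return w_global * global_loss + w_local * local_loss
```

In a training loop, `combine` would be called once per step with that step's two loss values. The momentum term keeps the weights from swinging sharply when one loss is noisy on a single batch.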

🔎 Similar Papers

3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization