Cross-Modality Controlled Molecule Generation with Diffusion Language Model

📅 2025-08-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge that pretrained diffusion models struggle to support cross-modal constraints and cannot dynamically integrate new constraints without retraining. We propose a two-stage controllable generation framework that adds a Structure Control Module (SCM) and a Property Control Module (PCM) as plug-and-play components, injecting multimodal conditional signals at distinct stages of the diffusion process. To our knowledge, this is the first method enabling flexible adaptation and incremental integration of cross-modal constraints, such as structural motifs and physicochemical properties, into pretrained diffusion language models without retraining the pretrained model; the added SCM and PCM are the trainable components. Evaluation on multiple molecular generation benchmarks indicates that the generated molecules satisfy both structural and property constraints, with improved attribute matching rates. The approach also improves efficiency in complex multi-objective drug optimization tasks. The framework offers a flexible and generalizable approach to AI-driven molecular design.

📝 Abstract

Current SMILES-based diffusion models for molecule generation typically support only unimodal constraints. They inject conditioning signals at the start of the training process and require retraining a new model from scratch whenever the constraint changes. However, real-world applications often involve multiple constraints across different modalities, and additional constraints may emerge over the course of a study. This raises a challenge: how to extend a pre-trained diffusion model not only to support cross-modality constraints but also to incorporate new ones without retraining. To tackle this problem, we propose the Cross-Modality Controlled Molecule Generation with Diffusion Language Model (CMCM-DLM), demonstrated on two distinct modalities: molecular structure and chemical properties. Our approach builds upon a pre-trained diffusion model, incorporating two trainable modules, the Structure Control Module (SCM) and the Property Control Module (PCM), and operates in two distinct phases during the generation process. In Phase I, we employ the SCM to inject structural constraints during the early diffusion steps, effectively anchoring the molecular backbone. Phase II builds on this by further introducing the PCM to guide the later stages of inference and refine the generated molecules, ensuring their chemical properties match the specified targets. Experimental results on multiple datasets demonstrate the efficiency and adaptability of our approach, highlighting CMCM-DLM's significant advancement in molecular generation for drug discovery applications.
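
The two-phase schedule described in the abstract can be pictured with a minimal toy sketch. It is not the authors' implementation: the names `phase_for_step`, `generate`, `switch_fraction`, and the toy denoiser and control functions are all hypothetical, the latent state is a plain list of floats rather than a SMILES embedding, and the sketch assumes that only one control module is active per phase, which the abstract does not specify.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# A step function maps (current state, timestep) to an additive update.
StepFn = Callable[[Vector, int], Vector]


def phase_for_step(t: int, total_steps: int, switch_fraction: float = 0.5) -> str:
    """Return "structure" for early (high-noise) steps, "property" for later ones.

    Timesteps count down from total_steps - 1 (most noisy) to 0, so the early
    part of generation corresponds to the largest values of t.
    switch_fraction is a hypothetical knob: the share of steps given to Phase I.
    """
    switch_step = int(total_steps * (1.0 - switch_fraction))
    return "structure" if t >= switch_step else "property"


def generate(
    x: Vector,
    base_denoiser: StepFn,  # stands in for the frozen pre-trained model
    scm: StepFn,            # stands in for the Structure Control Module
    pcm: StepFn,            # stands in for the Property Control Module
    total_steps: int,
    switch_fraction: float = 0.5,
) -> Tuple[Vector, List[str]]:
    """Run a toy reverse process, adding one control signal per step."""
    trace: List[str] = []
    for t in range(total_steps - 1, -1, -1):
        phase = phase_for_step(t, total_steps, switch_fraction)
        update = base_denoiser(x, t)
        # Assumption: Phase I uses only the SCM, Phase II only the PCM.
        control = scm(x, t) if phase == "structure" else pcm(x, t)
        x = [xi + ui + ci for xi, ui, ci in zip(x, update, control)]
        trace.append(phase)
    return x, trace


# Toy stand-ins so the schedule can be observed without any trained model.
def zero_denoiser(x: Vector, t: int) -> Vector:
    return [0.0 for _ in x]


def toy_scm(x: Vector, t: int) -> Vector:
    return [1.0 for _ in x]


def toy_pcm(x: Vector, t: int) -> Vector:
    return [0.1 for _ in x]


if __name__ == "__main__":
    final_state, phases = generate(
        [0.0, 0.0], zero_denoiser, toy_scm, toy_pcm, total_steps=10
    )
    print(phases)
    print(final_state)
```

Because the base denoiser is passed in unchanged and the control signals are separate callables, a new constraint in this sketch would mean supplying another control function and a step window for it, which mirrors the plug-and-play idea at a very coarse level.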

Problem

Research questions and friction points this paper is trying to address.

Extending pre-trained diffusion models to support cross-modality constraints

Incorporating new molecular constraints without model retraining

Generating molecules with multiple structural and property constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses trainable modules for cross-modality constraint injection

Generates in two phases: structure control first, then property control

Extends a pretrained diffusion model to new constraints without retraining it

🔎 Similar Papers

LDMol: A Text-to-Molecule Diffusion Model with Structurally Informative Latent Space Surpasses AR Models