MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design

📅 2025-10-31
🤖 AI Summary
This work addresses two alignment challenges in structure-based drug design: aligning protein targets with generated molecules at the structural level, and aligning generated molecules with desired pharmacological properties. We propose a two-stage framework: (1) a unified representation stage that couples an autoregressive language model with a diffusion-based structure encoder, so that protein 3D structures and molecular sequences/conformations are encoded jointly; and (2) a property-aware generation stage that applies Direct Preference Optimization (DPO), guided by pharmacological property signals (e.g., ADMET). Built on NatureLM as the molecular generative backbone and augmented with diffusion-based protein structure encoding, the method achieves state-of-the-art performance on key evaluation metrics on CrossDocked2020. The reported gains cover both the structural fidelity of generated molecules and their consistency with desired ADMET profiles. This positions the framework as a step toward end-to-end, interpretable, and property-controllable structure-guided molecular generation.

📝 Abstract
Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representations with molecular representations, and ensuring alignment between generated drugs and their pharmacological properties, remain critical challenges. To address these challenges, we propose MolChord, which integrates two key techniques: (1) to align protein and molecule structures with their textual descriptions and sequential representations (e.g., FASTA for proteins and SMILES for molecules), we leverage NatureLM, an autoregressive model unifying text, small molecules, and proteins, as the molecule generator, alongside a diffusion-based structure encoder; and (2) to guide molecules toward desired properties, we curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization (DPO). Experimental results on CrossDocked2020 demonstrate that our approach achieves state-of-the-art performance on key evaluation metrics, highlighting its potential as a practical tool for SBDD.

Problem

Research questions and friction points this paper is trying to address.

Aligning protein structures with molecular representations for drug design
Ensuring generated drugs match desired pharmacological properties
Integrating structural and sequential data for improved ligand generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

MolChord integrates an autoregressive model (NatureLM) with a diffusion-based structure encoder
Aligns protein and molecule structures with their sequential representations (e.g., FASTA and SMILES)
Uses Direct Preference Optimization (DPO) for property-aware drug generation
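
To make the DPO step above concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. It is not taken from the MolChord code: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are sequence-level log-probabilities (for example, of a preferred and a dispreferred SMILES string given the same protein) under the model being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    All names and the default beta are illustrative assumptions,
    not values reported for MolChord.
    """
    # Log-ratio of policy to reference for each molecule in the pair.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), evaluated in a stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# When the policy equals the reference, the loss is log(2).
print(round(dpo_loss(-7.0, -7.0, -7.0, -7.0), 4))  # 0.6931
```

The loss falls below log(2) once the policy raises the likelihood of the preferred molecule relative to the reference more than it does for the dispreferred one, which is how preference data over properties can steer generation without training a separate reward model.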