🤖 AI Summary
This work addresses a dual alignment challenge in structure-based drug design: generated molecules must match their protein targets both structurally and pharmacologically. We propose a two-stage alignment framework: (1) a unified representation stage that pairs an autoregressive language model with a diffusion-based structure encoder to jointly encode protein 3D structures and molecular sequences and conformations; and (2) a property-aware generation stage that applies Direct Preference Optimization (DPO), guided by pharmacological property signals (e.g., ADMET). Built on NatureLM as the molecular generative backbone and augmented with diffusion-based protein structure encoding, our method achieves state-of-the-art performance on CrossDocked2020, improving both the structural fidelity of generated molecules and their consistency with desired ADMET profiles. The framework offers an end-to-end, interpretable, and property-controllable approach to structure-guided molecular generation.
📝 Abstract
Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representations with molecular representations, and ensuring that generated drugs have the desired pharmacological properties, remain critical challenges. To address them, we propose MolChord, which integrates two key techniques. (1) To align protein and molecule structures with their textual descriptions and sequential representations (e.g., FASTA for proteins and SMILES for molecules), we use NatureLM, an autoregressive model unifying text, small molecules, and proteins, as the molecule generator, alongside a diffusion-based structure encoder. (2) To guide molecules toward desired properties, we curate a property-aware dataset by integrating preference data and refine the alignment using Direct Preference Optimization (DPO). Experiments on CrossDocked2020 show that our approach achieves state-of-the-art performance on key evaluation metrics, highlighting its potential as a practical tool for SBDD.
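For readers unfamiliar with DPO, the preference-based refinement step can be illustrated with the standard DPO loss for a single preference pair. This is a minimal sketch of the generic objective, not MolChord's implementation: the function name `dpo_loss`, its argument names, the default `beta`, and the example log-probabilities are all assumptions made for illustration. The abstract does not say how preference pairs are built or which hyperparameters are used.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_logp_chosen: float,
    policy_logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full generated
    sequence (e.g. a SMILES string) under the trainable policy or the
    frozen reference model, conditioned on the same protein input.
    """
    # How much more the policy likes each sequence than the reference does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z).
    return softplus(-logits)


# Illustrative numbers only (hypothetical log-probabilities).
# The policy favors the chosen molecule more than the reference does,
# so the loss falls below log(2), the value at zero margin.
example = dpo_loss(-10.0, -12.0, -11.0, -11.0, beta=0.1)
```

In this formulation, a pair would consist of two molecules generated for the same target, with the "chosen" one having the better property profile. Minimizing the loss raises the policy's relative likelihood of the chosen molecule while the reference terms keep it anchored to the original generator.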