🤖 AI Summary
Target-specific drug discovery currently struggles to integrate heterogeneous molecular representations and lacks a unified multimodal generative framework. To address this, we propose TextOmics, the first benchmark unifying omics expression profiles with molecular textual descriptions, and introduce ToDi, a novel generative framework. ToDi employs dual encoders (OmicsEn and TextEn) to align biological and semantic information, coupled with a conditional diffusion model (DiffGen) for controllable molecule generation. Notably, ToDi is the first to enable zero-shot drug discovery guided jointly by omics and text. In hit-molecule generation tasks, it significantly outperforms state-of-the-art methods while preserving both target specificity and therapeutic potential. By bridging omics-scale biology with language-based molecular semantics, ToDi establishes a scalable, interpretable paradigm for data-driven early-stage drug design.
📝 Abstract
Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representation alignment. Built upon this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi leverages two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations, and uses a conditional diffusion model (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate that ToDi outperforms existing state-of-the-art approaches, while also showing remarkable potential in zero-shot therapeutic molecular generation. Sources are available at: https://github.com/hala-ToDi.
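To make the data flow described above concrete, the following is a minimal sketch of how one paired sample (an omics profile plus its textual description) might be turned into a single conditioning vector for a generator. It is not the authors' implementation: `PairedSample`, `encode_omics`, `encode_text`, and `joint_condition` are hypothetical names, and the two "encoders" are deliberately trivial stand-ins (chunk averaging and hashed word counts) for the learned OmicsEn and TextEn networks. The diffusion generator itself is omitted.

```python
import math
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedSample:
    """Hypothetical record: one omics profile paired with one description."""
    omics: list[float]   # e.g. expression values for a fixed gene panel
    description: str     # free-text description of the molecule


def l2_normalize(vec):
    """Scale a vector to unit length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def encode_omics(omics, dim=4):
    """Toy stand-in for an omics encoder: mean of `dim` contiguous chunks."""
    n = len(omics)
    chunks = [omics[i * n // dim:(i + 1) * n // dim] for i in range(dim)]
    return l2_normalize([sum(c) / len(c) if c else 0.0 for c in chunks])


def encode_text(description, dim=4):
    """Toy stand-in for a text encoder: hashed bag-of-words counts."""
    counts = [0.0] * dim
    for word in description.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        counts[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return l2_normalize(counts)


def joint_condition(sample, dim=4):
    """Concatenate both embeddings into one conditioning vector (assumed fusion)."""
    return encode_omics(sample.omics, dim) + encode_text(sample.description, dim)


sample = PairedSample(
    omics=[0.2, 1.5, 0.0, 3.1, 0.7, 0.9, 2.2, 0.4],
    description="small aromatic molecule with a carboxylic acid group",
)
condition = joint_condition(sample)
print(len(condition))  # 8: four omics dimensions followed by four text dimensions
```

Concatenation is only one possible fusion choice, used here because it keeps both modalities visible in the output. The abstract does not specify how ToDi combines the two embeddings.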