TextOmics-Guided Diffusion for Hit-like Molecular Generation

📅 2025-07-14
🤖 AI Summary
Target-specific drug discovery currently struggles to integrate heterogeneous molecular representations and lacks a unified multimodal generative framework. To address this, we propose TextOmics, the first benchmark unifying omics expression profiles with molecular textual descriptions, and introduce ToDi, a novel generative framework. ToDi employs dual encoders (OmicsEn and TextEn) to align biological and semantic information, coupled with a conditional diffusion model (DiffGen) for controllable molecule generation. Notably, ToDi enables the first zero-shot drug discovery guided jointly by omics and text. In hit-molecule generation tasks, it significantly outperforms state-of-the-art methods while ensuring target specificity and steering generation toward molecules with therapeutic potential. By bridging omics-scale biology with language-based molecular semantics, ToDi establishes a scalable, interpretable paradigm for data-driven early-stage drug design.
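The summary describes DiffGen only as a conditional diffusion model guided by two encoders, without saying how the condition is applied. One common way to make a diffusion model controllable is classifier-free guidance. The following is a minimal sketch under stated assumptions. The concatenation-based `fuse_conditions` and the `guided_noise` helper are hypothetical names. The denoiser is any callable that predicts noise with or without a condition. Nothing here is claimed to be ToDi's actual conditioning mechanism.

```python
from typing import Callable, List, Optional

Vector = List[float]
# A denoiser maps (noisy sample x_t, timestep t, optional condition) to predicted noise.
Denoiser = Callable[[Vector, int, Optional[Vector]], Vector]


def fuse_conditions(omics_emb: Vector, text_emb: Vector) -> Vector:
    """Hypothetical fusion (assumption): concatenate omics and text encoder outputs."""
    return list(omics_emb) + list(text_emb)


def guided_noise(
    denoiser: Denoiser,
    x_t: Vector,
    t: int,
    cond: Vector,
    guidance_scale: float,
) -> Vector:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_uncond = denoiser(x_t, t, None)  # condition dropped
    eps_cond = denoiser(x_t, t, cond)    # fused omics + text condition supplied
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    # Toy denoiser (assumption): predicts 0 without a condition and the mean
    # of the condition vector with one, just to make the arithmetic visible.
    def toy_denoiser(x_t: Vector, t: int, cond: Optional[Vector]) -> Vector:
        value = 0.0 if cond is None else sum(cond) / len(cond)
        return [value for _ in x_t]

    cond = fuse_conditions([1.0, 3.0], [2.0, 2.0])
    print(guided_noise(toy_denoiser, [0.5, -0.5], t=10, cond=cond, guidance_scale=3.0))
    # prints [6.0, 6.0]
```

With `guidance_scale=1.0` the result equals the conditional prediction, and with `0.0` it falls back to the unconditional one. Larger values push samples harder toward the omics-text condition, usually at some cost in sample diversity.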

📝 Abstract
Hit-like molecular generation with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representation alignment. Built upon this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi leverages two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations and develops a conditional diffusion model (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate that ToDi outperforms existing state-of-the-art approaches, while also showing remarkable potential for zero-shot therapeutic molecular generation. Source code is available at: https://github.com/hala-ToDi.
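The abstract says TextOmics supports generation through alignment of one-to-one omics/text pairs, but it does not give the alignment objective. Paired embeddings can be pulled together with a CLIP-style symmetric contrastive (InfoNCE) loss. The sketch below is minimal and assumes that objective: row i of the omics batch is treated as the match for row i of the text batch, and every other row serves as a negative. The function names, the cosine similarity, and the temperature of 0.1 are illustrative assumptions, not details from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; the tiny epsilon guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def _cross_entropy_diag(logits: Matrix) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp with max subtraction for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def info_nce(omics_emb: Matrix, text_emb: Matrix, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: omics_emb[i] is assumed to pair with text_emb[i]."""
    n = len(omics_emb)
    logits = [[cosine(o, t) / temperature for t in text_emb] for o in omics_emb]
    transposed = [[logits[i][j] for i in range(n)] for j in range(n)]
    # Average the omics->text and text->omics directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


if __name__ == "__main__":
    omics = [[1.0, 0.0], [0.0, 1.0]]
    aligned_text = [[0.9, 0.1], [0.1, 0.9]]
    swapped_text = [[0.1, 0.9], [0.9, 0.1]]
    # Correctly paired batches should score a much lower loss than mismatched ones.
    print(info_nce(omics, aligned_text), info_nce(omics, swapped_text))
```

Minimizing this loss during encoder training would place each omics profile near its own textual description and away from the others in the batch. That is one concrete reading of "representation alignment", not necessarily the one TextOmics/ToDi uses.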
Problem

Research questions and friction points this paper is trying to address.

Generating hit-like molecules with therapeutic potential
Lacking heterogeneous data and unified frameworks
Aligning omics expressions with molecular textual descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

TextOmics aligns omics and molecular textual data
ToDi uses dual encoders (OmicsEn and TextEn) to capture biological and semantic associations
DiffGen enables controllable molecular generation
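DiffGen is described only as a conditional diffusion model. Suppose it follows the standard DDPM formulation over some continuous molecular representation. This is an assumption, since the summary does not say whether ToDi diffuses over latents, graphs, or token embeddings. Under that assumption, the forward noising process has the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). The sketch below uses a linear beta schedule from 1e-4 to 0.02. That is a common default, not a value taken from the paper.

```python
import math
import random
from typing import List, Tuple


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """abar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(
    x0: List[float], t: int, abar: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Sample x_t ~ q(x_t | x_0) in closed form and return the Gaussian noise
    eps that a denoiser would be trained to predict."""
    a = abar[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    abar = cumulative_alphas(linear_betas(1000))
    rng = random.Random(0)
    x0 = [0.2, -1.0, 0.5]  # stand-in for a molecule's continuous representation
    for t in (0, 250, 999):
        x_t, _ = q_sample(x0, t, abar, rng)
        print(t, round(abar[t], 5), [round(v, 3) for v in x_t])
```

The function also returns the noise alongside x_t, because in standard DDPM training that noise is the regression target for the denoiser. Conditioning on omics and text embeddings would enter through the denoiser, not through this fixed forward process.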