TextOmics-Guided Diffusion for Hit-like Molecular Generation

📅 2025-07-14

🤖 AI Summary
Target-specific drug discovery currently struggles to integrate heterogeneous molecular representations and lacks a unified multimodal generative framework. To address this, we propose TextOmics, the first benchmark to pair omics expression profiles with molecular textual descriptions, and introduce ToDi, a generative framework built on it. ToDi uses two encoders (OmicsEn and TextEn) to align biological and semantic information, and a conditional diffusion model (DiffGen) for controllable molecule generation. ToDi also enables the first zero-shot drug discovery guided jointly by omics and text. On hit-molecule generation tasks, it outperforms state-of-the-art methods while keeping the generated molecules target-specific and oriented toward therapeutic potential. By connecting omics-scale biology with language-based molecular semantics, ToDi offers a scalable, interpretable approach to data-driven early-stage drug design.

📝 Abstract
Generating hit-like molecules with therapeutic potential is essential for target-specific drug discovery. However, the field lacks heterogeneous data and unified frameworks for integrating diverse molecular representations. To bridge this gap, we introduce TextOmics, a pioneering benchmark that establishes one-to-one correspondences between omics expressions and molecular textual descriptions. TextOmics provides a heterogeneous dataset that facilitates molecular generation through representation alignment. Building on this foundation, we propose ToDi, a generative framework that jointly conditions on omics expressions and molecular textual descriptions to produce biologically relevant, chemically valid, hit-like molecules. ToDi uses two encoders (OmicsEn and TextEn) to capture multi-level biological and semantic associations, and a conditional diffusion model (DiffGen) for controllable generation. Extensive experiments confirm the effectiveness of TextOmics and demonstrate that ToDi outperforms existing state-of-the-art approaches, while also showing strong potential in zero-shot therapeutic molecular generation. Sources are available at: https://github.com/hala-ToDi.

Problem

Research questions and friction points this paper is trying to address.

Generating hit-like molecules with therapeutic potential
Lacking heterogeneous data and unified frameworks
Aligning omics expressions with molecular textual descriptions

Innovation

Methods, ideas, or system contributions that make the work stand out.

TextOmics aligns omics and molecular textual data
ToDi uses dual encoders (OmicsEn and TextEn) to capture biological and semantic associations
DiffGen enables controllable molecular generation
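
The pipeline described above (two encoders feeding a conditional diffusion model) can be illustrated with a toy sketch. This is not the authors' implementation: every dimension, weight matrix, and function name below is a hypothetical stand-in, the "encoders" are untrained random projections, and the molecule is reduced to a small continuous vector. The sketch only shows how an omics embedding and a text embedding might be concatenated into one condition and passed to a noise-prediction objective of the kind used in standard denoising diffusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these values come from the paper.
N_GENES, VOCAB, D_COND, D_MOL, T = 32, 50, 16, 8, 100

# Untrained random weights standing in for OmicsEn, TextEn, and the denoiser.
W_omics = rng.normal(scale=0.1, size=(N_GENES, D_COND))
E_text = rng.normal(scale=0.1, size=(VOCAB, D_COND))
W_eps = rng.normal(scale=0.1, size=(D_MOL + 2 * D_COND + 1, D_MOL))

def encode_omics(expr):
    """Project an expression profile of shape (N_GENES,) to (D_COND,)."""
    return np.tanh(expr @ W_omics)

def encode_text(token_ids):
    """Mean-pool token embeddings into a (D_COND,) description vector."""
    return E_text[np.asarray(token_ids)].mean(axis=0)

# Linear noise schedule and its cumulative product.
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def predict_noise(x_t, t, cond):
    """Toy conditional noise predictor: linear map on [x_t, cond, t / T]."""
    features = np.concatenate([x_t, cond, [t / T]])
    return features @ W_eps

def training_loss(x0, expr, token_ids, t):
    """Noise-prediction MSE for one sample under the joint omics+text condition."""
    cond = np.concatenate([encode_omics(expr), encode_text(token_ids)])
    noise = rng.normal(size=x0.shape)
    x_t = q_sample(x0, t, noise)
    return float(np.mean((predict_noise(x_t, t, cond) - noise) ** 2))

x0 = rng.normal(size=D_MOL)        # hypothetical latent molecule representation
expr = rng.normal(size=N_GENES)    # hypothetical expression profile
tokens = [3, 17, 42]               # hypothetical token ids of a description
loss = training_loss(x0, expr, tokens, t=10)
print(f"toy denoising loss: {loss:.4f}")
```

In a real system the random projections would be replaced by trained networks, and the molecule representation, fusion strategy, and noise schedule would follow whatever the released code specifies; concatenation is used here only because it is the simplest way to show joint conditioning.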