🤖 AI Summary
This study addresses the challenge of generating molecules with a desired phenotype directly from gene expression profiles. Most earlier generative methods learn only the syntax and semantics of molecular graphs or SMILES strings and do not account for how the biological system responds to a drug. The proposed model, Gx2Mol, instead uses gene expression signatures as learned conditions for generation: a variational autoencoder (VAE) models the latent representation of the transcriptomic profiles, and an LSTM decoder conditioned on those features generates syntactically valid, drug-like SMILES strings. According to the reported experiments, the generated molecules show favorable drug-likeness and synthetic accessibility, and case studies suggest potential bioactivity toward the intended target proteins. The work offers a generative framework for phenotype-driven de novo drug design.
📝 Abstract
De novo generation of hit-like molecules is a challenging task in the drug discovery process. Most previous methods learn the semantics and syntax of molecular structures by analyzing molecular graphs or simplified molecular input line entry system (SMILES) strings; however, they do not take into account the drug responses of biological systems consisting of genes and proteins. In this study, we propose a deep generative model, Gx2Mol, which uses gene expression profiles to generate molecular structures with desirable phenotypes for arbitrary target proteins. In the algorithm, a variational autoencoder (VAE) serves as a feature extractor that learns the latent feature distribution of the gene expression profiles. A long short-term memory (LSTM) network then acts as the chemical generator, producing syntactically valid SMILES strings that satisfy the feature conditions extracted from the gene expression profile. Experimental results and case studies demonstrate that the proposed Gx2Mol model can produce new molecules with potential bioactivities and drug-like properties.
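The data flow described in the abstract (expression profile, then VAE latent features, then an LSTM that emits SMILES tokens) can be illustrated with a toy sketch. This is not the authors' implementation: the class name `ToyGx2Smiles`, the token vocabulary, the layer sizes, the choice to seed the LSTM hidden state from the latent vector, and the greedy decoding are all assumptions made for illustration. The weights are random and untrained, so the output strings are arbitrary token sequences, not valid or meaningful molecules.

```python
import math
import random

# Toy token set (assumption); index 0 starts a sequence, index 1 ends it.
VOCAB = ["<bos>", "<eos>", "C", "N", "O", "c", "1", "(", ")", "="]
BOS, EOS = 0, 1

def linear(weights, bias, x):
    """Dense layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def init(rng, rows, cols):
    """Random weights and zero bias; untrained placeholders."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)], [0.0] * rows

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

class ToyGx2Smiles:
    """Hypothetical sketch: VAE-style encoder feeding a single LSTM cell."""

    def __init__(self, n_genes, latent=4, hidden=8, seed=0):
        self.rng = random.Random(seed)
        self.hidden = hidden
        self.enc_mu = init(self.rng, latent, n_genes)
        self.enc_logvar = init(self.rng, latent, n_genes)
        self.z_to_h = init(self.rng, hidden, latent)
        in_dim = len(VOCAB) + hidden  # LSTM input: one-hot token plus previous h
        self.gates = {g: init(self.rng, hidden, in_dim) for g in "ifog"}
        self.out = init(self.rng, len(VOCAB), hidden)

    def encode(self, expr):
        """Map an expression profile to a sampled latent vector z."""
        mu = linear(*self.enc_mu, expr)
        logvar = linear(*self.enc_logvar, expr)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0, 1) for m, lv in zip(mu, logvar)]

    def lstm_step(self, token_id, h, c):
        """One standard LSTM cell update."""
        x = [1.0 if i == token_id else 0.0 for i in range(len(VOCAB))] + h
        i = [sigmoid(v) for v in linear(*self.gates["i"], x)]
        f = [sigmoid(v) for v in linear(*self.gates["f"], x)]
        o = [sigmoid(v) for v in linear(*self.gates["o"], x)]
        g = [math.tanh(v) for v in linear(*self.gates["g"], x)]
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def generate(self, expr, max_len=20):
        """Greedy decoding conditioned on the latent features of expr."""
        z = self.encode(expr)
        h = [math.tanh(v) for v in linear(*self.z_to_h, z)]  # condition via initial state
        c = [0.0] * self.hidden
        token, tokens = BOS, []
        for _ in range(max_len):
            h, c = self.lstm_step(token, h, c)
            logits = linear(*self.out, h)
            # Pick the best-scoring token, never re-emitting <bos>.
            token = max(range(1, len(VOCAB)), key=lambda k: logits[k])
            if token == EOS:
                break
            tokens.append(VOCAB[token])
        return "".join(tokens)

if __name__ == "__main__":
    model = ToyGx2Smiles(n_genes=5, seed=1)
    print(model.generate([0.2, -1.0, 0.5, 0.0, 1.3]))
```

In a trained system the encoder would be fit on gene expression data and the generator on SMILES strings paired with expression features; here the code only shows how a sampled latent vector can condition a sequence decoder.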