Drug Discovery SMILES-to-Pharmacokinetics Diffusion Models with Deep Molecular Understanding

📅 2024-08-14
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Pharmacokinetic (PK) data across drug datasets exhibit sparse overlap, hindering multi-drug combination studies and high-throughput screening. To address this, we propose Imagand—the first SMILES-conditioned diffusion generative model for PK properties—integrating syntax-aware SMILES encoding with joint multi-property modeling to enable end-to-end, structure–property controllable generation. Our framework effectively mitigates the sparsity of real-world PK data distributions, achieving high fidelity in both univariate and bivariate property distributions. Generated synthetic data significantly enhance downstream PK prediction performance: MAE reductions of 12.6–28.3% are observed for clearance (CL), volume of distribution at steady state (VDss), and half-life (T₁/₂). The implementation is publicly available and has been adopted by the research community.

Technology Category

Application Category

📝 Abstract
Artificial intelligence (AI) is increasingly used in every stage of drug development. One challenge facing drug discovery AI is that drug pharmacokinetic (PK) datasets are often collected independently from each other, often with limited overlap, creating data overlap sparsity. Data sparsity makes data curation difficult for researchers looking to answer research questions in poly-pharmacy, drug combination research, and high-throughput screening. We propose Imagand, a novel SMILES-to-Pharmacokinetic (S2PK) diffusion model capable of generating an array of PK target properties conditioned on SMILES inputs. We show that Imagand-generated synthetic PK data closely resembles real data univariate and bivariate distributions, and improves performance for downstream tasks. Imagand is a promising solution for data overlap sparsity and allows researchers to efficiently generate ligand PK data for drug discovery research. Code is available at url{https://github.com/bing1100/Imagand}.
Problem

Research questions and friction points this paper is trying to address.

Addresses data sparsity in drug pharmacokinetic datasets
Generates synthetic PK properties from SMILES inputs
Improves downstream drug discovery research efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

SMILES-to-PK diffusion model for drug discovery
Generates synthetic PK data resembling real distributions
Improves downstream task performance with sparse data
🔎 Similar Papers
No similar papers found.