Auxiliary Discriminator Sequence Generative Adversarial Networks (ADSeqGAN) for Few Sample Molecule Generation

📅 2025-02-23
🤖 AI Summary
Problem: Conventional generative models for small-sample molecular generation, for example of nucleic acid binders and CNS drugs, suffer from insufficient class specificity and drug-likeness due to data scarcity. Method: We propose ADSeqGAN, a sequence-based GAN architecture that integrates a pretrained generator, Wasserstein distance optimization, and a random forest-based auxiliary discriminator (reported as the first incorporation of an interpretable tree model into GAN discrimination) to jointly improve training stability, chemical diversity, and target specificity. ADSeqGAN is built on SMILES sequence modeling, and its output is assessed with molecular docking simulations and multi-dimensional property evaluation. Results: Experiments show that ADSeqGAN outperforms SeqGAN, ORGAN, and MolGPT in generating nucleic acid-targeting molecules. On the CNS drug dataset, a lightweight oversampling strategy significantly raises the generation success rate. The generated compounds exhibit high target binding affinity, favorable synthetic accessibility, and structural diversity.

📝 Abstract
In this work, we introduce Auxiliary Discriminator Sequence Generative Adversarial Networks (ADSeqGAN), a novel approach for molecular generation in small-sample datasets. Traditional generative models often struggle with limited training data, particularly in drug discovery, where molecular datasets for specific therapeutic targets, such as nucleic acid binders and central nervous system (CNS) drugs, are scarce. ADSeqGAN addresses this challenge by integrating an auxiliary random forest classifier as an additional discriminator into the GAN framework, which significantly improves molecular generation quality and class specificity. Our method incorporates a pretrained generator and the Wasserstein distance to enhance training stability and diversity. We evaluate ADSeqGAN on a dataset comprising nucleic acid-targeting and protein-targeting small molecules, demonstrating its superior ability to generate nucleic acid binders compared to baseline models such as SeqGAN, ORGAN, and MolGPT. Through an oversampling strategy, ADSeqGAN also significantly improves CNS drug generation, achieving a higher yield than traditional de novo models. Critical assessments, including docking simulations and molecular property analysis, confirm that ADSeqGAN-generated molecules exhibit strong binding affinities, enhanced chemical diversity, and improved synthetic feasibility. Overall, ADSeqGAN presents a novel framework for generative molecular design in data-scarce scenarios, with potential applications in computational drug discovery, and this work demonstrates its successful application to generating nucleic acid-targeting molecules and CNS drugs.
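The abstract mentions an oversampling strategy for the scarce CNS class but does not describe it in detail. The sketch below shows one common form, random oversampling with replacement, as an illustration only. The function name `oversample_minority`, the class labels, and the toy SMILES strings are assumptions, not taken from the paper.

```python
import random


def oversample_minority(samples, labels, minority_label, seed=0):
    """Duplicate minority-class samples (with replacement) until their count
    matches the number of all other samples. Hypothetical helper."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    if not minority:
        raise ValueError("no samples carry the minority label")
    other_count = sum(1 for y in labels if y != minority_label)
    deficit = max(0, other_count - len(minority))
    # Draw the missing samples at random from the existing minority pool.
    extra = [rng.choice(minority) for _ in range(deficit)]
    return samples + extra, labels + [minority_label] * deficit


# Toy SMILES strings with made-up class labels (illustrative only).
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CN1CCCC1"]
labels = ["other", "other", "other", "other", "cns"]
balanced_smiles, balanced_labels = oversample_minority(smiles, labels, "cns")
```

With a single minority example, every added sample is a copy of it, which balances the class counts but adds no new chemistry. That is the usual trade-off of plain duplication compared with augmentation-based approaches.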
Problem

Research questions and friction points this paper is trying to address.

Improves molecular generation with limited data
Enhances class specificity in drug discovery
Generates nucleic acid binders and CNS drugs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates an auxiliary random forest classifier as an additional discriminator
Uses pretrained generator for stability
Employs Wasserstein distance for diversity
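
As a rough illustration of how the ingredients listed above could fit together, the sketch below computes a standard Wasserstein-style critic loss and blends a critic score with an auxiliary classifier's class probability into one generator reward. The function names, the mixing weight `lam`, and the linear blending rule are assumptions made for illustration. This page does not state how ADSeqGAN combines the two signals, and a real auxiliary discriminator would be a trained random forest rather than the hard-coded numbers used here.

```python
from statistics import fmean


def wasserstein_critic_loss(real_scores, fake_scores):
    """WGAN critic loss: mean score on generated samples minus mean score on
    real samples. The critic is trained to minimize this value."""
    return fmean(fake_scores) - fmean(real_scores)


def blended_reward(critic_score, target_class_prob, lam=0.5):
    """Hypothetical generator reward: a convex mix of the critic score and the
    auxiliary classifier's probability for the target class."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * critic_score + (1.0 - lam) * target_class_prob


# Toy numbers standing in for model outputs (not results from the paper).
real_scores = [0.9, 0.8, 0.7]
fake_scores = [0.2, 0.4, 0.3]
critic_loss = wasserstein_critic_loss(real_scores, fake_scores)

# One generated sequence: critic score 0.4, auxiliary P(target class) = 0.8.
reward = blended_reward(0.4, 0.8, lam=0.5)
```

Under this assumed rule, a larger `lam` favors sequences that look realistic to the critic, while a smaller `lam` favors sequences the auxiliary classifier assigns to the target class.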