🤖 AI Summary
In low-data regimes, de novo drug molecule generation struggles to precisely navigate pharmacologically relevant chemical space. Method: This paper proposes VECTOR+, a unified framework integrating contrastive representation learning with property-guided controllable generation, supporting both regression and classification tasks. Contribution/Results: VECTOR+ introduces three key innovations: (1) property-augmented representation learning, (2) molecular resampling to refine structural diversity, and (3) interpretability-aware constraints to enhance controllability. These improvements boost the novelty, synthetic accessibility, and predicted target binding of generated molecules. In molecular docking, VECTOR+-generated compounds score better against PD-L1 and kinase targets than state-of-the-art methods and several approved drugs, and all-atom molecular dynamics simulations support the stability of their binding. The framework generalizes across diverse target classes and shows promising translational potential for early-stage drug discovery.
📝 Abstract
Efficiently steering generative models toward pharmacologically relevant regions of chemical space remains a major obstacle in molecular drug discovery under low-data regimes. We present VECTOR+: Valid-property-Enhanced Contrastive Learning for Targeted Optimization and Resampling, a framework that couples property-guided representation learning with controllable molecule generation. VECTOR+ applies to both regression and classification tasks and enables interpretable, data-efficient exploration of functional chemical space. We evaluate on two datasets: a curated PD-L1 inhibitor set (296 compounds with experimental $IC_{50}$ values) and a receptor kinase inhibitor set (2,056 molecules labeled by binding mode). Despite limited training data, VECTOR+ generates novel, synthetically tractable candidates. Against PD-L1 (PDB 5J89), 100 of 8,374 generated molecules surpass a docking threshold of $-15.0$ kcal/mol, with the best scoring $-17.6$ kcal/mol compared to $-15.4$ kcal/mol for the top reference inhibitor. The best-performing molecules retain the conserved biphenyl pharmacophore while introducing novel motifs. Molecular dynamics simulations (250 ns) confirm binding stability (ligand RMSD $< 2.5$ Å). VECTOR+ generalizes to kinase inhibitors, producing compounds with stronger docking scores than established drugs such as brigatinib and sorafenib. Benchmarking against JT-VAE and MolGPT across docking, novelty, uniqueness, and Tanimoto similarity highlights the superior performance of our method. These results position VECTOR+ as a robust, extensible approach for property-conditioned molecular design in low-data settings, bridging contrastive learning and generative modeling for reproducible, AI-accelerated discovery.
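The benchmark metrics named in the abstract (novelty, uniqueness, and Tanimoto similarity) can be illustrated with a minimal sketch. The definitions below follow common conventions in the molecular generation literature and are an assumption; the abstract does not state the exact formulas the authors use. The function names are hypothetical, molecules are assumed to be given as already-canonicalized SMILES strings, and fingerprints are assumed to be represented as sets of on-bit indices (in practice these would come from a cheminformatics toolkit, which is not used here).

```python
from typing import Sequence, Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    shared = len(fp_a & fp_b)
    total = len(fp_a) + len(fp_b) - shared  # size of the union
    # Convention (assumed): two empty fingerprints have similarity 0.
    return shared / total if total else 0.0


def uniqueness(generated: Sequence[str]) -> float:
    """Fraction of generated molecules that are distinct from one another."""
    return len(set(generated)) / len(generated) if generated else 0.0


def novelty(generated: Sequence[str], training: Sequence[str]) -> float:
    """Fraction of distinct generated molecules that are absent from the training set."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training)) / len(distinct)


# Toy inputs for illustration only; these are not data from the paper.
generated_smiles = ["CCO", "CCO", "c1ccccc1"]
training_smiles = ["CCO"]

uniq = uniqueness(generated_smiles)            # 2 distinct out of 3 generated
nov = novelty(generated_smiles, training_smiles)  # 1 of 2 distinct is unseen
sim = tanimoto({1, 2, 3}, {2, 3, 4})           # 2 shared bits, 4 bits in the union
```

Low Tanimoto similarity to the training set together with high novelty indicates that a generator is exploring new chemistry rather than reproducing known inhibitors, which is why these metrics are typically reported alongside docking scores.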