SubDyve: Subgraph-Driven Dynamic Propagation for Virtual Screening Enhancement Controlling False Positive

📅 2025-09-18

🤖 AI Summary
Low-label virtual screening faces three main challenges: active compounds are scarce, conventional molecular fingerprints miss biologically relevant substructures, and modeling each molecule independently limits generalization. To address these, we propose SubDyve, a subgraph-driven dynamic propagation framework. SubDyve constructs a subgraph-aware similarity network and performs iterative seed refinement, combining activity-signal propagation with local false discovery rate control, to expand the active set while limiting false positives caused by topological bias and overexpansion. It thereby combines cheminformatics-based substructure identification with network-topological reasoning. Evaluated on ten DUD-E targets under zero-shot conditions and on the CDK7 target with a 10-million-compound ZINC dataset, SubDyve consistently outperforms fingerprint- and embedding-based methods, with margins of up to +34.0 in BEDROC and +24.6 in EF1%.

📝 Abstract
Virtual screening (VS) aims to identify bioactive compounds from vast chemical libraries, but remains difficult in low-label regimes where only a few actives are known. Existing methods largely rely on general-purpose molecular fingerprints and overlook class-discriminative substructures critical to bioactivity. Moreover, they consider molecules independently, limiting effectiveness in low-label regimes. We introduce SubDyve, a network-based VS framework that constructs a subgraph-aware similarity network and propagates activity signals from a small set of known actives. When few active compounds are available, SubDyve performs iterative seed refinement, incrementally promoting new candidates based on local false discovery rate. This strategy expands the seed set with promising candidates while controlling false positives from topological bias and overexpansion. We evaluate SubDyve on ten DUD-E targets under zero-shot conditions and on the CDK7 target with a 10-million-compound ZINC dataset. SubDyve consistently outperforms existing fingerprint- or embedding-based approaches, achieving margins of up to +34.0 in BEDROC and +24.6 in EF1%.
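
For readers unfamiliar with the EF1% metric quoted above, the following is a minimal sketch of the standard enrichment factor: the hit rate among the top-ranked fraction of the library divided by the hit rate of the whole library. The function name `enrichment_factor` and its parameters are illustrative assumptions, not code from the paper, and the paper's exact evaluation protocol (for example, tie handling) is not specified here.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction (0.01 gives EF1%).

    scores: predicted activity scores, higher means more likely active.
    labels: 1 for a known active, 0 for an inactive or decoy.
    """
    n = len(scores)
    if n == 0 or len(labels) != n:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active is required")

    # Number of compounds in the top fraction (at least one).
    n_top = max(1, round(fraction * n))

    # Rank compound indices by descending score and count actives at the top.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:n_top])

    # Hit rate in the top fraction relative to the library-wide hit rate.
    return (hits / n_top) / (total_actives / n)
```

A value of 1 corresponds to random ranking. The maximum attainable value depends on the proportion of actives in the library, so differences in EF1% are usually compared on the same dataset.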

Problem

Research questions and friction points this paper is trying to address.

Enhancing virtual screening with limited known bioactive compounds
Overcoming limitations of molecular fingerprints ignoring discriminative substructures
Controlling false positives while expanding active compound candidates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Subgraph-aware similarity network construction
Iterative seed refinement with false discovery control
Dynamic activity propagation from few known actives
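
The ideas listed above can be illustrated with a small sketch. It is not the SubDyve implementation: it uses a generic random walk with restart as the propagation step, and a fixed score cutoff (`min_score`) as a simple stand-in for the paper's local false discovery rate criterion. The graph format, function names, and all parameter values are assumptions made for illustration.

```python
def propagate(adj, seeds, restart=0.5, n_iter=50):
    """Random walk with restart on a weighted graph.

    adj: {node: {neighbor: weight}}; every neighbor must also be a key of adj.
    seeds: non-empty set of nodes treated as known actives.
    Returns {node: score}; higher scores mean closer to the seeds.
    """
    if not seeds:
        raise ValueError("at least one seed is required")
    nodes = list(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {v: restart * p0[v] for v in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                continue  # isolated node: nothing to pass on
            for v, w in adj[u].items():
                # Spread u's current score to neighbors in proportion to edge weight.
                nxt[v] += (1.0 - restart) * p[u] * w / total
        p = nxt
    return p


def refine_seeds(adj, seeds, rounds=3, per_round=1, min_score=0.05):
    """Iteratively promote top-scoring non-seed nodes into the seed set.

    min_score is a placeholder acceptance rule (assumption); the paper uses
    a local false discovery rate criterion instead.
    """
    seeds = set(seeds)
    for _ in range(rounds):
        scores = propagate(adj, seeds)
        candidates = sorted(
            (v for v in adj if v not in seeds),
            key=lambda v: scores[v],
            reverse=True,
        )
        promoted = [v for v in candidates[:per_round] if scores[v] >= min_score]
        if not promoted:
            break  # stop expanding once no candidate passes the cutoff
        seeds.update(promoted)
    return seeds, propagate(adj, seeds)
```

The early stop in `refine_seeds` reflects the motivation stated in the abstract: promotion is gated so that the seed set does not keep growing into weakly connected regions of the network.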