Rethinking Molecular OOD Generalization via Target-Aware Source Selection

📅 2026-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the generalization bottleneck in molecular property prediction under extreme out-of-distribution (OOD) scenarios in AI-driven drug discovery. Existing evaluation protocols overestimate performance because they leave microscopic semantic overlap between training and test molecules, and existing adaptation methods suffer negative transfer because they align source domains indiscriminately. To tackle this, the authors introduce two components. SCOPE-BENCH is an evaluation benchmark with cluster-level OOD splits defined in a physicochemical descriptor space. POMA is a framework for target-oriented multi-source knowledge transfer that combines structure-aware source selection, formulated as a retrieve-compose-adapt pipeline, with dual-scale (topological and pharmacophoric) domain adaptation. Experiments show that the prediction error of state-of-the-art 3D models increases by 5.9× on average on SCOPE-BENCH, whereas POMA reduces mean absolute error by 6.2% on average and by up to 11.2%.
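The idea of a cluster-level OOD split can be illustrated with a minimal sketch. It assumes each molecule is already represented by a row of numeric physicochemical descriptors, uses a plain k-means written in NumPy as a stand-in clustering method, and holds out the clusters farthest from the global centroid. The function names, the choice of k-means, and the hold-out rule are all illustrative assumptions, not the SCOPE-BENCH procedure.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain k-means (assumed stand-in clustering); returns a label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Distance from every point to every center, then nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels

def cluster_ood_split(descriptors, k=5, n_test_clusters=1, seed=0):
    """Hold out whole clusters so no test cluster appears in training."""
    X = np.asarray(descriptors, dtype=float)
    # Standardize so no single descriptor dominates the distance.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
    labels = kmeans_labels(X, k, seed=seed)
    present = np.unique(labels)
    # Hypothetical hold-out rule: clusters farthest from the global centroid.
    spread = [np.linalg.norm(X[labels == c].mean(axis=0)) for c in present]
    test_clusters = present[np.argsort(spread)[::-1][:n_test_clusters]]
    test_mask = np.isin(labels, test_clusters)
    return np.where(~test_mask)[0], np.where(test_mask)[0], labels

if __name__ == "__main__":
    demo = np.random.default_rng(1).normal(size=(120, 4))  # fake descriptors
    train_idx, test_idx, _ = cluster_ood_split(demo, k=4)
    print(len(train_idx), len(test_idx))
```

Because entire clusters are held out, every test molecule lies in a region of descriptor space that the training set does not cover, which is the property a cluster-level split is meant to enforce.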
📝 Abstract
Robust prediction of molecular properties under extreme out-of-distribution (OOD) scenarios is a pivotal bottleneck in AI-driven drug discovery. Current scaffold-splitting protocols fail to eliminate microscopic semantic overlap between training and test sets, which predisposes models to shortcut learning and leads to overestimates of their true extrapolation capability. Meanwhile, conventional domain adaptation paradigms break down under extreme structural shifts, because blindly aligning heterogeneous source libraries injects topological noise and triggers negative transfer. Two contributions are proposed to address these challenges. The first is the scaffold-cluster out-of-distribution performance evaluation benchmark (SCOPE-BENCH), built on cluster-level partitioning in an explicit physicochemical descriptor space. The second is policy optimization for multi-source adaptation (POMA), a framework that formulates knowledge transfer as a retrieve-compose-adapt pipeline: labeled source scaffolds structurally close to the unlabeled target are first identified as proxy targets; a reinforcement learning policy then adaptively selects the optimal source subset from an exponentially large candidate pool; and dual-scale domain adaptation is finally performed at the macroscopic topological scale and the microscopic pharmacophore scale. Evaluations show that prediction errors of state-of-the-art 3D molecular models increase by up to 8.0x on SCOPE-BENCH, with a mean of 5.9x, while POMA reduces mean absolute error by up to 11.2%, with an average relative improvement of 6.2% across diverse backbone architectures. Code is available at https://anonymous.4open.science/r/Molecular-OOD-Code-73F6.
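The retrieval step described above, picking labeled source scaffolds that are structurally close to the unlabeled target, might look roughly like the sketch below. It assumes molecules are given as binary fingerprint vectors grouped by scaffold and uses Tanimoto similarity as the closeness measure. The abstract does not name the similarity measure or the scoring rule, so both, along with the function names, are assumptions; the reinforcement learning selection and the adaptation stages are not sketched.

```python
import numpy as np

def tanimoto_matrix(A, B):
    """Pairwise Tanimoto similarity between rows of two binary matrices."""
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    inter = (A[:, None, :] & B[None, :, :]).sum(axis=2)
    union = (A[:, None, :] | B[None, :, :]).sum(axis=2)
    # Two all-zero fingerprints have an empty union; define similarity as 0.
    return np.where(union > 0, inter / np.maximum(union, 1), 0.0)

def retrieve_proxy_targets(source_fps_by_scaffold, target_fps, top_k=2):
    """Rank source scaffolds by how well they cover the target molecules.

    Hypothetical scoring rule: for each target molecule take its most
    similar molecule within the scaffold, then average over the targets.
    """
    scores = {}
    for scaffold_id, fps in source_fps_by_scaffold.items():
        sim = tanimoto_matrix(fps, target_fps)
        scores[scaffold_id] = float(sim.max(axis=0).mean())
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores

if __name__ == "__main__":
    target = [[1, 1, 0, 0]]  # toy fingerprints, not real molecules
    sources = {
        "a": [[1, 1, 0, 0]],
        "b": [[0, 0, 1, 1]],
        "c": [[1, 0, 0, 0]],
    }
    proxies, scores = retrieve_proxy_targets(sources, target, top_k=2)
    print(proxies, scores)
```

In this toy case, scaffold "a" matches the target exactly, "c" shares one of two set bits, and "b" shares none, so "a" and "c" would be retrieved as proxy targets.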
Problem

Research questions and friction points this paper is trying to address.

out-of-distribution generalization
molecular property prediction
scaffold splitting
domain adaptation
drug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

out-of-distribution generalization
molecular property prediction
domain adaptation
reinforcement learning
scaffold clustering