A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era

📅 2026-04-17

🤖 AI Summary
This work addresses the long-standing absence of a unified taxonomy and a reliable benchmarking framework in molecular property prediction, particularly given the new challenges posed by foundation models. The study proposes a classification scheme spanning molecular representations, model architectures, and interdisciplinary applications, and uses multidimensional benchmarking to systematically evaluate four major paradigms: quantum chemistry, descriptor-based machine learning, geometric deep learning, and foundation models. The analysis exposes critical shortcomings in current benchmarks regarding stereochemical consistency, experimental heterogeneity, and reproducibility. To advance the field, the paper advocates three key directions: physics-informed learning with quantum consistency, uncertainty-calibrated foundation models for trustworthy inference, and multimodal real-world benchmark ecosystems integrating computational and experimental data. Together, these directions point toward transparent, temporally aware, and scaffold-sensitive next-generation benchmarks.

📝 Abstract
Molecular property prediction integrates quantum chemistry, cheminformatics, and deep learning to connect molecular structure with physicochemical and biological behavior. This survey traces four complementary paradigms (Quantum, Descriptor Machine Learning, Geometric Deep Learning, and Foundation Models) and outlines a unified taxonomy linking molecular representations, model architectures, and interdisciplinary applications. Benchmark analyses integrate evidence from both widely used datasets and datasets reflecting industry perspectives, encompassing quantum, physicochemical, physiological, and biophysical domains. The survey examines current standards in data curation, splitting strategies, and evaluation protocols, highlighting challenges including inconsistent stereochemistry, heterogeneous assay sources, and reproducibility limitations under random or poorly defined splits. These observations motivate the modernization of benchmark design toward more transparent, time- and scaffold-aware methodologies. We further propose three forward-looking directions: (i) physics-aware learning embedding quantum consistency, (ii) uncertainty-calibrated foundation models for trustworthy inference, and (iii) realistic multimodal benchmark ecosystems integrating computational and experimental data. Repository: https://github.com/Zongru-Li/Survey-and-Benchmarks-of-DL-for-Molecular-Property-Prediction-in-the-Foundation-Model-Era.
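The time- and scaffold-aware splitting that the abstract argues for can be illustrated with a minimal sketch. It assumes each molecule already comes with a precomputed scaffold identifier (for example, a Bemis-Murcko scaffold SMILES produced by a cheminformatics toolkit) and a measurement date. The function names `scaffold_split` and `time_split`, the greedy largest-group-first assignment, and the 80/20 ratio are illustrative assumptions, not the survey's own protocol.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8):
    """Split indices so that no scaffold appears in both train and test.

    `scaffolds` is a list of hashable scaffold identifiers, one per molecule
    (assumed to be precomputed elsewhere).
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    cutoff = frac_train * len(scaffolds)
    train, test = [], []
    for group in ordered:
        # A whole group goes to one side, never both.
        if len(train) + len(group) <= cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def time_split(timestamps, frac_train=0.8):
    """Train on the earliest measurements, test on the most recent ones."""
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_train = int(frac_train * len(order))
    return order[:n_train], order[n_train:]


if __name__ == "__main__":
    # Hypothetical scaffold labels and assay years, for illustration only.
    example_scaffolds = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]
    print(scaffold_split(example_scaffolds))
    print(time_split([2020, 2018, 2022, 2019, 2021]))
```

Unlike a random split, both functions keep structurally or temporally related molecules on the same side of the split, so test performance is not measured on close analogues of training compounds.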
Problem

Research questions and friction points this paper is trying to address.

molecular property prediction
foundation models
benchmarking
data curation
reproducibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation Models
Molecular Property Prediction
Geometric Deep Learning
Benchmarking
Uncertainty Calibration
Authors
Zongru Li
The University of Hong Kong, Hong Kong SAR
Xingsheng Chen
The University of Hong Kong, Hong Kong SAR
Honggang Wen
The University of Hong Kong, Hong Kong SAR
Regina Qianru Zhang
Nanyang Technological University, Singapore
Ming Li
Professor, Zhejiang Normal University
Graph Neural Networks, Graph Learning, Hypergraph Learning, AI for Education
Xiaojin Zhang
The Hong Kong University of Science and Technology, Hong Kong SAR
Hongzhi Yin
Professor and ARC Future Fellow, University of Queensland
Recommender System, Graph Learning, Spatial-temporal Prediction, Edge Intelligence, LLM
Qiang Yang
The Hong Kong Polytechnic University, Hong Kong SAR
Kwok-Yan Lam
Nanyang Technological University
Cybersecurity, Privacy-Preserving technologies, Digital Trust, Distributing systems, LegalTech
Pietro Lio
University of Cambridge, United Kingdom
Siu-Ming Yiu
Professor of Computer Science, The University of Hong Kong
Cybersecurity, Cryptography, FinTech, Bioinformatics