A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era

📅 2026-04-17



🤖 AI Summary

This work addresses the long-standing lack of a unified taxonomy and a reliable benchmarking framework for molecular property prediction, a gap made more pressing by the challenges that foundation models introduce. The study proposes a classification scheme spanning molecular representations, model architectures, and interdisciplinary applications. It then systematically evaluates four major paradigms (quantum chemistry, descriptor-based machine learning, geometric deep learning, and foundation models) through multidimensional benchmarking. The analysis exposes critical shortcomings in current benchmarks regarding stereochemical consistency, experimental heterogeneity, and reproducibility. To advance the field, the paper advocates three directions: physics-informed learning with quantum consistency, uncertainty-calibrated foundation models for trustworthy inference, and multimodal real-world benchmark ecosystems that integrate computational and experimental data. Together, these directions point toward transparent, temporally aware, and scaffold-sensitive next-generation benchmarks.
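One concrete reading of "uncertainty-calibrated" for regression targets is that a model's predicted intervals should contain the true value about as often as their nominal level claims. The sketch below is a minimal illustration under stated assumptions. It assumes the model outputs a Gaussian predictive mean `mu` and standard deviation `sigma` per molecule. The function names `interval_coverage` and `mean_coverage_gap` are hypothetical and are not taken from the survey or its repository.

```python
from statistics import NormalDist
from typing import List, Sequence, Tuple


def interval_coverage(
    y_true: Sequence[float],
    mu: Sequence[float],
    sigma: Sequence[float],
    levels: Sequence[float] = (0.5, 0.8, 0.9, 0.95),
) -> List[Tuple[float, float]]:
    """Return (nominal, observed) coverage of central Gaussian prediction intervals.

    Assumes each prediction is a Gaussian N(mu[i], sigma[i]**2); this is an
    illustrative assumption, not a method prescribed by the survey.
    """
    std_normal = NormalDist()
    result = []
    for p in levels:
        # Half-width of the central p-interval, in units of standard deviation.
        z = std_normal.inv_cdf(0.5 + p / 2)
        hits = sum(abs(y - m) <= z * s for y, m, s in zip(y_true, mu, sigma))
        result.append((p, hits / len(y_true)))
    return result


def mean_coverage_gap(coverage: Sequence[Tuple[float, float]]) -> float:
    """Average |nominal - observed| over the levels (0 means well calibrated)."""
    return sum(abs(p - obs) for p, obs in coverage) / len(coverage)
```

Interpreting the result: a model whose observed coverage falls well below nominal is overconfident, and one that sits well above nominal is underconfident. Checking several levels, rather than a single 95% interval, shows where along the distribution the miscalibration occurs.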


📝 Abstract

Molecular property prediction integrates quantum chemistry, cheminformatics, and deep learning to connect molecular structure with physicochemical and biological behavior. This survey traces four complementary paradigms, namely Quantum, Descriptor Machine Learning, Geometric Deep Learning, and Foundation Models, and outlines a unified taxonomy linking molecular representations, model architectures, and interdisciplinary applications. The benchmark analyses draw on both widely used datasets and datasets that reflect industry perspectives, spanning quantum, physicochemical, physiological, and biophysical domains. The survey examines current standards in data curation, splitting strategies, and evaluation protocols. It highlights challenges such as inconsistent stereochemistry, heterogeneous assay sources, and limited reproducibility under random or poorly defined splits. These observations motivate modernizing benchmark design toward more transparent, time- and scaffold-aware methodologies. We further propose three forward-looking directions: (i) physics-aware learning that embeds quantum consistency, (ii) uncertainty-calibrated foundation models for trustworthy inference, and (iii) realistic multimodal benchmark ecosystems that integrate computational and experimental data. Repository: https://github.com/Zongru-Li/Survey-and-Benchmarks-of-DL-for-Molecular-Property-Prediction-in-the-Foundation-Model-Era.
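The contrast between random splits and scaffold- or time-aware splits comes down to which molecules are allowed to appear in both training and test sets. The following is a minimal sketch under stated assumptions, not the survey's own protocol.

- The scaffold split groups molecules by a precomputed scaffold key and assigns whole groups to a single partition, so no scaffold is shared across train, validation, and test. Computing the key is assumed to happen upstream, for example a Bemis-Murcko scaffold SMILES from RDKit's `MurckoScaffold.MurckoScaffoldSmiles`.
- The time split trains on measurements recorded before a cutoff date and tests on later ones.

The function names `scaffold_split` and `time_split`, along with the greedy largest-group-first ordering, are illustrative choices rather than a prescribed standard.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple


def scaffold_split(
    scaffolds: Sequence[str],
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
) -> Tuple[List[int], List[int], List[int]]:
    """Assign whole scaffold groups to train/valid/test (no scaffold is shared).

    scaffolds[i] is a precomputed scaffold key for molecule i (an assumption
    of this sketch; scaffold extraction itself is not shown).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    # Greedy heuristic: large scaffold families fill train first, so rarer
    # chemotypes tend to land in validation/test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    n = len(scaffolds)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


def time_split(dates: Sequence[date], cutoff: date) -> Tuple[List[int], List[int]]:
    """Train on records dated before `cutoff`; test on records on or after it."""
    train = [i for i, d in enumerate(dates) if d < cutoff]
    test = [i for i, d in enumerate(dates) if d >= cutoff]
    return train, test
```

For example, with ten molecules whose scaffold keys are four benzene rings, three cyclohexanes, one pyridine, one furan, and one thiophene, the two large families and the pyridine fill the training set. The furan goes to validation and the thiophene to test. A random split would instead scatter each family across all three partitions and reward memorization of near-duplicate chemotypes.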

Problem

Research questions and friction points this paper is trying to address.

molecular property prediction

foundation models

benchmarking

data curation

reproducibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Foundation Models

Molecular Property Prediction

Geometric Deep Learning