AI Summary
Molecular property annotation scarcity severely limits AI applications in drug and materials discovery, making few-shot molecular property prediction (FSMPP) a critical challenge. This paper presents the first systematic survey of FSMPP, identifying two fundamental bottlenecks: cross-property generalization, hindered by distributional shifts and weak biochemical correlations among properties, and cross-molecule generalization, undermined by high structural heterogeneity. To address these, we propose a multi-level taxonomy encompassing data, models, and learning paradigms; integrate graph neural networks, meta-learning, knowledge transfer, and chemical priors to enhance prediction robustness under extreme label scarcity; and unify mainstream benchmarks, evaluation protocols, and method performance. Our work establishes the first scalable research framework for FSMPP and delivers a clear, actionable technical roadmap, bridging foundational gaps and accelerating progress in low-data molecular AI.
Abstract
AI-assisted molecular property prediction has become a promising technique in early-stage drug discovery and materials design in recent years. However, because wet-lab experiments are costly and complex, real-world molecules often have scarce annotations, leaving too little labeled data for effective supervised learning. In light of this, few-shot molecular property prediction (FSMPP) has emerged as an expressive paradigm that enables learning from only a few labeled examples. Despite rapidly growing attention, existing FSMPP studies remain fragmented, without a coherent framework to capture methodological advances and domain-specific challenges. In this work, we present the first comprehensive and systematic survey of few-shot molecular property prediction. We begin by analyzing the few-shot phenomenon in molecular datasets and highlighting two core challenges: (1) cross-property generalization under distribution shifts, where each task, corresponding to one property, may follow a different data distribution or even be inherently weakly related to the others from a biochemical perspective, requiring the model to transfer knowledge across heterogeneous prediction tasks; and (2) cross-molecule generalization under structural heterogeneity, where molecules associated with different properties, or even the same property, may exhibit significant structural diversity, making it difficult for a model to generalize. Then, we introduce a unified taxonomy that organizes existing methods into data, model, and learning paradigm levels, reflecting their strategies for extracting knowledge from scarce supervision in few-shot molecular property prediction. Next, we compare representative methods and summarize benchmark datasets and evaluation protocols. Finally, we identify key trends and future directions for advancing research on FSMPP.