What drives performance in molecular MPNNs? An operator-level factorial benchmark

📅 2026-05-28
🤖 AI Summary
This work addresses the difficulty of isolating the contributions of individual operators in molecular message-passing neural networks (MPNNs). It decomposes 2D molecular MPNNs at the operator level into three families: message-seed initialization, node-edge fusion, and node update. The authors construct 84 configurations and run a factorial benchmark across ten MoleculeNet datasets under a shared experimental setup and statistical analysis protocol. Through statistical analysis, a representation probe, and mechanistic interpretation, they find that performance variation is associated primarily with message construction, that is, the choice of initialization and fusion strategy, rather than with update complexity. Message-seed initialization shows significant effects for both classification and regression. Node-edge fusion shows a significant effect for regression only, where concatenation-based fusion has a descriptive advantage and, in the probe, withstands oversmoothing better than Hadamard gating. The update family shows no statistically supported effect. Representative configurations selected separately for classification and regression rank numerically best on eight of the ten datasets relative to established GNN baselines, yielding empirical design heuristics for where and how chemical information enters the MPNN pipeline.
📝 Abstract
Message-passing neural networks (MPNNs) are widely used for molecular property prediction, but their deployment as monolithic architectures makes it difficult to identify how specific message-passing operators affect performance. We present an operator-level factorial benchmark that decomposes 2D molecular MPNNs into three operator families: message-seed initialization, node-edge fusion, and node update. The resulting 84 configurations are benchmarked on ten MoleculeNet datasets under a shared experimental setup and statistical analysis protocol. Across this controlled design, performance variation is associated primarily with message construction rather than update complexity. Message-seed initialization shows significant family-level effects for both regression and classification, node-edge fusion shows a significant family-level effect for regression with descriptive advantages for concatenation-based mixing, and the update family shows no statistically supported effect for either endpoint family. A representation probe of the quinethazone molecule further demonstrates that concatenation-based mixing can differentiate chemically distinct heteroatoms and withstand oversmoothing better than Hadamard gating. Representative configurations selected separately for classification and regression recover competitive performance relative to established molecular graph neural network (GNN) baselines, ranking numerically best on eight of ten benchmark datasets. These empirical results are interpreted through concise mechanistic analyses of representative node-edge fusion and update operators. Our findings provide empirical design heuristics for molecular MPNNs by turning model design from a search over monolithic architectures into a targeted assessment of where and how chemical information enters the message-passing pipeline.
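To make the contrast between the two node-edge fusion styles named in the abstract concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the function names, the choice of the source node's features as the message seed, the sum aggregation, and the absence of any learned weights are all assumptions made for illustration. It only shows why elementwise (Hadamard) gating can map two different neighbor atoms to the same message when an edge channel is zero, whereas concatenation keeps them apart.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Fusion = Callable[[Vector, Vector], Vector]


def hadamard_fusion(seed: Vector, edge: Vector) -> Vector:
    """Elementwise gating: each seed channel is scaled by the matching edge channel."""
    if len(seed) != len(edge):
        raise ValueError("Hadamard fusion needs equal-length vectors")
    return [s * e for s, e in zip(seed, edge)]


def concat_fusion(seed: Vector, edge: Vector) -> Vector:
    """Concatenation: seed and edge channels are kept side by side."""
    return list(seed) + list(edge)


def aggregate_messages(
    node_feats: Dict[int, Vector],
    edges: List[Tuple[int, int, Vector]],  # (source, target, edge features)
    fusion: Fusion,
) -> Dict[int, Vector]:
    """Sum the fused messages arriving at each target node."""
    out: Dict[int, Vector] = {}
    for src, dst, edge_feat in edges:
        seed = node_feats[src]  # assumed seed: the source node's own features
        msg = fusion(seed, edge_feat)
        if dst not in out:
            out[dst] = list(msg)
        else:
            out[dst] = [a + b for a, b in zip(out[dst], msg)]
    return out


# Hypothetical toy features: two atoms that differ only in the second channel,
# attached through the same bond type whose second channel is zero.
atom_a, atom_b = [1.0, 0.0], [1.0, 5.0]
bond = [1.0, 0.0]

# Gating zeroes the channel that distinguishes the atoms, so the messages collide.
print(hadamard_fusion(atom_a, bond) == hadamard_fusion(atom_b, bond))  # True
# Concatenation keeps the distinguishing channel, so the messages differ.
print(concat_fusion(atom_a, bond) == concat_fusion(atom_b, bond))  # False
```

In a trained model both operators would normally be surrounded by learned transformations, so this collision is a property of the toy inputs rather than a general result. The sketch is meant only to show where in the pipeline the fusion choice acts.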
Problem

Research questions and friction points this paper is trying to address.

message-passing neural networks
molecular property prediction
operator-level analysis
graph neural networks
model interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

operator-level benchmark
message-passing neural networks
node-edge fusion
molecular representation
factorial design
Panyu Jiao
Materials Genome Institute, Shanghai University, Shanghai 200444, China
Shuizhou Chen
School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
Yiheng Shen
Computer Science, Duke University
Economics and Computation
Yuyang Wang
Materials Genome Institute, Shanghai University, Shanghai 200444, China
Runhai Ouyang
Shanghai University
Machine Learning, Computational Catalysis, Materials Informatics
Wei Xie
Central China Normal University
computer vision