🤖 AI Summary
Atom-based representations for 3D molecular property prediction may overlook subtle physical information. Method: This work studies electron density maps, the direct output of X-ray crystallography and cryo-EM, as a continuous, physically grounded molecular representation. It compares three voxel-based inputs to 3D CNNs (atom types, raw electron density, and density gradient magnitude), alongside a shape-based baseline, on PDBbind (protein–ligand binding affinity) and QM9 (quantum properties). Results: On PDBbind, all representations perform similarly with full data; in low-data regimes, density-based inputs outperform atom types, though the shape-based baseline performs comparably, suggesting that spatial occupancy dominates this task. On QM9, density-based inputs outperform atom-based ones at scale, even though the input densities come from a lower-level method (XTB) than the DFT-derived labels. Contribution: The study shows that the strengths of density-derived inputs are task- and regime-dependent: they improve data efficiency in affinity prediction and accuracy in quantum property modeling.
📝 Abstract
Machine learning models for 3D molecular property prediction typically rely on atom-based representations, which may overlook subtle physical information. Electron density maps, the direct output of X-ray crystallography and cryo-electron microscopy, offer a continuous, physically grounded alternative. We compare three voxel-based input types for 3D convolutional neural networks (CNNs), namely atom types, raw electron density, and density gradient magnitude, across two molecular tasks: protein-ligand binding affinity prediction (PDBbind) and quantum property prediction (QM9). We focus on voxel-based CNNs because electron density is inherently volumetric, and voxel grids provide the most natural representation for both experimental and computed densities. On PDBbind, all representations perform similarly with full data, but in low-data regimes, density-based inputs outperform atom types, while a shape-based baseline performs comparably, suggesting that spatial occupancy dominates this task. On QM9, where labels are derived from Density Functional Theory (DFT) but input densities come from a lower-level method (XTB), density-based inputs still outperform atom-based ones at scale, reflecting the rich structural and electronic information encoded in density. Overall, these results highlight the task- and regime-dependent strengths of density-derived inputs, improving data efficiency in affinity prediction and accuracy in quantum property modeling.
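To make the compared input types concrete, the following is a minimal NumPy sketch of how atom-type channels, a density grid, its gradient magnitude, and a shape (occupancy) mask could be laid out on a shared voxel grid. It is an illustration under stated assumptions, not the authors' pipeline: the density here is a toy sum of Gaussians standing in for an experimental or XTB-computed map, and the function names, grid size, Gaussian width, and occupancy threshold are all hypothetical choices.

```python
import numpy as np

def make_grid(n=24, extent=6.0):
    """Voxel-centre coordinates with shape (n, n, n, 3) and the voxel spacing."""
    axis = np.linspace(-extent, extent, n)
    spacing = axis[1] - axis[0]
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return grid, spacing

def atom_type_channels(coords, types, n_types, grid, sigma=0.8):
    """One Gaussian-smeared channel per atom type (assumed encoding)."""
    channels = np.zeros((n_types,) + grid.shape[:3])
    for xyz, t in zip(coords, types):
        d2 = np.sum((grid - xyz) ** 2, axis=-1)
        channels[t] += np.exp(-d2 / (2 * sigma**2))
    return channels

def toy_density(coords, electrons, grid, sigma=0.8):
    """Toy stand-in for a density map: Gaussians weighted by electron count.

    A real pipeline would load an experimental map or a computed density here.
    """
    rho = np.zeros(grid.shape[:3])
    for xyz, z in zip(coords, electrons):
        d2 = np.sum((grid - xyz) ** 2, axis=-1)
        rho += z * np.exp(-d2 / (2 * sigma**2))
    return rho

def gradient_magnitude(rho, spacing):
    """|grad rho| from finite differences along each grid axis."""
    gx, gy, gz = np.gradient(rho, spacing)
    return np.sqrt(gx**2 + gy**2 + gz**2)

def shape_mask(rho, threshold=0.5):
    """Binary occupancy mask: 1 where density exceeds a (hypothetical) threshold."""
    return (rho > threshold).astype(np.float32)

# Water-like toy molecule: O at the origin, two H atoms (coordinates in angstroms).
coords = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
types = [0, 1, 1]          # channel index per atom: 0 = O, 1 = H
electrons = [8.0, 1.0, 1.0]

grid, spacing = make_grid()
atom_inputs = atom_type_channels(coords, types, n_types=2, grid=grid)
density = toy_density(coords, electrons, grid)
grad_mag = gradient_magnitude(density, spacing)
occupancy = shape_mask(density)

print(atom_inputs.shape, density.shape, grad_mag.shape, occupancy.shape)
```

Each array can be fed to a 3D CNN as one or more input channels. The occupancy mask discards the density values and keeps only where the molecule is, which is the kind of shape-only information a shape-based baseline would see.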