🤖 AI Summary
In molecular property prediction, high-value compounds (e.g., highly active molecules) exhibit sparse distribution, degrading standard GNNs’ predictive accuracy in critical regions; existing oversampling methods often distort molecular topology. To address this, we propose the first spectral-domain, target-aware graph augmentation framework: it establishes node correspondences via Gromov–Wasserstein alignment and jointly interpolates spectral and nodal features in a shared Laplacian eigenbasis—preserving structural validity and physical interpretability. The framework integrates scarcity-aware kernel density sampling, edge reconstruction, and edge-aware Chebyshev convolutional GNNs. Evaluated on multiple benchmarks, our method significantly reduces prediction error in high-value regions (average reduction of 18.7%) while maintaining competitive overall MAE. Generated molecules are chemically valid and spectrally geometrically interpretable.
📝 Abstract
In molecular property prediction, the most valuable compounds (e.g., high potency) often occupy sparse regions of the target space. Standard Graph Neural Networks (GNNs) commonly optimize for the average error, underperforming on these uncommon but critical cases, with existing oversampling methods often distorting molecular topology. In this paper, we introduce SPECTRA, a Spectral Target-Aware graph augmentation framework that generates realistic molecular graphs in the spectral domain. SPECTRA (i) reconstructs multi-attribute molecular graphs from SMILES; (ii) aligns molecule pairs via (Fused) Gromov-Wasserstein couplings to obtain node correspondences; (iii) interpolates Laplacian eigenvalues, eigenvectors and node features in a stable share-basis; and (iv) reconstructs edges to synthesize physically plausible intermediates with interpolated targets. A rarity-aware budgeting scheme, derived from a kernel density estimation of labels, concentrates augmentation where data are scarce. Coupled with a spectral GNN using edge-aware Chebyshev convolutions, SPECTRA densifies underrepresented regions without degrading global accuracy. On benchmarks, SPECTRA consistently improves error in relevant target ranges while maintaining competitive overall MAE, and yields interpretable synthetic molecules whose structure reflects the underlying spectral geometry. Our results demonstrate that spectral, geometry-aware augmentation is an effective and efficient strategy for imbalanced molecular property regression.