SPECTRA: Spectral Target-Aware Graph Augmentation for Imbalanced Molecular Property Regression

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
In molecular property prediction, high-value compounds (e.g., highly active molecules) are sparsely distributed, which degrades the accuracy of standard GNNs in the regions that matter most; existing oversampling methods often distort molecular topology. To address this, we propose the first spectral-domain, target-aware graph augmentation framework: it establishes node correspondences via Gromov–Wasserstein alignment and jointly interpolates spectral and nodal features in a shared Laplacian eigenbasis, preserving structural validity and physical interpretability. The framework integrates scarcity-aware kernel density sampling, edge reconstruction, and a GNN built on edge-aware Chebyshev convolutions. Evaluated on multiple benchmarks, our method significantly reduces prediction error in high-value regions (average reduction of 18.7%) while maintaining competitive overall MAE. Generated molecules are chemically valid and interpretable in terms of their spectral geometry.
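To make the Chebyshev convolution mentioned above concrete, here is a minimal NumPy sketch of a plain Chebyshev graph filter. It is not the authors' layer: the function name `chebyshev_filter` is hypothetical, the coefficients are fixed scalars rather than learned weight matrices, the largest Laplacian eigenvalue is assumed to be bounded by 2, and the edge-aware part is left out entirely.

```python
import numpy as np

def chebyshev_filter(adj, x, coeffs):
    """Return sum_k coeffs[k] * T_k(L_scaled) @ x for Chebyshev polynomials T_k."""
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    # D^{-1/2}, with isolated nodes mapped to 0 to avoid division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    lap = np.eye(n) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # Assumption: lambda_max = 2, so 2 * L / lambda_max - I simplifies to L - I.
    lap_scaled = lap - np.eye(n)
    # Chebyshev recurrence: T_0 x = x, T_1 x = L_scaled x,
    # T_k x = 2 * L_scaled * T_{k-1} x - T_{k-2} x.
    t_prev, t_curr = x, lap_scaled @ x
    out = coeffs[0] * t_prev
    if len(coeffs) > 1:
        out = out + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2 * lap_scaled @ t_curr - t_prev
        out = out + c * t_curr
    return out

# Toy graph: two atoms joined by one bond, one scalar feature per node.
adj = np.array([[0, 1], [1, 0]])
x = np.array([[1.0], [2.0]])
filtered = chebyshev_filter(adj, x, coeffs=[0.5, 0.3, 0.2])
```

A filter of order K only mixes information from nodes at most K hops apart, which is why Chebyshev filters are a common way to apply spectral filtering without an explicit eigendecomposition.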

📝 Abstract
In molecular property prediction, the most valuable compounds (e.g., those with high potency) often occupy sparse regions of the target space. Standard Graph Neural Networks (GNNs) commonly optimize for the average error and so underperform on these uncommon but critical cases, while existing oversampling methods often distort molecular topology. In this paper, we introduce SPECTRA, a Spectral Target-Aware graph augmentation framework that generates realistic molecular graphs in the spectral domain. SPECTRA (i) reconstructs multi-attribute molecular graphs from SMILES; (ii) aligns molecule pairs via (Fused) Gromov-Wasserstein couplings to obtain node correspondences; (iii) interpolates Laplacian eigenvalues, eigenvectors, and node features in a stable shared basis; and (iv) reconstructs edges to synthesize physically plausible intermediates with interpolated targets. A rarity-aware budgeting scheme, derived from a kernel density estimate of the labels, concentrates augmentation where data are scarce. Coupled with a spectral GNN using edge-aware Chebyshev convolutions, SPECTRA densifies underrepresented regions without degrading global accuracy. On benchmarks, SPECTRA consistently reduces error in relevant target ranges while maintaining competitive overall MAE, and yields interpretable synthetic molecules whose structure reflects the underlying spectral geometry. Our results demonstrate that spectral, geometry-aware augmentation is an effective and efficient strategy for imbalanced molecular property regression.
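The rarity-aware budgeting idea in the abstract can be sketched as follows. This is an illustrative reading, not the paper's exact scheme: the function name `rarity_budget`, the Gaussian kernel, Silverman's bandwidth rule, the inverse-density rarity score, and the rounding step are all assumptions made for this example.

```python
import numpy as np

def rarity_budget(y, total_new, bandwidth=None):
    """Split `total_new` synthetic samples across training points, favoring rare labels."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb (requires non-constant labels).
        bandwidth = 1.06 * y.std() * n ** (-1 / 5)
    # Gaussian kernel density estimate of the labels, evaluated at each label.
    diffs = (y[:, None] - y[None, :]) / bandwidth
    density = np.exp(-0.5 * diffs**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))
    # Assumption: rarity is the inverse density, normalized to sum to one.
    rarity = 1.0 / density
    probs = rarity / rarity.sum()
    # Largest-remainder rounding so the integer budgets sum to total_new.
    raw = probs * total_new
    budget = np.floor(raw).astype(int)
    leftover = int(total_new - budget.sum())
    order = np.argsort(-(raw - budget))
    budget[order[:leftover]] += 1
    return budget

# Six labels cluster near 5.0; the single label at 9.0 sits in a sparse region.
labels = np.array([5.0, 5.1, 5.2, 4.9, 5.0, 5.1, 9.0])
budget = rarity_budget(labels, total_new=20)
```

In this toy run the isolated label at 9.0 receives the largest share of the 20 synthetic samples, which is the intended effect: augmentation is concentrated where the label density is low.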
Problem

Research questions and friction points this paper is trying to address.

Addresses imbalanced molecular property prediction for rare high-value compounds
Generates realistic molecular graphs via spectral domain augmentation
Improves performance on sparse target regions while maintaining global accuracy
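One way to check the last point is to report the error per target range alongside the overall MAE. The sketch below is a hypothetical helper (`mae_by_range` and its bin edges are not from the paper); it only illustrates how a model can look good on average while failing in a sparse range.

```python
import numpy as np

def mae_by_range(y_true, y_pred, edges):
    """Overall MAE plus MAE within each half-open target bin [lo, hi)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)
    report = {"overall": float(abs_err.mean())}
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_true >= lo) & (y_true < hi)
        # Empty bins yield NaN rather than a misleading zero.
        report[(lo, hi)] = float(abs_err[mask].mean()) if mask.any() else float("nan")
    return report

# Four common low-valued targets and one rare high-valued target.
y_true = [1.0, 1.0, 1.0, 1.0, 9.0]
y_pred = [1.5, 0.5, 1.5, 0.5, 6.0]
report = mae_by_range(y_true, y_pred, edges=[0.0, 5.0, 10.0])
```

Here the overall MAE is 1.0, but the error is 0.5 in the dense range and 3.0 in the sparse one, so the average hides the failure on the rare compound.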
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates molecular graphs in the spectral domain
Aligns molecules via Fused Gromov-Wasserstein couplings
Interpolates Laplacian eigenvalues, eigenvectors, and node features to synthesize intermediates
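The interpolation step can be illustrated with a simplified NumPy sketch. It makes strong assumptions that the paper may not share: both graphs have the same number of nodes, the node correspondence `perm` is already given (in SPECTRA it would come from the Gromov-Wasserstein coupling), the shared basis is simply the eigenbasis of the first graph's Laplacian, and edges are recovered with a fixed threshold. Node features and eigenvector interpolation are omitted, and all function names are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A."""
    return np.diag(adj.sum(axis=1)) - adj

def spectral_interpolate(adj_a, adj_b, perm, t, threshold=0.5):
    """Blend two aligned graphs in the eigenbasis of graph A's Laplacian.

    perm[i] is the node of graph B matched to node i of graph A.
    """
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)
    # Reorder B so both graphs share A's node ordering.
    adj_b_aligned = adj_b[np.ix_(perm, perm)]
    lap_a, lap_b = laplacian(adj_a), laplacian(adj_b_aligned)
    # Assumed shared basis: eigenvectors of A's Laplacian.
    eigvals_a, basis = np.linalg.eigh(lap_a)
    # B's spectrum in that basis: the diagonal of basis.T @ lap_b @ basis.
    eigvals_b = np.einsum("ij,ik,kj->j", basis, lap_b, basis)
    eigvals_t = (1 - t) * eigvals_a + t * eigvals_b
    lap_t = basis @ np.diag(eigvals_t) @ basis.T
    # Edge reconstruction: off-diagonal Laplacian entries are negated edge weights.
    weights = -(lap_t + lap_t.T) / 2
    np.fill_diagonal(weights, 0.0)
    return (weights > threshold).astype(int)

def interpolate_target(y_a, y_b, t):
    """Linear interpolation of the regression target for the synthetic graph."""
    return (1 - t) * y_a + t * y_b

# Toy pair: a 4-node path (A) and a 4-node cycle (B), identity correspondence.
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
perm = np.array([0, 1, 2, 3])
mid_graph = spectral_interpolate(path, cycle, perm, t=0.5)
mid_target = interpolate_target(2.0, 6.0, t=0.5)
```

With t = 0 the sketch returns graph A unchanged, and increasing t moves the spectrum toward that of graph B. A real pipeline would also need chemical validity checks on the reconstructed edges, which this example does not attempt.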
Brenda Nogueira
Department of Computer Science, University of Notre Dame, Notre Dame, IN, USA
Meng Jiang
Department of Computer Science, University of Notre Dame, Notre Dame, IN, USA
Nitesh V. Chawla
Department of Computer Science, University of Notre Dame, Notre Dame, IN, USA
Nuno Moniz
Associate Research Professor at Lucy Family Institute for Data & Society, University of Notre Dame
Imbalanced Learning, Responsible AI, Data Privacy