Spectral Manifold Harmonization for Graph Imbalanced Regression

📅 2025-07-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Scientific graph data often exhibit imbalanced regression: models overfit to the mean region, while high-value targets—such as specific activity ranges—are sparsely represented and poorly predicted. This work introduces spectral manifold alignment to graph-based imbalanced regression for the first time, proposing a topology-preserving synthetic sample generation method. It aligns the source and target manifolds in the spectral domain and synthesizes graph samples that explicitly cover critical target intervals, jointly regulating both graph topology and node/edge attribute distributions. The approach mitigates model bias toward the mean, significantly improving prediction accuracy for scarce target intervals across multiple benchmark datasets in chemistry and drug discovery—achieving average MAE reductions of 12.7%–23.4%. Empirical results demonstrate its effectiveness, interpretability, and cross-domain generalizability.

Technology Category

Application Category

📝 Abstract
Graph-structured data is ubiquitous in scientific domains, where models often face imbalanced learning settings. In imbalanced regression, domain preferences focus on specific target value ranges representing the most scientifically valuable cases; we observe a significant lack of research. In this paper, we present Spectral Manifold Harmonization (SMH), a novel approach for addressing this imbalanced regression challenge on graph-structured data by generating synthetic graph samples that preserve topological properties while focusing on often underrepresented target distribution regions. Conventional methods fail in this context because they either ignore graph topology in case generation or do not target specific domain ranges, resulting in models biased toward average target values. Experimental results demonstrate the potential of SMH on chemistry and drug discovery benchmark datasets, showing consistent improvements in predictive performance for target domain ranges.
Problem

Research questions and friction points this paper is trying to address.

Address imbalanced regression in graph-structured data
Generate synthetic graphs preserving topological properties
Improve predictive performance in underrepresented target ranges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic graph samples preserving topology
Focuses on underrepresented target distribution regions
Improves predictive performance in target domains
🔎 Similar Papers
No similar papers found.
B
Brenda Nogueira
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
G
Gabe Gomes
Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
M
Meng Jiang
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
N
Nitesh V. Chawla
Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, USA
Nuno Moniz
Nuno Moniz
Associate Research Professor at Lucy Family Institute for Data & Society, University of Notre Dame
Imbalanced LearningResponsible AIData Privacy