Spectral Manifold Harmonization for Graph Imbalanced Regression

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

career value

242K/year

🤖 AI Summary

Scientific graph data often exhibit imbalanced regression: models overfit to the mean region, while high-value targets—such as specific activity ranges—are sparsely represented and poorly predicted. This work introduces spectral manifold alignment to graph-based imbalanced regression for the first time, proposing a topology-preserving synthetic sample generation method. It aligns the source and target manifolds in the spectral domain and synthesizes graph samples that explicitly cover critical target intervals, jointly regulating both graph topology and node/edge attribute distributions. The approach mitigates model bias toward the mean, significantly improving prediction accuracy for scarce target intervals across multiple benchmark datasets in chemistry and drug discovery—achieving average MAE reductions of 12.7%–23.4%. Empirical results demonstrate its effectiveness, interpretability, and cross-domain generalizability.

Technology Category

Application Category

📝 Abstract

Graph-structured data is ubiquitous in scientific domains, where models often face imbalanced learning settings. In imbalanced regression, domain preferences focus on specific target value ranges representing the most scientifically valuable cases; we observe a significant lack of research. In this paper, we present Spectral Manifold Harmonization (SMH), a novel approach for addressing this imbalanced regression challenge on graph-structured data by generating synthetic graph samples that preserve topological properties while focusing on often underrepresented target distribution regions. Conventional methods fail in this context because they either ignore graph topology in case generation or do not target specific domain ranges, resulting in models biased toward average target values. Experimental results demonstrate the potential of SMH on chemistry and drug discovery benchmark datasets, showing consistent improvements in predictive performance for target domain ranges.

Problem

Research questions and friction points this paper is trying to address.

Address imbalanced regression in graph-structured data

Generate synthetic graphs preserving topological properties

Improve predictive performance in underrepresented target ranges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic graph samples preserving topology

Focuses on underrepresented target distribution regions

Improves predictive performance in target domains

🔎 Similar Papers

Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition