🤖 AI Summary
This work addresses a limitation of existing Fused Gromov-Wasserstein (FGW) distances: they treat all features uniformly, so in high-dimensional settings irrelevant or noisy features can compromise the interpretability and robustness of the alignment. To overcome this, we propose FGW distances with feature selection, which incorporate adaptive feature-specific weights into the FGW objective so that features can be downweighted or suppressed during alignment. Two variants are considered: weights regularized with Lasso or Ridge penalties, and weights constrained to lie on a simplex, the latter with groupwise extensions. We establish bounds relating the proposed models to classical FGW and Gromov-Wasserstein distances and analyze their metric behavior. An efficient alternating minimization algorithm is developed for optimization. Experiments illustrate how feature suppression enhances alignment interpretability and reveals task-relevant structure, with an application to computational redistricting.
📝 Abstract
Fused Gromov-Wasserstein (FGW) distances provide a principled framework for comparing objects by jointly aligning structure and node features. However, existing FGW formulations treat all features uniformly, which limits interpretability and robustness in high-dimensional settings where many features may be irrelevant or noisy. We introduce FGW distances with feature selection, which incorporate adaptive feature suppression weights into the FGW objective to selectively downweight or suppress differentiating features during alignment. We propose two approaches: (1) regularized FGW with Lasso and Ridge penalties, and (2) FGW with simplex-constrained weights, including groupwise extensions. We analyze the resulting models and establish their key theoretical properties, including bounds relative to classical FGW and Gromov-Wasserstein distances, and metric behavior. An efficient alternating minimization algorithm is developed. Experiments illustrate how feature suppression enhances interpretability and reveals task-relevant structure, with a special application to computational redistricting.
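To make the alternating scheme more concrete, the following is a minimal sketch of what one weight-update step could look like once the coupling is held fixed. It is not the paper's formulation. The squared-difference feature cost, the combination of a ridge term with the simplex constraint, and all names (`per_feature_costs`, `project_to_simplex`, `update_weights`, `ridge`) are assumptions made for illustration only. Under those assumptions the feature term is linear in the weights, so the update reduces to a Euclidean projection onto the simplex.

```python
import numpy as np


def per_feature_costs(X, Y, coupling):
    """Return c with c[k] = sum_ij coupling[i, j] * (X[i, k] - Y[j, k])**2.

    Assumed feature cost: squared difference, taken feature by feature.
    """
    diff2 = (X[:, None, :] - Y[None, :, :]) ** 2  # shape (n, m, d)
    return np.einsum("ij,ijk->k", coupling, diff2)


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0                  # running sums minus the target total
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]  # last index kept positive
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def update_weights(X, Y, coupling, ridge=1.0):
    """Minimise sum_k w[k] * c[k] + ridge * ||w||^2 over the simplex.

    Completing the square shows the minimiser is the projection of
    -c / (2 * ridge) onto the simplex (hypothetical objective, see lead-in).
    """
    c = per_feature_costs(X, Y, coupling)
    return project_to_simplex(-c / (2.0 * ridge))


# Toy data: feature 0 agrees across the two objects, feature 1 is noise.
X = np.array([[0.0, 0.0], [1.0, 5.0], [2.0, -3.0]])
Y = np.array([[0.0, 1.0], [1.0, -4.0], [2.0, 6.0]])
coupling = np.eye(3) / 3.0                    # fixed coupling matching node i to node i

print(update_weights(X, Y, coupling, ridge=100.0))
```

In this sketch a feature that is expensive to align under the current coupling receives a smaller weight, and a smaller `ridge` value pushes the weights toward a sparser solution. A full alternating scheme would alternate this step with an FGW coupling update computed under the current weights.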