GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need

📅 2026-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes an interpretable geometric framework, based on the generalized singular value decomposition (GSVD), for sample-level comparison of two datasets while preserving their intrinsic structure. The GSVD supplies a joint coordinate system for the two subspaces, and the co-span constraint $Ax = By = z$ relates the two data matrices in a shared ambient space. In these coordinates the method separates shared from dataset-specific directions and defines a per-sample alignment angle $\theta(z) \in [0, \pi/2]$ that quantifies whether a sample $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The framework is described as the first to use subspace alignment angles for sample-wise comparison, with the angle serving primarily as an interpretable diagnostic. Experiments on MNIST illustrate representative GSVD directions and the distribution of alignment angles, and a binary classifier built on $\theta(z)$ is presented as an illustrative application.

📝 Abstract

Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint $Ax = By = z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, which separates shared versus dataset-specific directions through the diagonal structure of $(C, S)$. From these factors we derive an interpretable *angle score* $\theta(z) \in [0, \pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
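To make the factorization concrete, the following is a minimal NumPy sketch of one way to obtain factors of the form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, using a QR step on the stacked transposes followed by an SVD. It assumes the columns of $A$ and $B$ are samples, that each matrix has at least as many columns as rows, and that the stacked matrix has full rank. The abstract does not give the formula for $\theta(z)$, so the `angle_score` function below is a hypothetical stand-in chosen only to produce a value in $[0, \pi/2]$; it should not be read as the authors' definition. The function names and the random test data are likewise illustrative.

```python
import numpy as np

def gsvd_shared_left(A, B):
    """Return H, c, s, U, V with A = H diag(c) U and B = H diag(s) V.

    A is (d, n_a) and B is (d, n_b); columns are samples.
    Assumes n_a >= d, n_b >= d and that [A B] has rank d.
    """
    n_a = A.shape[1]
    # Reduced QR of the stacked transposes: [A^T; B^T] = Q R.
    Q, R = np.linalg.qr(np.vstack([A.T, B.T]))
    Q1, Q2 = Q[:n_a], Q[n_a:]
    # SVD of the top block yields the cosines: Q1 = U1 diag(c) W^T.
    U1, c, Wt = np.linalg.svd(Q1, full_matrices=False)
    # Q1^T Q1 + Q2^T Q2 = I, so the columns of Q2 W are orthogonal
    # and their norms are the sines s, with c**2 + s**2 = 1.
    Q2W = Q2 @ Wt.T
    s = np.linalg.norm(Q2W, axis=0)
    V1 = Q2W / np.where(s > 1e-12, s, 1.0)  # guard against division by zero
    H = R.T @ Wt.T                           # shared left factor
    return H, c, s, U1.T, V1.T

def angle_score(z, H, c, s):
    """Hypothetical per-sample angle in [0, pi/2] (NOT the paper's formula).

    Expresses z in the shared coordinates w = H^{-1} z, then compares the
    s-weighted and c-weighted norms of w. Values near 0 mean z loads on
    directions where c dominates (A side); values near pi/2 mean the B side.
    """
    w = np.linalg.solve(H, z)
    return float(np.arctan2(np.linalg.norm(s * w), np.linalg.norm(c * w)))

# Illustrative random data: 5-dimensional samples, 40 in A and 30 in B.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 40))
B = rng.normal(size=(5, 30))
H, c, s, U, V = gsvd_shared_left(A, B)
theta = angle_score(A[:, 0], H, c, s)
```

In this sketch each shared direction (a column of `H`) carries its own angle `np.arctan2(s[i], c[i])`, and a sample lying exactly along that column receives that angle as its score. Other sample-level aggregations are equally plausible, so the choice here is a design assumption rather than a recommendation.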

Problem

Research questions and friction points this paper is trying to address.

geometry-grounded learning

dataset comparison

generalized singular value decomposition

subspace alignment

geometric diagnostic

Innovation

Methods, ideas, or system contributions that make the work stand out.

GSVD

geometry-grounded learning

dataset comparison