🤖 AI Summary
This work proposes an interpretable geometric framework, based on the generalized singular value decomposition (GSVD), for sample-level comparison of two datasets that preserves their intrinsic structure. By building a joint subspace coordinate system under the co-span constraint \(Ax = By = z\), the method separates shared directions from dataset-specific ones and introduces a sample alignment angle \(\theta(z)\) that quantifies how much each of the two datasets contributes to explaining a given sample. The framework is described as the first to use subspace alignment angles for sample-wise comparison, and the angle serves as an interpretable per-sample diagnostic. Experiments on MNIST illustrate representative GSVD directions and the distribution of alignment angles, and a binary classifier built on \(\theta(z)\) is presented as an illustrative application.
📝 Abstract
Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between two data matrices, expressed via the co-span constraint $Ax = By = z$ in a shared ambient space. To operationalize this comparison, we use the generalized singular value decomposition (GSVD) as a joint coordinate system for two subspaces. In particular, we exploit the GSVD form $A = HCU$, $B = HSV$ with $C^{\top}C + S^{\top}S = I$, which separates shared from dataset-specific directions through the diagonal structure of $(C, S)$. From these factors we derive an interpretable *angle score* $\theta(z) \in [0, \pi/2]$ for a sample $z$, quantifying whether $z$ is explained relatively more by $A$, more by $B$, or comparably by both. The primary role of $\theta(z)$ is as a *per-sample geometric diagnostic*. We illustrate the behavior of the score on MNIST through angle distributions and representative GSVD directions. A binary classifier derived from $\theta(z)$ is presented as an illustrative application of the score as an interpretable diagnostic tool.
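To make the factorization $A = HCU$, $B = HSV$ concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that $A$ and $B$ store samples as columns in the same ambient space and that the stacked matrix $[A^{\top}; B^{\top}]$ has full column rank; it builds the factors from a QR step followed by an SVD. The function names `gsvd_shared_left` and `angle_score` are hypothetical. The abstract does not state the formula for $\theta(z)$, so `angle_score` is only one plausible aggregation (an arctangent of the $S$-weighted to $C$-weighted coordinate norms of $z$ in the basis $H$) and should not be read as the paper's definition.

```python
import numpy as np


def gsvd_shared_left(A, B):
    """Return H, c, s, CU, SV such that A = H @ CU and B = H @ SV.

    A is (d, n_a) and B is (d, n_b); columns are samples in a shared space.
    CU = diag(c) @ U and SV = diag(s) @ V have mutually orthogonal rows,
    with c**2 + s**2 == 1 elementwise.
    Assumption: the stacked matrix [A.T; B.T] has full column rank d.
    """
    n_a = A.shape[1]
    # Thin QR of the stacked transposes: [A.T; B.T] = Q @ R.
    Q, R = np.linalg.qr(np.vstack([A.T, B.T]))
    Q1, Q2 = Q[:n_a], Q[n_a:]
    # The right singular vectors of Q1 diagonalize Q1.T @ Q1 and,
    # because Q1.T @ Q1 + Q2.T @ Q2 = I, also Q2.T @ Q2.
    _, _, Wt = np.linalg.svd(Q1, full_matrices=True)
    CU = (Q1 @ Wt.T).T  # row i has norm c_i
    SV = (Q2 @ Wt.T).T  # row i has norm s_i
    H = R.T @ Wt.T      # shared left factor, shape (d, d)
    c = np.linalg.norm(CU, axis=1)
    s = np.linalg.norm(SV, axis=1)
    return H, c, s, CU, SV


def angle_score(z, H, c, s):
    """Hypothetical per-sample angle in [0, pi/2]; NOT the paper's formula.

    Under this convention, values near 0 mean z lies mostly along
    A-dominated directions and values near pi/2 along B-dominated ones.
    """
    w = np.linalg.solve(H, z)  # coordinates of z in the basis H
    return np.arctan2(np.linalg.norm(s * w), np.linalg.norm(c * w))


# Toy data: A spans e1, e2, e3 and B spans e3, e4, e5, so e3 is shared.
rng = np.random.default_rng(0)
E = np.eye(5)
A = E[:, [0, 1, 2]] @ rng.standard_normal((3, 40))
B = E[:, [2, 3, 4]] @ rng.standard_normal((3, 30))

H, c, s, CU, SV = gsvd_shared_left(A, B)

# Per-direction angles: 0 for A-only, pi/2 for B-only, in between if shared.
theta_dirs = np.arctan2(s, c)
print("direction angles:", np.round(theta_dirs, 3))
print("score of shared direction e3:", angle_score(E[:, 2], H, c, s))
```

In this toy setup the per-direction angles $\arctan(s_i / c_i)$ split into three groups: directions present only in $A$, directions present only in $B$, and the single shared direction. This is the separation the diagonal structure of $(C, S)$ is meant to expose. For rank-deficient or large-scale data, a LAPACK-backed GSVD routine would be a more robust choice than this QR-plus-SVD construction.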