🤖 AI Summary
This paper addresses the out-of-sample embedding problem: efficiently and robustly embedding new samples into an existing vector space given similarity or dissimilarity data. We propose a unified theoretical framework that, for the first time, systematically categorizes existing kernel-based methods into two paradigms, kernel-based projection and restricted reconstruction, and rigorously establish their mathematical equivalence and applicability boundaries. Our analysis reveals that restricted reconstruction reduces to a unidimensional search and elucidates its statistical robustness mechanism. Leveraging these insights, we design a computationally efficient, noise-resilient restricted reconstruction algorithm. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches under high noise and sparse neighborhood conditions. The proposed framework provides a principled foundation for incremental updates and practical deployment of graph embeddings.
📝 Abstract
The problem of using proximity (similarity or dissimilarity) data for the purpose of "adding a point to a vector diagram" was first studied by J.C. Gower in 1968. Since then, a number of methods -- mostly kernel methods -- have been proposed for solving what has come to be called the problem of *out-of-sample embedding*. We survey the various kernel methods that we have encountered and show that each can be derived from one or the other of two competing strategies: *projection* or *restricted reconstruction*. Projection can be analogized to a well-known formula for adding a point to a principal component analysis. Restricted reconstruction poses a different challenge: how to best approximate redoing the entire multivariate analysis while holding fixed the vector diagram that was previously obtained. This strategy results in a nonlinear optimization problem that can be simplified to a unidimensional search. Various circumstances may warrant either projection or restricted reconstruction.
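To make the projection strategy concrete, the following is a minimal sketch of a projection-style formula for adding a point to a classical multidimensional scaling (MDS) configuration built from squared Euclidean distances. It is an illustration under stated assumptions, not the paper's algorithm: the choice of classical MDS as the underlying analysis, and the function names `classical_mds` and `project_new_point`, are ours and do not come from the abstract. The formula follows from expanding $\|x_i - y\|^2 = \|x_i\|^2 - 2 x_i^\top y + \|y\|^2$ and using the fact that the existing configuration is centered.

```python
import numpy as np


def classical_mds(D2, k):
    """Embed n points in k dimensions from an n x n matrix of squared distances.

    Hypothetical helper for illustration; returns a centered n x k configuration.
    """
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ D2 @ J                     # double-centered inner products
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]             # indices of the k largest eigenvalues
    w_top = np.clip(w[idx], 0.0, None)        # guard against tiny negative values
    return V[:, idx] * np.sqrt(w_top)


def project_new_point(X, d2_new):
    """Place a new point given its squared distances to the n embedded points.

    Solves X y = (b - d2_new) / 2 in the least-squares sense, where b holds the
    squared norms of the rows of X. Assumes X is centered and has full column rank.
    """
    b = np.sum(X ** 2, axis=1)
    y, *_ = np.linalg.lstsq(X, 0.5 * (b - d2_new), rcond=None)
    return y
```

When the proximities are exact Euclidean distances and the new point lies in the span of the existing configuration, this projection reproduces the new point's distances to the original points. With noisy or non-Euclidean proximities it generally does not, which is one circumstance in which the two strategies can give different answers.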