🤖 AI Summary
This work constructs random feature approximations of bivariate kernel functions defined on general manifolds. By discretizing the manifold and leveraging Graph Random Features (GRFs), the authors learn continuous fields over manifolds, yielding manifold-aware random features that are positive and bounded, a key property for accurate, low-variance approximation. They establish an asymptotic connection between GRFs, defined on discrete graphs, and the continuous random features used for regular kernels. As a by-product, the method recovers the Gaussian kernel approximation used in linear-attention Transformers from simple random walks on graphs, bypassing the original, more involved derivation. The algorithm is supported by rigorous theoretical analysis and thorough experimental validation.
📝 Abstract
We present a new paradigm for creating random features to approximate bivariate functions (in particular, kernels) defined on general manifolds. This new mechanism of Manifold Random Features (MRFs) leverages discretization of the manifold and the recently introduced technique of Graph Random Features (GRFs) to learn continuous fields on manifolds. These fields are used to find continuous approximation mechanisms that, in general scenarios, cannot otherwise be derived analytically. MRFs provide positive and bounded features, a key property for accurate, low-variance approximation. We show a deep asymptotic connection between GRFs, defined on discrete graph objects, and continuous random features used for regular kernels. As a by-product of our method, we re-discover a recently introduced mechanism of Gaussian kernel approximation, applied in particular to improve linear-attention Transformers, by considering simple random walks on graphs and bypassing the original complex mathematical computations. We complement our algorithm with a rigorous theoretical analysis and verify it in thorough experimental studies.
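To make the "positive and bounded features" property concrete, the following is a minimal sketch (not the paper's MRF algorithm) of the classical positive random features for the Gaussian kernel, the mechanism from linear-attention Transformers that the abstract says is re-discovered here. For `w ~ N(0, I_d)`, the feature `phi(x) = exp(w^T x - ||x||^2)` is strictly positive, bounded for bounded inputs, and satisfies `E[phi(x) phi(y)] = exp(-||x - y||^2 / 2)`. All names below are illustrative.

```python
import numpy as np

def positive_random_features(X, W):
    """Map each row x of X to exp(W x - ||x||^2) / sqrt(m), m = number of features."""
    m = W.shape[0]
    sq_norms = np.sum(X**2, axis=1, keepdims=True)   # ||x||^2 for each row
    return np.exp(X @ W.T - sq_norms) / np.sqrt(m)   # strictly positive features

rng = np.random.default_rng(0)
d, m = 3, 100_000                    # input dimension, number of random features
W = rng.standard_normal((m, d))      # shared Gaussian projection directions

x = np.array([[0.2, -0.1, 0.3]])
y = np.array([[0.1, 0.2, -0.2]])

# Monte Carlo estimate of the Gaussian kernel via a dot product of features:
approx = float(positive_random_features(x, W) @ positive_random_features(y, W).T)
exact = float(np.exp(-np.sum((x - y)**2) / 2))
print(approx, exact)                 # the two values should nearly agree
```

Because every feature value is positive, the kernel estimate is itself positive, which is what keeps the relative variance low in the small-kernel-value regime that matters for attention.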