🤖 AI Summary
CountSketch lacks high-performance GPU implementations and remains underutilized in multisketching and least-squares solving. Method: This paper develops a GPU-optimized CountSketch algorithm with parallel memory-access optimizations, and combines it with a Gaussian sketch (multisketching) to accelerate randomized dimensionality reduction. Building on this implementation, it presents a multisketched least-squares solver. Contributions/Results: Experiments demonstrate that the solver is up to 77% faster than the normal equations on standard least-squares problems while offering significantly better numerical stability, at the cost of an $O(1)$ multiplicative factor in the relative residual norm. The framework establishes an efficient and robust GPU-accelerated approach to large-scale sketched linear algebra.
📝 Abstract
Random sketching is a dimensionality reduction technique that approximately preserves norms and singular values up to some $O(1)$ distortion factor with high probability. The most popular sketches in the literature are the Gaussian sketch and the subsampled randomized Hadamard transform, while the CountSketch has lower complexity. Composing two sketches, known as multisketching, offers an inexpensive way to quickly reduce the dimension of a matrix, for instance by applying a CountSketch followed by a Gaussian sketch.
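The multisketching idea can be illustrated with a minimal NumPy sketch (this is only an illustrative CPU prototype, not the paper's GPU implementation; the function name `countsketch` and all dimensions are chosen here for exposition). A CountSketch hashes each row of $A$ to one of $k_1$ buckets with a random sign, costing $O(\mathrm{nnz}(A))$, and a dense Gaussian sketch then reduces the result further to $k_2$ rows:

```python
import numpy as np

def countsketch(A, k, rng):
    """Apply a CountSketch S (k x n) to A (n x d) in O(nnz(A)) time.
    Each row of A is hashed to one of k buckets with a random +/-1 sign."""
    n, d = A.shape
    rows = rng.integers(0, k, size=n)          # hash h: [n] -> [k]
    signs = rng.choice([-1.0, 1.0], size=n)    # random sign flips
    SA = np.zeros((k, d))
    np.add.at(SA, rows, signs[:, None] * A)    # unbuffered scatter-add
    return SA

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 50))

# Multisketch: cheap CountSketch to k1 rows, then a Gaussian sketch to k2 rows.
k1, k2 = 500, 100
SA = countsketch(A, k1, rng)
G = rng.standard_normal((k2, k1)) / np.sqrt(k2)
GSA = G @ SA   # final (k2 x 50) sketch of A

# Norms are preserved up to O(1) distortion with high probability.
print(np.linalg.norm(A), np.linalg.norm(GSA))
```

The Gaussian matrix is only $k_2 \times k_1$, so the expensive dense multiply happens on the already-reduced matrix — this is what makes the combination inexpensive.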
However, there has been little investigation into high-performance CountSketch implementations. In this work, we develop an efficient GPU implementation of the CountSketch and demonstrate the performance of multisketching using this technique. We also demonstrate the potential of this implementation within a multisketched least-squares solver that is up to $77\%$ faster than the normal equations, with significantly better numerical stability, at the cost of an $O(1)$ multiplicative factor introduced into the relative residual norm.
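The trade-off described above — an $O(1)$ factor in the residual in exchange for solving a much smaller problem — can be seen in a basic sketch-and-solve least-squares example (a minimal NumPy illustration of the general technique, assuming a single CountSketch and illustrative sizes; the paper's multisketched GPU solver is more elaborate):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 20_000, 100, 800
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n) + 0.01 * rng.standard_normal(m)

# Sketch A and b with one CountSketch: hash each of the m rows
# into one of k buckets with a random sign.
rows = rng.integers(0, k, size=m)
signs = rng.choice([-1.0, 1.0], size=m)
SA = np.zeros((k, n))
Sb = np.zeros(k)
np.add.at(SA, rows, signs[:, None] * A)
np.add.at(Sb, rows, signs * b)

# Solve the k x n sketched problem instead of the m x n original.
x_sk = np.linalg.lstsq(SA, Sb, rcond=None)[0]
x_ex = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched solution's residual is within an O(1) factor of optimal.
r_sk = np.linalg.norm(A @ x_sk - b)
r_ex = np.linalg.norm(A @ x_ex - b)
print(r_sk / r_ex)
```

Unlike the normal equations $A^\top A x = A^\top b$, which square the condition number of $A$, the sketched problem can be solved by a QR-based method on a small matrix, which is one source of the improved numerical stability.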