AI Summary
This work addresses large-scale linear regression by proposing a GPU-accelerated sketch-and-precondition framework built on sparse sign sketches. To remove bottlenecks in generating and applying the sketch in heterogeneous parallel environments, it introduces a lightweight, rejection-sampling-based algorithm for generating sparse sign sketches, which significantly improves GPU parallel efficiency. It also presents the first systematic evaluation of this paradigm's scalability and practicality on single- and multi-GPU platforms. The method combines sparse random projections (sparse sign sketches), GPU acceleration, and a preconditioned conjugate gradient solver, achieving numerical robustness while substantially reducing communication overhead. Experimental results show better performance than highly optimized CPU-based approaches and support the method's engineering viability as a component of black-box least-squares solvers.
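The pipeline summarized above (sketch the matrix, factor the small sketch, then run a preconditioned iterative solver) can be illustrated in a few lines. The following is a minimal CPU-only NumPy/SciPy sketch, not the authors' GPU implementation. The function name `sketch_and_precondition`, the default sketch size d = 4k, the sparsity `zeta = 8`, and the choice of CGLS (conjugate gradient applied to the preconditioned normal equations) are all assumptions for illustration. The sparse sign sketch here is built with a simple per-column loop, not with a rejection-sampling scheme.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import solve_triangular


def sketch_and_precondition(A, b, d=None, zeta=8, tol=1e-12, maxiter=100, seed=0):
    """Illustrative sketch-and-precondition solver (hypothetical helper).

    Approximately solves min_x ||A x - b||_2 for a tall, full-column-rank A (n x k).
    """
    n, k = A.shape
    d = d or 4 * k          # sketch size: assumed default of 4k rows
    zeta = min(zeta, d)     # nonzeros per column of the sketch
    rng = np.random.default_rng(seed)

    # Sparse sign sketch S (d x n): each column has zeta distinct nonzero rows,
    # each entry +1/sqrt(zeta) or -1/sqrt(zeta).
    rows = np.concatenate([rng.choice(d, zeta, replace=False) for _ in range(n)])
    vals = rng.choice([-1.0, 1.0], size=n * zeta) / np.sqrt(zeta)
    S = sp.csc_matrix((vals, rows, np.arange(0, n * zeta + 1, zeta)), shape=(d, n))

    # Sketch, then factor the small d x k matrix S A; its R factor is the preconditioner.
    R = np.linalg.qr(S @ A, mode="r")

    # Operators for the preconditioned matrix B = A R^{-1} and its transpose.
    def B(y):
        return A @ solve_triangular(R, y)

    def Bt(r):
        return solve_triangular(R, A.T @ r, trans="T")

    # CGLS: conjugate gradient on the normal equations B^T B y = B^T b.
    y = np.zeros(k)
    r = np.array(b, dtype=float)   # residual b - B y (y starts at zero)
    s = Bt(r)
    p = s.copy()
    gamma = s @ s
    stop = tol * np.sqrt(gamma)    # relative tolerance on ||B^T r||
    for _ in range(maxiter):
        if np.sqrt(gamma) <= stop:
            break
        q = B(p)
        alpha = gamma / (q @ q)
        y += alpha * p
        r -= alpha * q
        s = Bt(r)
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new

    # Undo the change of variables: x = R^{-1} y.
    return solve_triangular(R, y)
```

Because S A has only d rows, its QR factorization is cheap, and the matrix A R^{-1} is well-conditioned with high probability, so the iteration count stays small even when A itself is ill-conditioned.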
Abstract
A litany of theoretical and numerical results have established the sketch-and-precondition paradigm as a powerful approach to solving large linear regression problems in standard computing environments. Perhaps surprisingly, much less work has been done on understanding how sketch-and-precondition performs on graphics processing unit (GPU) systems. We address this gap by benchmarking an implementation of sketch-and-precondition based on sparse sign sketches on single- and multi-GPU systems. In doing so, we describe a novel, easily parallelized, rejection-sampling-based method for generating sparse sign sketches. Our approach, which maps naturally onto GPUs, is easily adapted to a variety of computing environments. Taken as a whole, our numerical experiments indicate that sketch-and-precondition with sparse sign sketches is particularly well-suited for GPUs and may be suitable for use in black-box least-squares solvers.
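The abstract does not spell out the rejection-sampling procedure, so the following is only one plausible scheme, assumed for illustration and not necessarily the authors' method. For every column, draw zeta row indices uniformly with replacement, then redraw any column that contains a repeated index until every column has zeta distinct rows. Because columns are independent, the scheme is easy to parallelize. Here the redraw is vectorized over all columns with NumPy as a stand-in for a GPU kernel. The helper name `sparse_sign_sketch` and its parameters are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def sparse_sign_sketch(d, m, zeta, rng=None):
    """Hypothetical helper: d x m sparse sign sketch via rejection sampling.

    Each column gets exactly zeta nonzeros in distinct rows, with values
    +1/sqrt(zeta) or -1/sqrt(zeta), so every column has unit 2-norm.
    """
    if not 1 <= zeta <= d:
        raise ValueError("need 1 <= zeta <= d")
    rng = np.random.default_rng(rng)

    # Propose zeta row indices per column, sampled with replacement.
    rows = rng.integers(0, d, size=(m, zeta))
    while True:
        # A column is rejected if its sorted indices contain a repeat.
        s = np.sort(rows, axis=1)
        bad = np.any(s[:, 1:] == s[:, :-1], axis=1)
        if not bad.any():
            break
        # Redraw only the rejected columns; accepted columns are kept.
        rows[bad] = rng.integers(0, d, size=(int(bad.sum()), zeta))

    # Independent random signs, scaled so each column has unit norm.
    vals = rng.choice([-1.0, 1.0], size=(m, zeta)) / np.sqrt(zeta)
    indptr = np.arange(0, m * zeta + 1, zeta)  # column j owns slots [j*zeta, (j+1)*zeta)
    return sp.csc_matrix((vals.ravel(), rows.ravel(), indptr), shape=(d, m))
```

When zeta is much smaller than d, a proposed column is accepted with probability close to 1 (the product of (1 - i/d) for i = 1, ..., zeta - 1), so only a few redraw rounds are needed. As zeta approaches d the acceptance rate collapses, and a different sampler would be preferable.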