🤖 AI Summary
This work addresses the inefficiency of row-wise Top-K selection on GPUs—a bottleneck in information retrieval, large-scale data processing, and graph neural network (GNN) training. We propose RTop-K, an ultra-fast parallel algorithm built upon a novel GPU-optimized binary-search-based framework for row-wise Top-K selection. RTop-K integrates a dynamic early-stopping mechanism with memory-hierarchy-aware scheduling, achieving substantial latency reduction without compromising numerical precision. Theoretical analysis and empirical evaluation validate the effectiveness of the early-stopping strategy. Notably, RTop-K enables, for the first time, end-to-end acceleration of MaxK-GNN training. Compared to state-of-the-art methods, RTop-K achieves up to 11.49× speedup (with early stopping enabled) and accelerates MaxK-GNN training by 11.97%–33.29%, with no loss in test classification accuracy.
📝 Abstract
Top-k selection algorithms are fundamental in a wide range of applications, including high-performance computing, information retrieval, big data processing, and neural network model training. In this paper, we present RTop-K, a highly efficient parallel row-wise top-k selection algorithm specifically designed for GPUs. RTop-K leverages a binary search-based approach to optimize row-wise top-k selection, providing a scalable and accelerated solution. We conduct a detailed analysis of early stopping in our algorithm, showing that it effectively maintains the testing accuracy of neural network models while substantially improving performance. Our GPU implementation of RTop-K demonstrates superior performance over state-of-the-art row-wise top-k GPU implementations, achieving an average speed-up of up to 11.49× with early stopping and 7.29× without early stopping. Moreover, RTop-K accelerates the overall training workflow of MaxK-GNNs, delivering speed-ups ranging from 11.97% to 33.29% across different models and datasets.
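To make the core idea concrete, here is a minimal CPU sketch of binary-search-based top-k selection for a single row. The actual RTop-K kernel is a GPU implementation with parallel counting and a dynamic early-stopping rule; this illustrative Python version (function name, `tol` parameter, and iteration cap are our own assumptions, not the paper's API) only shows how a value threshold can be binary-searched so that roughly k elements pass it, with a crude "close enough" early stop.

```python
import numpy as np

def rowwise_topk_threshold(row, k, max_iters=64, tol=0):
    # Illustrative sketch, not the paper's kernel: binary-search a
    # threshold t so that about k elements of `row` satisfy row >= t.
    lo, hi = float(row.min()), float(row.max())
    t, cnt = hi, 0
    for _ in range(max_iters):
        t = 0.5 * (lo + hi)
        cnt = int((row >= t).sum())
        if abs(cnt - k) <= tol:   # early stop once the count is within tol of k
            break
        if cnt > k:
            lo = t                # too many elements pass: raise the threshold
        else:
            hi = t                # too few elements pass: lower the threshold
    return t, cnt

# Example: top-2 of one row; elements >= t form the selected set.
row = np.array([0.1, 0.9, 0.5, 0.3, 0.7])
t, cnt = rowwise_topk_threshold(row, 2)
```

On a GPU, the per-iteration count becomes a parallel reduction, and rows are processed independently, which is what makes the row-wise formulation amenable to warp- and block-level parallelism; with `tol > 0` the loop may return slightly more or fewer than k elements, mirroring the approximate selection that the paper's early stopping trades for speed.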