TC-MIS: Maximal Independent Set on Tensor-cores

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently harnessing GPU parallelism for the Maximal Independent Set (MIS) problem, which is hindered by irregular graph structure, irregular memory access, and load imbalance. The authors propose TC-MIS, an algorithm that accelerates MIS with Tensor Cores, hardware units that remain largely unexplored for graph algorithms, by reformulating key phases of MIS computation as sparse matrix-vector multiplication (SpMV). By tiling the adjacency matrix and using Warp Matrix Multiply-Accumulate (WMMA) operations, the method transforms irregular graph traversal into regular, massively parallel computation. Evaluated on NVIDIA GPUs from the Ampere, Ada Lovelace, Hopper, and Blackwell generations, the technique achieves a maximum speedup of 44.38× (on the H200) over state-of-the-art methods, with per-GPU average speedups ranging from 2.84× to 18.80×, while preserving solution quality comparable to that of established heuristics.
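As a rough illustration of the tiling idea in the summary, the sketch below computes an adjacency-matrix-times-vector product one dense tile at a time and skips tiles that contain no edges. It is a CPU-side NumPy analogy, not the authors' kernel: the function name `tiled_spmv`, the 16×16 tile size, and the skip-empty-tiles policy are assumptions made for this example. On a GPU, each non-empty tile product would be the unit of work handed to a WMMA operation.

```python
import numpy as np

TILE = 16  # assumed tile edge for this sketch; not taken from the paper


def tiled_spmv(adj, x, tile=TILE):
    """Compute adj @ x tile by tile, skipping tiles with no edges.

    adj: dense (n, n) float32 0/1 adjacency matrix (dense only for clarity).
    x:   length-n float32 vector.
    """
    n = adj.shape[0]
    pad = (-n) % tile  # zero-pad so n becomes a multiple of the tile size
    a = np.pad(adj, ((0, pad), (0, pad)))
    v = np.pad(x, (0, pad))
    y = np.zeros(n + pad, dtype=np.float32)

    for r in range(0, n + pad, tile):
        for c in range(0, n + pad, tile):
            block = a[r:r + tile, c:c + tile]
            if not block.any():
                continue  # empty tile: no edges, so nothing to accumulate
            # Regular dense tile product; this is the part a tensor-core
            # matrix multiply-accumulate would perform in hardware.
            y[r:r + tile] += block @ v[c:c + tile]
    return y[:n]
```

The point of the layout is that every tile product has the same fixed shape regardless of how irregular the graph is; the irregularity is confined to deciding which tiles are non-empty.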
📝 Abstract
Finding a Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently unstructured and challenging for GPU parallelism due to irregular memory access and workload imbalance, specialized GPU algorithms have achieved good performance, processing million-vertex graphs in milliseconds. Modern GPUs are equipped with Tensor Cores (TCs), specialized units for matrix operations with 8-16x higher throughput than CUDA Cores (CCs), which are extensively used for ML, DL, and inference tasks but remain largely unexplored for graph algorithms. In this paper, we present TC-MIS, a TC-accelerated algorithm that reformulates key phases of MIS computation as sparse matrix-vector multiplication (SpMV). TC-MIS tiles the graph adjacency matrix and employs Warp Matrix Multiply-Accumulate (WMMA) operations to transform irregular graph traversal into regular, massively parallel computation. Our evaluation across TC-enabled microarchitectures (Ampere, Ada Lovelace, Hopper, Blackwell) demonstrates that TC-MIS achieves average speedups over state-of-the-art methods of 2.84x on the RTX A5000, 4.84x on the L40S, 18.80x on the H200, and 5.20x on the RTX 5080, with a maximum speedup of 44.38x on the H200, while maintaining solution quality comparable to that obtained by established heuristics that produce near-maximum independent sets.
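To make the SpMV reformulation concrete, here is a minimal sketch of a Luby-style randomized MIS loop in which the neighbor-exclusion step is a matrix-vector product. The abstract does not say which phases TC-MIS maps to SpMV, so this choice is an assumption, as are the function name `mis_spmv` and the random-priority selection rule; it should not be read as the paper's algorithm. The adjacency matrix is assumed to be symmetric, 0/1, with a zero diagonal.

```python
import numpy as np


def mis_spmv(adj, seed=0):
    """Return a boolean mask of a maximal independent set (illustrative).

    adj: dense (n, n) float32 0/1 symmetric adjacency matrix, zero diagonal.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    active = np.ones(n, dtype=bool)   # vertices still undecided
    in_set = np.zeros(n, dtype=bool)  # vertices chosen so far

    while active.any():
        # Random priority per active vertex; decided vertices get -1.
        prio = np.where(active, rng.random(n), -1.0)

        # Highest priority among each vertex's neighbors (a row-wise max,
        # not a plain SpMV); rows without neighbors evaluate to -1.
        nbr_max = np.where(adj > 0, prio[None, :], -1.0).max(axis=1)

        # A vertex wins if it strictly beats every active neighbor, so two
        # adjacent vertices can never win in the same round.
        winners = active & (prio > nbr_max)
        in_set |= winners

        # SpMV step: hit[v] counts newly selected neighbors of v.
        hit = adj @ winners.astype(np.float32)

        # Winners and their neighbors leave the active set.
        active &= ~winners & (hit == 0)

    return in_set
```

The `adj @ winners` product is the piece that a tiled, tensor-core SpMV would replace. The result is maximal (no further vertex can be added), which is a weaker guarantee than maximum (largest possible).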
Problem

Research questions and friction points this paper is trying to address.

Maximal Independent Set
Tensor Cores
Graph Algorithms
GPU Acceleration
Irregular Computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tensor Cores
Maximal Independent Set
Sparse Matrix-Vector Multiplication
WMMA
GPU Acceleration