🤖 AI Summary
This work addresses the challenge of reconciling hardware efficiency with performance preservation in sparse deep reinforcement learning. While unstructured sparsity lacks hardware support and structured sparsity often degrades performance, this study introduces row-wise N:M semi-structured sparsity into the off-policy TD3 algorithm for the first time, proposing an end-to-end trainable, hardware-aware sparse RL framework. By enforcing N:M sparsity constraints throughout training—without requiring post-processing—the method enables efficient learning even at high sparsity levels. Experiments on continuous control tasks such as Ant demonstrate a 14% performance gain under 2:4 sparsity, while maintaining competitive performance even at 87.5% sparsity (1:8). The approach is natively compatible with emerging hardware that supports N:M sparse computation, offering simultaneous benefits in model compression, performance improvement, and training acceleration.
📝 Abstract
Sparsity is a well-studied technique for compressing deep neural networks (DNNs) without compromising performance. In deep reinforcement learning (DRL), networks retaining as little as 5% of their original weights can still be trained with minimal performance loss relative to their dense counterparts. However, most existing methods rely on unstructured fine-grained sparsity, which limits hardware acceleration opportunities due to irregular computation patterns. Structured coarse-grained sparsity enables hardware acceleration, yet typically degrades performance and increases pruning complexity. In this work, we present, to the best of our knowledge, the first study of N:M structured sparsity in RL, which balances compression, performance, and hardware efficiency. Our framework enforces row-wise N:M sparsity throughout training for all networks in off-policy RL (TD3), maintaining compatibility with accelerators that support N:M sparse matrix operations. Experiments on continuous-control benchmarks show that RNM-TD3, our N:M sparse agent, outperforms its dense counterpart at 50%-75% sparsity (e.g., 2:4 and 1:4), achieving up to a 14% performance increase at 2:4 sparsity on the Ant environment. RNM-TD3 remains competitive even at 87.5% sparsity (1:8), while enabling potential training speedups.
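To make the row-wise N:M constraint concrete: in each row of a weight matrix, every consecutive group of M entries may contain at most N nonzeros. The sketch below is a minimal, hypothetical magnitude-based illustration of that constraint (the helper `nm_sparsify_rowwise` is not from the paper, which enforces the constraint end-to-end during training rather than as a one-shot prune):

```python
import numpy as np

def nm_sparsify_rowwise(W, n=2, m=4):
    """Keep only the n largest-magnitude weights in each consecutive
    group of m entries along every row; zero out the rest.
    Illustrative sketch of the row-wise N:M pattern, not the
    paper's training procedure."""
    rows, cols = W.shape
    assert cols % m == 0, "row length must be divisible by m"
    groups = W.reshape(rows, cols // m, m)  # split each row into groups of m
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

# In each group of 4, the 2 largest-magnitude entries survive (2:4 sparsity):
W = np.array([[0.9, -0.1, 0.3, -0.7],
              [0.2,  0.8, -0.5, 0.1]])
print(nm_sparsify_rowwise(W, n=2, m=4))
```

Hardware such as NVIDIA Ampere-class Sparse Tensor Cores accelerates exactly this 2:4 pattern, which is why enforcing it throughout training, rather than pruning afterward, preserves the acceleration opportunity end to end.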