🤖 AI Summary
To address the performance degradation and training instability of RNNs under weight quantization for edge deployment, this paper introduces the first binary and sparse ternary quantization schemes for the recurrent weights of vanilla RNNs. The authors propose a parameterization based on Hadamard matrices that explicitly constructs orthogonal recurrent matrices with binary or sparse ternary entries, trained end-to-end with the straight-through estimator (STE). The resulting binary and sparse ternary orthogonal RNNs, named HadamRNN and lock-HadamRNN, match full-precision state-of-the-art models on the copy task over 1000 timesteps, on permuted and sequential MNIST, and on IMDB, while drastically reducing memory footprint and computational cost, making real-time inference on edge devices feasible. Key contributions: (1) the first stably trained binary vanilla RNN; (2) the first sparse ternary orthogonal RNN architecture; and (3) a hardware-friendly co-design paradigm unifying orthogonality and quantization.
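Because the binarization step sign(·) has zero gradient almost everywhere, end-to-end training of such quantized weights typically relies on the straight-through estimator: the forward pass uses the binary weights, while the backward pass treats the quantizer as the identity and applies the gradient directly to latent full-precision weights. A minimal NumPy sketch of this generic idea on a toy regression (the setup and names are illustrative, not the paper's code):

```python
import numpy as np

def ste_sign(w):
    """Forward binarization to +/-1 (sign(0) mapped to +1)."""
    return np.where(w >= 0, 1.0, -1.0)

# Toy STE training loop: the forward pass sees binary weights wb,
# but the update is applied to the latent full-precision weights w,
# as if the binarization had derivative 1 everywhere.
rng = np.random.default_rng(0)
w = rng.normal(size=4)            # latent full-precision weights
x = np.array([1.0, 2.0, -1.0, 0.5])
target, lr = 2.0, 0.1
for _ in range(50):
    wb = ste_sign(w)              # binary weights used in the forward pass
    y = wb @ x                    # forward pass
    grad_y = 2.0 * (y - target)   # d(squared error)/dy
    grad_wb = grad_y * x          # gradient w.r.t. the *binary* weights
    w -= lr * grad_wb             # STE: apply it to the latent weights
```

In the paper's setting the same trick is applied to the quantized recurrent weight matrix of the ORNN rather than to a toy vector.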
📝 Abstract
Binary and sparse ternary weights in neural networks enable faster computations and lighter representations, facilitating their use on edge devices with limited computational power. Meanwhile, vanilla RNNs are highly sensitive to changes in their recurrent weights, making the binarization and ternarization of these weights inherently challenging. To date, no method has successfully achieved binarization or ternarization of vanilla RNN weights. We present a new approach leveraging the properties of Hadamard matrices to parameterize a subset of binary and sparse ternary orthogonal matrices. This method enables the training of orthogonal RNNs (ORNNs) with binary and sparse ternary recurrent weights, effectively creating a specific class of binary and sparse ternary vanilla RNNs. The resulting ORNNs, called HadamRNN and lock-HadamRNN, are evaluated on benchmarks such as the copy task, the permuted and sequential MNIST tasks, and the IMDB dataset. Despite binarization or sparse ternarization, these RNNs maintain performance levels comparable to state-of-the-art full-precision models, highlighting the effectiveness of our approach. Notably, our approach is the first solution with binary recurrent weights capable of tackling the copy task over 1000 timesteps.
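The algebraic fact underlying the parameterization is that a Hadamard matrix H of order n has entries in {+1, -1} and satisfies H Hᵀ = n I, so H/√n is orthogonal: an orthogonal matrix whose entries are binary up to a single scalar. A short NumPy sketch using the classical Sylvester construction (the paper parameterizes a richer subset of such matrices; this only illustrates the orthogonality property):

```python
import numpy as np

def sylvester_hadamard(k):
    """Hadamard matrix of order 2**k via the Sylvester construction:
    H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [1]."""
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

n = 8
H = sylvester_hadamard(3)        # entries are exactly +1 / -1
W = H / np.sqrt(n)               # rescaling makes it orthogonal
print(np.allclose(W @ W.T, np.eye(n)))   # True
```

Using such a matrix (or a sparse ternary variant) as the recurrent weight keeps the hidden-state norm stable over long sequences, which is the standard motivation for orthogonal recurrent weights.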