HOT: Hadamard-based Optimized Training

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the excessive memory and computational overhead of deep learning training, particularly backpropagation, where matrix multiplication is the dominant cost, this paper proposes the first Hadamard-transform-based co-optimization framework. Methodologically, it introduces a selection mechanism that chooses between Hadamard quantization and Hadamard low-rank approximation according to the suitability of each for distinct backward paths, combined with activation buffer compression and layer-wise quantizer selection. The design is tailored to GPU memory-access patterns for efficient hardware deployment. Experiments on mainstream models demonstrate up to 75% GPU memory reduction and a 2.6× training speedup over FP32 baselines, with negligible accuracy degradation (<0.3% top-1 error increase). These gains surpass existing lightweight backpropagation approaches, establishing a new state-of-the-art efficiency–accuracy trade-off.

📝 Abstract
It has become increasingly important to optimize backpropagation to reduce memory usage and computational overhead. Achieving this goal is highly challenging, as multiple objectives must be considered jointly while maintaining training quality. In this paper, we focus on matrix multiplication, which accounts for the largest portion of training costs, and analyze its backpropagation in detail to identify lightweight techniques that offer the best benefits. Based on this analysis, we introduce a novel method, Hadamard-based Optimized Training (HOT). In this approach, we apply Hadamard-based optimizations, such as Hadamard quantization and Hadamard low-rank approximation, selectively and with awareness of the suitability of each optimization for different backward paths. Additionally, we introduce two enhancements: activation buffer compression and layer-wise quantizer selection. Our extensive analysis shows that HOT achieves up to 75% memory savings and a 2.6 times acceleration on real GPUs, with negligible accuracy loss compared to FP32 precision.
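The core idea behind Hadamard quantization, as described above, is to rotate a tensor with a Hadamard matrix before low-bit quantization so that outlier values are spread across the transformed coordinates; because the rotation is orthonormal, it cancels out inside the matrix product. The following NumPy sketch illustrates this principle on a single int8 matrix multiplication. It is a generic illustration under our own assumptions (symmetric per-tensor scales, Sylvester-constructed Hadamard matrix), not the paper's exact algorithm; all function names are ours.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def quantize_int8(x):
    # Symmetric per-tensor int8 quantization: one scale for the whole tensor.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def hadamard_quantized_matmul(A, B):
    # Rotate the shared inner dimension with an orthonormal Hadamard matrix
    # to spread outliers before quantization; the rotation cancels in the
    # product because H @ H.T = I. The int8 matmul is emulated in int32 here.
    n = A.shape[1]
    H = hadamard(n) / np.sqrt(n)  # orthonormal
    Aq, sa = quantize_int8(A @ H)
    Bq, sb = quantize_int8(H.T @ B)
    return (Aq.astype(np.int32) @ Bq.astype(np.int32)) * (sa * sb)

A = np.random.randn(8, 16)
B = np.random.randn(16, 4)
approx = hadamard_quantized_matmul(A, B)
print(np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B))
```

On real hardware, the int8 product would run on tensor cores rather than being emulated in int32, which is where the reported speedup comes from.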
Problem

Research questions and friction points this paper is trying to address.

Optimize backpropagation to reduce memory and computation
Improve matrix multiplication efficiency in training
Apply Hadamard-based techniques for lightweight optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hadamard-based quantization for backpropagation optimization
Layer-wise quantizer selection for enhanced efficiency
Activation buffer compression to reduce memory usage
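One way to realize activation buffer compression in the Hadamard domain is to transform the saved activation and keep only its largest-magnitude coefficients, reconstructing an approximation when the backward pass needs it. The sketch below is a minimal illustration of that idea under our own assumptions (top-k coefficient selection, dense reconstruction); it is not the paper's specific compression scheme, and the function names are hypothetical.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n a power of two).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def compress(x, keep_ratio=0.25):
    # Move the activation into the Hadamard domain and store only the
    # largest-magnitude coefficients; the rest are dropped.
    n = x.shape[-1]
    H = hadamard(n) / np.sqrt(n)
    coeffs = (x @ H).ravel()
    k = max(1, int(keep_ratio * coeffs.size))
    idx = np.argpartition(np.abs(coeffs), -k)[-k:]
    return coeffs[idx], idx, x.shape

def decompress(values, idx, shape):
    # Scatter the stored coefficients back and invert the transform.
    n = shape[-1]
    H = hadamard(n) / np.sqrt(n)
    coeffs = np.zeros(int(np.prod(shape)))
    coeffs[idx] = values
    return coeffs.reshape(shape) @ H.T
```

With `keep_ratio=0.25`, the buffer stores a quarter of the coefficients (plus their indices), which is the kind of trade-off behind the memory savings reported above; `keep_ratio=1.0` recovers the activation exactly.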