🤖 AI Summary
This work addresses the memory-bandwidth bottleneck that selective state space models face during deployment, which arises when state updates are decomposed into fragmented operators that repeatedly materialize intermediate tensors. The authors propose a memory-aware operator fusion strategy that uses activation entropy as a runtime scheduling metric to place fusion boundaries and choose tile sizes. They also absorb normalized Hadamard transforms into the linear projections to tame heavy-tailed activation distributions, reducing peak-coordinate concentration while preserving functional equivalence. In a controlled prototype study, the approach reduces DRAM traffic, lowers proxy latency, and improves throughput relative to both unfused and fixed-depth baselines.
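The functional-equivalence claim for the Hadamard reparameterization can be illustrated with a small sketch. It is not taken from the COREY code base: the helper names (`hadamard`, `absorb_hadamard`, `matvec`) and the toy weights are hypothetical, and it assumes the Sylvester construction with a power-of-two feature dimension. Because a normalized Hadamard matrix H satisfies HᵀH = I, the product W x equals (W Hᵀ)(H x), so the rotation can be folded into the weights while the rotated activation H x spreads a single large coordinate across all dimensions.

```python
import math


def hadamard(n):
    """Normalized Sylvester Hadamard matrix; n must be a power of two (assumption)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester doubling step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def absorb_hadamard(weight, h):
    """Return W' = W H^T, so that W' (H x) == W x because H^T H = I."""
    return matmul(weight, transpose(h))


# Toy example: one dominant coordinate stands in for a heavy-tailed activation.
H = hadamard(4)
W = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]  # hypothetical projection weights
x = [8.0, 0.0, 0.0, 0.0]

x_rot = matvec(H, x)            # rotated activation with a flatter profile
W_rot = absorb_hadamard(W, H)   # rotation folded into the projection
y_ref = matvec(W, x)
y_rot = matvec(W_rot, x_rot)
```

In this toy case the largest activation magnitude drops after rotation while `y_rot` matches `y_ref` up to floating-point error. In a real model the rotation of `x` would itself be folded into the preceding layer rather than applied at runtime; that step is omitted here.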
📝 Abstract
State Space Models (SSMs), represented by the Mamba family, provide linear-time sequence modeling and are attractive for long-context inference. Yet practical deployments remain memory-bandwidth limited because selective state updates are often decomposed into fragmented kernels with repeated intermediate tensor materialization. We present COREY, a prototype framework that combines memory-aware operator fusion with Hadamard-based feature reparameterization. Activation entropy, estimated with fixed-width histograms, is used as a runtime scheduling statistic to place fusion boundaries and choose tile sizes. To regularize heavy-tailed activations, we absorb normalized Hadamard transforms into linear projections, preserving functional equivalence while reducing peak-coordinate concentration. In a controlled prototype study over heavy-tailed SSM activations, COREY consistently reduces proxy latency, improves throughput, and lowers DRAM traffic relative to unfused and fixed-depth baselines. Low-bit results are reported only through a hand-crafted stability proxy and are intended as diagnostic evidence rather than checkpoint-level quality claims. Code repository: https://github.com/mabo1215/COREY_Transformer.git.
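As a rough illustration of the scheduling statistic, the sketch below estimates Shannon entropy from a fixed-width histogram and maps it to a tile size. Only the histogram-based entropy estimate is described in the abstract; the function names, the bin count, the candidate tile sizes, and the rule that higher normalized entropy selects a larger tile are all hypothetical placeholders, not COREY's actual policy.

```python
import math


def histogram_entropy(values, num_bins=16):
    """Shannon entropy in bits of a fixed-width histogram over `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all mass falls in a single bin
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def choose_tile_size(entropy_bits, num_bins=16, tile_sizes=(32, 64, 128)):
    """Hypothetical policy: map normalized entropy in [0, 1] to a tile size."""
    normalized = entropy_bits / math.log2(num_bins)
    idx = min(int(normalized * len(tile_sizes)), len(tile_sizes) - 1)
    return tile_sizes[idx]


# Usage: a uniform spread over four bins reaches the maximum entropy of 2 bits.
activations = [0.0, 1.0, 2.0, 3.0]
entropy = histogram_entropy(activations, num_bins=4)
tile = choose_tile_size(entropy, num_bins=4)
```

A fixed-width histogram keeps the estimate to a single pass with a small counter array, which is why it is plausible as a runtime statistic; the direction and thresholds of the entropy-to-tile mapping would need to come from the paper or repository.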