COREY: A Prototype Study of Entropy-Guided Operator Fusion with Hadamard Reparameterization for Selective State Space Models

📅 2026-04-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the memory-bandwidth bottleneck that selective state space models face in deployment, which arises when state updates are decomposed into fragmented kernels that repeatedly materialize intermediate tensors. The authors propose a memory-aware operator fusion strategy that uses activation entropy as a runtime scheduling statistic to place fusion boundaries and choose tile sizes. They also absorb normalized Hadamard transforms into linear projections to regularize heavy-tailed activations, reducing peak-coordinate concentration while preserving functional equivalence. In a controlled prototype study, the approach consistently reduces DRAM traffic, lowers proxy latency, and improves throughput relative to unfused and fixed-depth baselines; low-bit results are reported only through a stability proxy and are diagnostic rather than checkpoint-level quality claims.
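The Hadamard reparameterization described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`hadamard`, `matvec`, `matmul`), the toy weight matrix `W`, and the toy activation `x` are assumptions made for illustration. The sketch relies only on the fact that a normalized Sylvester Hadamard matrix `H` is symmetric and orthogonal, so `W x = (W H)(H x)`.

```python
import math

def hadamard(n):
    """Normalized Sylvester Hadamard matrix of size n (n must be a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(n)
    return [[v * scale for v in row] for row in h]

def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

# Toy heavy-tailed activation (assumed data): one dominant coordinate.
x = [8.0, 0.1, -0.2, 0.05, 0.0, 0.1, -0.1, 0.05]
# Toy projection weights (assumed data).
W = [[0.5, -1.0, 0.25, 0.0, 1.0, -0.5, 0.75, 0.1],
     [1.0, 0.2, -0.3, 0.4, -0.6, 0.0, 0.9, -0.8]]

H = hadamard(8)
x_rot = matvec(H, x)   # rotated activation H x
W_rot = matmul(W, H)   # absorbed weight W H^T, equal to W H because H is symmetric

y_ref = matvec(W, x)          # original projection
y_rot = matvec(W_rot, x_rot)  # reparameterized projection, same output

peak_before = max(abs(v) for v in x)
peak_after = max(abs(v) for v in x_rot)
```

In this toy case `y_rot` matches `y_ref` up to floating-point error, while `peak_after` is smaller than `peak_before` because the rotation spreads the dominant coordinate across all dimensions.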

📝 Abstract
State Space Models (SSMs), represented by the Mamba family, provide linear-time sequence modeling and are attractive for long-context inference. Yet practical deployments remain memory-bandwidth limited because selective state updates are often decomposed into fragmented kernels with repeated intermediate tensor materialization. We present COREY, a prototype framework that combines memory-aware operator fusion with Hadamard-based feature reparameterization. Activation entropy, estimated with fixed-width histograms, is used as a runtime scheduling statistic to place fusion boundaries and choose tile sizes. To regularize heavy-tailed activations, we absorb normalized Hadamard transforms into linear projections, preserving functional equivalence while reducing peak-coordinate concentration. In a controlled prototype study over heavy-tailed SSM activations, COREY consistently reduces proxy latency, improves throughput, and lowers DRAM traffic relative to unfused and fixed-depth baselines. Low-bit results are reported only through a hand-crafted stability proxy and are intended as diagnostic evidence rather than checkpoint-level quality claims. Code repository: https://github.com/mabo1215/COREY_Transformer.git.
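As a rough illustration of the entropy statistic mentioned in the abstract, the sketch below estimates Shannon entropy from a fixed-width histogram. The function names, the bin count, and in particular the `pick_tile` policy (its candidate tile sizes and the direction of the mapping) are hypothetical: the abstract does not specify how entropy is mapped to fusion boundaries or tile sizes.

```python
import math

def histogram_entropy(values, num_bins=16):
    """Shannon entropy in bits of a fixed-width histogram over `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all mass in one bin
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def pick_tile(entropy_bits, num_bins=16, tiles=(32, 64, 128)):
    """Hypothetical policy: map normalized entropy to one of a few tile sizes.

    The thresholds and the choice of larger tiles for higher entropy are
    arbitrary assumptions for illustration, not taken from the paper.
    """
    frac = entropy_bits / math.log2(num_bins)
    idx = min(int(frac * len(tiles)), len(tiles) - 1)
    return tiles[idx]

# Assumed toy inputs: evenly spread values versus a single large outlier.
spread = [float(i) for i in range(16)]
spiky = [0.0] * 99 + [100.0]

h_spread = histogram_entropy(spread)
h_spiky = histogram_entropy(spiky)
tile_spread = pick_tile(h_spread)
tile_spiky = pick_tile(h_spiky)
```

A heavy-tailed input such as `spiky` concentrates nearly all samples in one bin and therefore yields a much lower entropy than `spread`, which is the kind of signal a runtime scheduler could react to.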

Problem

Research questions and friction points this paper is trying to address.

State Space Models
memory bandwidth
operator fusion
intermediate tensor materialization
selective state updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

operator fusion
Hadamard reparameterization
activation entropy
state space models
memory bandwidth optimization