🤖 AI Summary
Causal Transformer language models are constrained by strictly sequential decoding and quadratic attention complexity, while linear-time causal architectures are difficult to combine with discrete diffusion, which relies on bidirectional context. This work proposes a triplet-block layout that, for the first time, enables parallel, bidirectional discrete diffusion within the RWKV architecture while preserving $O(L)$ inference complexity, reconciling unidirectional causal generation with the bidirectional nature of diffusion. Built on this layout, the B³D-RWKV-7.2B model reaches accuracy comparable to existing models on an eight-task benchmark suite and improves decoding throughput by 1.6× on average.
📝 Abstract
Causal Transformer language models suffer from strictly sequential decoding and a quadratic per-step attention cost. Linear-time causal models and discrete diffusion models each address one of these weaknesses, but combining them is inherently difficult: diffusion requires bidirectional attention, whereas causal models are unidirectional. To unify the two, we propose B$^3$D-RWKV, a diffusion variant of RWKV that combines RWKV's $O(L)$ inference efficiency with parallel, bidirectional discrete diffusion through a \emph{triplet-block layout}. B$^3$D-RWKV-7.2B reaches accuracy comparable to existing models on an 8-task suite while outperforming baselines in decoding throughput, with an average speedup of $\mathbf{1.6\times}$.