🤖 AI Summary
To address the inefficiency of memory management in nonlinear RNNs and the limitations imposed by linearity assumptions in conventional state-space models, this paper proposes Comba, a novel closed-loop-control-driven architecture. Methodologically, Comba integrates a scalar-plus-low-rank state transition with dual feedback corrections (state and output) to enhance nonlinear dynamical modeling; it is the first work to systematically incorporate closed-loop control theory into nonlinear RNN design; and it provides a Triton-optimized chunk-wise parallel kernel for hardware efficiency and scalable training. With memory updates governed by the Delta learning rule, Comba achieves state-of-the-art performance: its 340M- and 1.3B-parameter variants significantly outperform baselines, including Mamba and RWKV-7, across language and vision tasks, delivering higher accuracy with lower computational overhead.
📝 Abstract
Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, resulting in a nonlinear recursive structure. In this paper, we first introduce the concept of Nonlinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Nonlinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state-feedback and output-feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
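To make the "scalar-plus-low-rank state transition" concrete, below is a minimal NumPy sketch of a Delta-rule-style recurrence with a scalar decay and a rank-1 correction driven by state feedback (the prediction error `v_t - S k_t`). This is an illustrative assumption, not Comba's exact formulation: the function name, the precise placement of the decay, and the omission of the paper's output-feedback term and chunk-wise Triton parallelization are all simplifications.

```python
import numpy as np

def delta_rule_recurrence(Q, K, V, alpha, beta):
    """Sequential sketch of a scalar-decay delta-rule recurrence (hypothetical form):
        S_t = alpha_t * S_{t-1} + beta_t * (v_t - alpha_t * S_{t-1} @ k_t) k_t^T
        o_t = S_t @ q_t
    Comba additionally applies an output-feedback correction, omitted here.
    """
    T, d_k = K.shape
    d_v = V.shape[1]
    S = np.zeros((d_v, d_k))                 # recurrent memory (fast-weight matrix)
    outs = np.empty((T, d_v))
    for t in range(T):
        k, v, q = K[t], V[t], Q[t]
        S = alpha[t] * S                     # scalar transition: decay / forgetting
        err = v - S @ k                      # state feedback: prediction error
        S = S + beta[t] * np.outer(err, k)   # low-rank (rank-1) corrective write
        outs[t] = S @ q                      # readout
    return outs, S
```

With `alpha_t = beta_t = 1` and a unit-norm key, one step of this update makes the memory exactly reproduce the written value (`S @ k_t == v_t`), which is the error-correcting property that distinguishes delta-rule models from purely additive linear attention.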