🤖 AI Summary
To address the inefficiency of memory management in nonlinear RNNs and the limitations imposed by linearity assumptions in conventional state-space models, this paper proposes Comba, a novel closed-loop-control-driven architecture. Methodologically, Comba integrates a scalar-plus-low-rank state transition with dual feedback corrections (state and output) to enhance nonlinear dynamical modeling; it is the first work to systematically incorporate closed-loop control theory into nonlinear RNN design; and it provides a Triton-optimized chunk-wise parallel kernel for hardware efficiency and scalable training. With memory updates governed by the Delta learning rule, Comba achieves state-of-the-art performance: its 340M- and 1.3B-parameter variants significantly outperform baselines, including Mamba and RWKV-7, across language and vision tasks, delivering higher accuracy with lower computational overhead.
📝 Abstract
Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, resulting in a nonlinear recursive structure. In this paper, we first introduce the concept of Nonlinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Nonlinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state-feedback and output-feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
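To make the "scalar-plus-low-rank state transition" concrete, below is a minimal NumPy sketch of a Delta-rule-style recurrence with a scalar decay and a rank-1 correction driven by state feedback (the prediction error `v_t - S k_t`). This is an illustrative assumption, not Comba's exact formulation: the function name, the precise placement of the decay, and the omission of the paper's output-feedback term and chunk-wise Triton parallelization are all simplifications.

```python
import numpy as np

def delta_rule_recurrence(Q, K, V, alpha, beta):
    """Sequential sketch of a scalar-decay delta-rule recurrence (hypothetical form):
        S_t = alpha_t * S_{t-1} + beta_t * (v_t - alpha_t * S_{t-1} @ k_t) k_t^T
        o_t = S_t @ q_t
    Comba additionally applies an output-feedback correction, omitted here.
    """
    T, d_k = K.shape
    d_v = V.shape[1]
    S = np.zeros((d_v, d_k))                 # recurrent memory (fast-weight matrix)
    outs = np.empty((T, d_v))
    for t in range(T):
        k, v, q = K[t], V[t], Q[t]
        S = alpha[t] * S                     # scalar transition: decay / forgetting
        err = v - S @ k                      # state feedback: prediction error
        S = S + beta[t] * np.outer(err, k)   # low-rank (rank-1) corrective write
        outs[t] = S @ q                      # readout
    return outs, S
```

With `alpha_t = beta_t = 1` and a unit-norm key, one step of this update makes the memory exactly reproduce the written value (`S @ k_t == v_t`), which is the error-correcting property that distinguishes delta-rule models from purely additive linear attention.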