🤖 AI Summary
State space models (SSMs) efficiently model long sequences but rely on time-invariant, real-valued recurrent dynamics, which limit their ability to recognize certain regular languages—e.g., those requiring modular counting or parity checking. Method: We propose Adaptive Unitary State Space Models (AUSSMs), introducing the first input-dependent recurrence with a skew-symmetric generator, so that state evolution is unitary. This design enables exact simulation of solvable-group automata and precise modular arithmetic at finite precision. AUSSMs pair a separable convolution formulation with CUDA-optimized kernels, preserving linear-time complexity while enabling efficient parallel training. Contribution/Results: Theoretically, AUSSMs overcome a fundamental expressivity barrier of SSMs in formal language recognition. Empirically, they significantly outperform existing SSMs on algorithmic tasks—including parity detection and modular arithmetic—and achieve strong performance on real-world long-sequence classification benchmarks. AUSSMs thus unify symbolic reasoning and continuous sequential modeling within a single, scalable architecture.
📝 Abstract
Recent work has revealed that state space models (SSMs), while efficient for long-sequence processing, are fundamentally limited in their ability to represent formal languages, particularly due to their time-invariant and real-valued recurrence structures. In this work, we draw inspiration from adaptive and structured dynamics observed in biological neural systems and introduce the Adaptive Unitary State Space Model (AUSSM), a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. Using algebraic automata theory, we prove that AUSSM can perform modulo counting and simulate solvable-group automata at finite precision, enabling SSMs to model a broad class of regular languages that are out of reach for other SSM architectures. To overcome the practical inefficiencies of adaptive recurrence, we develop a separable convolution formulation and a CUDA implementation that enables scalable parallel training. Empirically, we show that AUSSM, when interleaved with Mamba, outperforms prior SSMs on formal algorithmic tasks such as parity and modular arithmetic, and achieves competitive performance on real-world long time-series classification benchmarks. Our results demonstrate that adaptive unitary recurrence provides a powerful and efficient inductive bias for both symbolic and continuous sequence modeling.
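To make the core mechanism concrete, here is a minimal NumPy sketch (our own toy illustration, not the paper's AUSSM architecture or its CUDA kernels) of how an input-dependent, skew-symmetric recurrence yields unitary state updates and solves parity. A 2×2 skew-symmetric generator exponentiates to a plane rotation; making the rotation angle depend on the input token lets the state count ones modulo 2 while its norm stays exactly 1.

```python
import numpy as np

def skew_expm_2d(theta):
    # exp(theta * J) for the skew-symmetric J = [[0, -1], [1, 0]]
    # is a plane rotation; skew-symmetric generators always
    # exponentiate to orthogonal (norm-preserving) matrices.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def parity(bits):
    # Toy unitary recurrence (hypothetical sketch): the rotation
    # angle is input-dependent, so a '1' rotates the state by pi
    # and a '0' leaves it fixed.
    h = np.array([1.0, 0.0])
    for x in bits:
        h = skew_expm_2d(np.pi * x) @ h
        # Unitary update: the state norm is preserved at every step.
        assert np.isclose(np.linalg.norm(h), 1.0)
    # h[0] is ~+1 after an even number of ones, ~-1 after an odd number.
    return 0 if h[0] > 0 else 1
```

A time-invariant real SSM with contractive dynamics cannot maintain this kind of exact modular counter over arbitrarily long inputs, which is the expressivity gap the unitary, input-dependent recurrence closes.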