🤖 AI Summary
Balancing expressiveness and computational efficiency remains a central challenge in sequence modeling. Method: This paper proposes Structured Linear Controlled Differential Equations (SLiCEs), a framework for sequence models built on structured, input-dependent state-transition matrices. The framework encompasses existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, and introduces two novel variants based on sparsity and the Walsh–Hadamard transform. Unlike the diagonal state-transition matrices of S4 and Mamba, the block-diagonal, sparse, and Walsh–Hadamard variants are proven to match the maximal expressivity of dense matrices while being cheaper to compute and parallelizable in time. Contributions/Results: A single-layer SLiCE solves the A₅ state-tracking task; SLiCEs achieve best-in-class length generalization on regular-language tasks among parallel-in-time models; and they match the state-of-the-art performance of log neural controlled differential equations on six multivariate time-series classification benchmarks while cutting the average time per training step by a factor of twenty.
📝 Abstract
Structured Linear Controlled Differential Equations (SLiCEs) provide a unifying framework for sequence models with structured, input-dependent state-transition matrices that retain the maximal expressivity of dense matrices whilst being cheaper to compute. The framework encompasses existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, as well as two novel variants based on sparsity and the Walsh--Hadamard transform. We prove that, unlike the diagonal state-transition matrices of S4 and Mamba, SLiCEs employing block-diagonal, sparse, or Walsh--Hadamard matrices match the maximal expressivity of dense matrices. Empirically, SLiCEs solve the $A_5$ state-tracking benchmark with a single layer, achieve best-in-class length generalisation on regular language tasks among parallel-in-time models, and match the state-of-the-art performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.
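To make the core idea concrete, here is a minimal sketch of an input-dependent block-diagonal linear recurrence of the kind the block-diagonal SLiCE variant is built on. All names, shapes, and the particular parameterization below are illustrative assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, b = 8, 4              # hidden size d, split into blocks of size b
n_blocks = d // b
seq_len, in_dim = 16, 3

# Per-block tensor mapping the input u_t to a b-by-b transition block
# (hypothetical parameterization for illustration).
W = rng.normal(scale=0.1, size=(n_blocks, b, b, in_dim))

def step(h, u):
    """One recurrence step h_t = A(u_t) h_{t-1}, with A(u_t) block-diagonal."""
    h_blocks = h.reshape(n_blocks, b)
    out = np.empty_like(h_blocks)
    for k in range(n_blocks):
        # Input-dependent b-by-b block: identity plus a learned linear map of u_t.
        A_k = np.eye(b) + W[k] @ u
        out[k] = A_k @ h_blocks[k]
    return out.reshape(d)

h = np.ones(d)
for u in rng.normal(size=(seq_len, in_dim)):
    h = step(h, u)

print(h.shape)  # (8,)
```

The point of the structure is cost: applying a block-diagonal transition takes O(d·b) work per step rather than O(d²) for a dense matrix, and since each step is a matrix product, the whole sequence can be evaluated with a parallel associative scan; the paper's theoretical contribution is that this restriction (likewise sparsity or Walsh–Hadamard structure) loses none of the maximal expressivity of dense transitions, unlike a purely diagonal restriction.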