Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Balancing expressiveness and computational efficiency remains challenging in sequence modeling. Method: This paper proposes Structured Linear Controlled Differential Equations (SLiCEs), a framework for sequence models with input-dependent structured state-transition matrices. The framework unifies existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, and introduces two novel variants based on sparsity and the Walsh–Hadamard transform. Unlike the diagonal transitions of S4 and Mamba, the block-diagonal, sparse, and Walsh–Hadamard variants are proven to match the maximal expressivity of dense matrices while remaining cheaper to compute and fully parallelizable in time. Contributions/Results: A single-layer SLiCE solves the A₅ state-tracking benchmark; SLiCEs achieve best-in-class length generalization on regular-language tasks among parallel-in-time models; and they match the state-of-the-art performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.

📝 Abstract
Structured Linear Controlled Differential Equations (SLiCEs) provide a unifying framework for sequence models with structured, input-dependent state-transition matrices that retain the maximal expressivity of dense matrices whilst being cheaper to compute. The framework encompasses existing architectures, such as input-dependent block-diagonal linear recurrent neural networks and DeltaNet's diagonal-plus-low-rank structure, as well as two novel variants based on sparsity and the Walsh--Hadamard transform. We prove that, unlike the diagonal state-transition matrices of S4 and Mamba, SLiCEs employing block-diagonal, sparse, or Walsh--Hadamard matrices match the maximal expressivity of dense matrices. Empirically, SLiCEs solve the $A_5$ state-tracking benchmark with a single layer, achieve best-in-class length generalisation on regular language tasks among parallel-in-time models, and match the state-of-the-art performance of log neural controlled differential equations on six multivariate time-series classification datasets while cutting the average time per training step by a factor of twenty.
Problem

Research questions and friction points this paper is trying to address.

Maximizing expressivity of sequence models with structured matrices
Unifying existing architectures and introducing novel variants
Improving computational efficiency while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structured Linear CDEs with input-dependent matrices
Maximal expressivity via block-diagonal or sparse matrices
Efficient parallel-in-time computation for sequences
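The block-diagonal structure and parallel-in-time computation listed above can be illustrated with a minimal NumPy sketch. This is a hypothetical toy, not the paper's parameterisation: `block_diag_transition` is an invented stand-in for an input-dependent block-diagonal transition map, and the associative-scan formulation shows why such linear recurrences can be computed in parallel across time.

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(0)

d, k, T = 8, 4, 16            # state dim, block size, sequence length
n_blocks = d // k

def block_diag_transition(x):
    """Hypothetical input-dependent block-diagonal transition A(x).

    Each k-by-k diagonal block is built from a slice of the input; the
    scaling is only there to keep this toy recurrence well behaved.
    A block-diagonal matvec costs O(d*k) instead of O(d^2) for dense.
    """
    A = np.zeros((d, d))
    for b in range(n_blocks):
        s = slice(b * k, (b + 1) * k)
        A[s, s] = 0.9 * np.eye(k) + 0.1 * np.outer(x[s], x[s])
    return A

xs = rng.standard_normal((T, d))
As = [block_diag_transition(x) for x in xs]
bs = [0.1 * x for x in xs]    # input injection term

# Sequential recurrence: h_t = A(x_t) h_{t-1} + b_t, starting from h_0 = 0.
h = np.zeros(d)
for A, b in zip(As, bs):
    h = A @ h + b

# Parallel-in-time view: the same recurrence is an associative scan over
# pairs (A, b) with combine rule (A2 @ A1, A2 @ b1 + b2), so a framework
# like JAX could evaluate it with a logarithmic-depth parallel scan.
def combine(e1, e2):
    A1, b1 = e1
    A2, b2 = e2
    return A2 @ A1, A2 @ b1 + b2

A_tot, b_tot = reduce(combine, zip(As, bs))
h_scan = A_tot @ np.zeros(d) + b_tot

assert np.allclose(h, h_scan)  # both orderings give the same final state
```

The combine rule is associative, which is what makes a logarithmic-depth parallel evaluation possible; with structured (e.g. block-diagonal) transitions, the matrix products inside the scan also stay structured and cheap.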
Benjamin Walker
Mathematical Institute, University of Oxford
Lingyi Yang
University of Oxford
Machine Learning, Time series, Control
Nicola Muca Cirone
Department of Mathematics, Imperial College London
Cristopher Salvi
Imperial College London
probability theory, stochastic analysis, generative models
Terry Lyons
Mathematical Institute, University of Oxford