🤖 AI Summary
This work investigates whether sequence models exhibit temporal continuity as an inductive bias and how this property affects performance on tasks with continuous-time structure. We formalize model continuity as convergence under temporal refinement, introduce a quantitative measure of task continuity, and analyze how the alignment between model and task continuity relates to performance. Focusing on state-space models, specifically S4 and S6 (the core of Mamba), we combine continuous dynamical analysis, discretization convergence checks, and temporal subsampling experiments. The findings show that S4 behaves stably in the continuous sense, whereas S6 can be more sensitive to input amplitude and to its selective dynamics. Across benchmarks, model performance aligns closely with the match between model and task continuity, and continuity also enables a simple temporal subsampling strategy that improves both efficiency and performance.
📝 Abstract
Inductive biases influence the behavior and performance of sequential models. In this work, we study an underexplored inductive bias in sequential modeling: continuity in time. We ask a simple question: do models motivated by continuous-time formulations, such as state-space models, actually behave continuously in time, and does this translate into better performance on tasks with continuous temporal structure? To answer this, we formalize model continuity as convergence under temporal refinement, where a model is continuous if its predictions approach an underlying continuous trajectory as the temporal discretization is refined. We show that S4 exhibits stable continuous behavior, whereas S6 (the core of Mamba) can be more sensitive to input amplitude and selective dynamics, despite being derived from a continuous dynamical system. To study whether this distinction matters for learning, we also need a corresponding notion of task continuity. We therefore introduce a metric to quantify the continuity of datasets directly from their temporal structure. Across benchmarks, we find a clear empirical alignment between task continuity, model continuity, and model performance. Beyond an inductive bias, continuity also has practical consequences: we show that it enables a simple temporal subsampling strategy that improves both efficiency and performance.
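The refinement criterion can be illustrated with a toy linear state-space model. The sketch below is not the paper's procedure. It assumes a small real diagonal system (the values of `A`, `B`, `C` are hypothetical), a hypothetical smooth input `u(t)`, and zero-order hold (ZOH) discretization; other rules such as the bilinear transform are also common. A very fine grid serves as a stand-in for the continuous trajectory. If the model is continuous in the sense above, its output at a fixed time `T` should approach that reference as the step size shrinks.

```python
import math

# Hypothetical diagonal linear SSM: x_i'(t) = A[i] * x_i(t) + B[i] * u(t),
# y(t) = sum_i C[i] * x_i(t). Real negative eigenvalues are an assumption made
# for simplicity; S4-family models use structured (e.g. complex diagonal) A.
A = [-0.5, -1.0, -2.0]
B = [1.0, 1.0, 1.0]
C = [1.0, -0.5, 0.25]


def u(t):
    """Hypothetical smooth continuous-time input signal."""
    return math.sin(2.0 * t) + 0.5 * math.cos(t)


def run_ssm(T, n_steps):
    """Run the ZOH-discretized SSM on n_steps uniform samples of u over [0, T]
    and return the output y at time T."""
    dt = T / n_steps
    # Exact ZOH coefficients for a diagonal system, computed once per step size.
    a_bar = [math.exp(a * dt) for a in A]
    b_bar = [(ab - 1.0) / a * b for ab, a, b in zip(a_bar, A, B)]
    x = [0.0] * len(A)
    for k in range(n_steps):
        uk = u(k * dt)  # input held constant on [k*dt, (k+1)*dt)
        x = [ab * xi + bb * uk for ab, bb, xi in zip(a_bar, b_bar, x)]
    return sum(c * xi for c, xi in zip(C, x))


T = 5.0
reference = run_ssm(T, 2 ** 16)  # fine-grid proxy for the continuous trajectory
errors = []
for n in [16, 64, 256, 1024]:
    err = abs(run_ssm(T, n) - reference)
    errors.append(err)
    print(f"n_steps={n:5d}  dt={T / n:.4f}  |y_n(T) - y_ref(T)| = {err:.2e}")
```

For this time-invariant system the error should shrink roughly in proportion to the step size, as expected when a smooth input is held piecewise constant. An input-dependent (selective) step size, as in S6, is deliberately not modeled here.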