🤖 AI Summary
In low-data regimes, state space models (SSMs) can suffer from poor sample efficiency and weak generalization because their inductive bias is fixed and task-agnostic. This work formally characterizes the inductive bias of linear time-invariant SSMs as an “SSM-induced kernel” whose spectrum is governed by the model's frequency response. Building on this, it proposes a task-dependent initialization method based on power spectrum matching: before training, model parameters are aligned with the spectral structure of the target task using frequency-domain analysis and kernel theory. Because the alignment happens at initialization, it adds no overhead to training itself, and it is reported to significantly improve few-shot generalization. Experiments on multiple real-world benchmarks show gains in generalization and sample efficiency, particularly in low-data regimes, offering a theoretical and practical tool for building more data-efficient SSMs.
📝 Abstract
The remarkable success of large-scale models is fundamentally tied to scaling laws, yet the finite supply of high-quality data presents a looming challenge. One of the next frontiers in modeling is data efficiency: the ability to learn more from less. A model's inductive bias is a critical lever for this, but foundational sequence models like State Space Models (SSMs) rely on a fixed bias. This fixed prior is sample-inefficient when a task's underlying structure does not match it. In this work, we introduce a principled framework to solve this problem. We first formalize the inductive bias of linear time-invariant SSMs through an SSM-induced kernel, showing mathematically and confirming empirically that its spectrum is directly governed by the model's frequency response. We then propose Task-Dependent Initialization (TDI) via power spectrum matching, a fast and efficient method that aligns the model's inductive bias with the task's spectral characteristics before large-scale training. Our experiments on a diverse set of real-world benchmarks show that TDI significantly improves generalization and sample efficiency, particularly in low-data regimes. This work provides a theoretical and practical tool to create more data-efficient models, a crucial step towards sustainable scaling.
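To make the idea of aligning a model's frequency response with a task's power spectrum concrete, the following is a minimal NumPy sketch. It is not the paper's TDI procedure, which the abstract does not specify. It assumes a diagonal discrete-time LTI SSM of the form x_k = λ x_{k-1} + B u_k, y_k = C x_k, estimates the task's power spectrum with an averaged periodogram, and places the pole angles at quantiles of that spectrum. The function names, the quantile rule, and the fixed pole radius are all illustrative assumptions.

```python
import numpy as np

def ssm_frequency_response(lam, B, C, omegas):
    """Frequency response of the diagonal SSM x_k = lam * x_{k-1} + B * u_k, y_k = sum(C * x_k).

    Assumes |lam| < 1 so the geometric series behind the closed form converges.
    """
    z_inv = np.exp(-1j * omegas)[:, None]               # shape (F, 1)
    return np.sum(C * B / (1.0 - lam * z_inv), axis=1)  # shape (F,)

def task_power_spectrum(signals):
    """Averaged periodogram of real sequences with shape (N, T)."""
    signals = signals - signals.mean(axis=1, keepdims=True)  # remove the DC offset
    power = (np.abs(np.fft.rfft(signals, axis=1)) ** 2).mean(axis=0)
    omegas = 2 * np.pi * np.fft.rfftfreq(signals.shape[1])   # radians/sample in [0, pi]
    return omegas, power

def init_poles_from_spectrum(omegas, power, n_state, radius=0.95):
    """Hypothetical matching rule: put pole angles at quantiles of the power spectrum."""
    p = power + 1e-12                      # small floor keeps the CDF strictly increasing
    cdf = np.cumsum(p) / np.sum(p)
    q = (np.arange(n_state) + 0.5) / n_state
    theta = np.interp(q, cdf, omegas)      # inverse-CDF lookup of pole angles
    # Complex poles; a real-valued SSM would pair each with its conjugate.
    return radius * np.exp(1j * theta)

# Toy task: noisy sinusoids at a single frequency (synthetic data, for illustration only).
rng = np.random.default_rng(0)
T, N, k0, n_state = 256, 32, 32, 8
t = np.arange(T)
phases = rng.uniform(0, 2 * np.pi, size=(N, 1))
signals = np.sin(2 * np.pi * k0 / T * t + phases) + 0.1 * rng.standard_normal((N, T))

omegas, power = task_power_spectrum(signals)
lam = init_poles_from_spectrum(omegas, power, n_state)
H = ssm_frequency_response(lam, np.ones(n_state), np.ones(n_state) / n_state, omegas)
peak = omegas[np.argmax(np.abs(H))]
print("task frequency:", 2 * np.pi * k0 / T, "response peak:", peak)
```

In this toy setting the initialized poles concentrate around the dominant task frequency, so the model's response is already largest where the data carries most of its power. A default initialization would spread poles without regard to the task, which is the mismatch the abstract describes.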