Aligning Inductive Bias for Data-Efficient Generalization in State Space Models

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In low-data regimes, state space models (SSMs) can be sample-inefficient and generalize poorly because their inductive bias is fixed and task-agnostic. This work formalizes the inductive bias of linear time-invariant SSMs as an “SSM-induced kernel” whose spectrum is governed by the model's frequency response, and proposes Task-Dependent Initialization (TDI) via power spectrum matching: before large-scale training, the model is initialized so that its inductive bias aligns with the spectral characteristics of the target task. The authors describe the method as fast and efficient. Experiments on a diverse set of real-world benchmarks show that TDI significantly improves generalization and sample efficiency, particularly in low-data regimes.
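For orientation, the frequency response mentioned above has a standard textbook form for a discrete-time LTI SSM. The notation below (matrices A, B, C, D, eigenvalues λ_n, and coefficients b_n, c_n) is assumed here for illustration and is not taken from the paper, which may use a continuous-time formulation or different symbols.

```latex
% Discrete-time LTI state space model (assumed notation)
x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + D u_k

% Frequency response on the unit circle
H(e^{i\omega}) = C \left( e^{i\omega} I - A \right)^{-1} B + D

% Special case: diagonal A with eigenvalues \lambda_n
H(e^{i\omega}) = \sum_{n} \frac{c_n b_n}{e^{i\omega} - \lambda_n} + D
```

In the diagonal case, each eigenvalue λ_n close to the unit circle produces a peak in |H| near the angle of λ_n, which is why the placement of eigenvalues shapes which frequencies the model favors.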

📝 Abstract

The remarkable success of large-scale models is fundamentally tied to scaling laws, yet the finite nature of high-quality data presents a looming challenge. One of the next frontiers in modeling is data efficiency: the ability to learn more from less. A model's inductive bias is a critical lever for this, but foundational sequence models like State Space Models (SSMs) rely on a fixed bias. This fixed prior is sample-inefficient when a task's underlying structure does not match. In this work, we introduce a principled framework to solve this problem. We first formalize the inductive bias of linear time-invariant SSMs through an SSM-induced kernel, mathematically and empirically proving its spectrum is directly governed by the model's frequency response. Further, we propose a method of Task-Dependent Initialization (TDI): power spectrum matching, a fast and efficient method that aligns the model's inductive bias with the task's spectral characteristics before large-scale training. Our experiments on a diverse set of real-world benchmarks show that TDI significantly improves generalization and sample efficiency, particularly in low-data regimes. This work provides a theoretical and practical tool to create more data-efficient models, a crucial step towards sustainable scaling.
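The idea of matching a model's frequency response to a task's power spectrum can be illustrated with a small NumPy sketch. This is not the paper's TDI procedure, whose details the abstract does not give. It assumes a diagonal discrete-time LTI SSM, estimates the task spectrum with an averaged periodogram, and uses a hypothetical heuristic (`init_poles_from_spectrum`) that places conjugate pole pairs at the strongest frequency bins. All function names, the pole radius of 0.95, and the toy data are assumptions made for illustration.

```python
import numpy as np


def task_power_spectrum(signals):
    """Average periodogram of a batch of 1-D sequences, shape (batch, length)."""
    centered = signals - signals.mean(axis=1, keepdims=True)
    spectrum = np.abs(np.fft.rfft(centered, axis=1)) ** 2
    return spectrum.mean(axis=0) / signals.shape[1]


def ssm_power_response(poles, weights, omega):
    """|H(e^{i w})|^2 for a diagonal discrete-time LTI SSM.

    H(e^{i w}) = sum_n weights[n] / (e^{i w} - poles[n]), where weights[n]
    stands for the product c_n * b_n of output and input coefficients.
    """
    z = np.exp(1j * omega)[:, None]
    h = (weights[None, :] / (z - poles[None, :])).sum(axis=1)
    return np.abs(h) ** 2


def init_poles_from_spectrum(power, omega, n_pairs, radius=0.95):
    """Illustrative heuristic (an assumption, not the paper's TDI procedure):
    place conjugate pole pairs at the n_pairs strongest frequency bins."""
    strongest = np.argsort(power)[-n_pairs:]
    poles = radius * np.exp(1j * omega[strongest])
    return np.concatenate([poles, poles.conj()])


# Toy task: noisy sinusoids at angular frequency pi/4 with random phases.
rng = np.random.default_rng(0)
length = 256
t = np.arange(length)
phases = rng.uniform(0.0, 2.0 * np.pi, size=(64, 1))
signals = np.sin(np.pi / 4 * t + phases) + 0.1 * rng.standard_normal((64, length))

omega = 2.0 * np.pi * np.fft.rfftfreq(length)  # rfft grid in rad/sample
power = task_power_spectrum(signals)
poles = init_poles_from_spectrum(power, omega, n_pairs=1)
response = ssm_power_response(poles, np.ones(poles.shape[0]), omega)

# Compare where the task and the initialized model concentrate their energy.
print("task peak (rad/sample): ", omega[np.argmax(power)])
print("model peak (rad/sample):", omega[np.argmax(response)])
```

In this toy setting the initialized model's response peaks close to the dominant task frequency, which is the kind of alignment the abstract describes. A practical method would presumably fit the whole spectrum rather than only its largest peaks.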

Problem

Research questions and friction points this paper is trying to address.

Aligning inductive bias with task structure in State Space Models

Improving data efficiency and generalization in low-data regimes

Overcoming fixed inductive bias limitations through spectral matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns inductive bias with task characteristics

Uses power spectrum matching for initialization

Improves generalization in low-data regimes

🔎 Similar Papers

How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities