🤖 AI Summary
This work addresses the challenge of deploying foundation models on heterogeneous devices, which typically requires maintaining multiple model versions produced by retraining or distillation. The authors propose the Elastic Spectral State Space Model (ES-SSM), which for the first time enables a single full-capacity training run followed by runtime inference that can be truncated continuously and at fine granularity to meet arbitrary computational budgets, without any retraining or distillation. ES-SSM integrates a state space architecture based on Hankel spectral filtering, input-adaptive gating, stochastic spectral-budget training, and a shared masked normalization mechanism. Evaluated on long-sequence tasks spanning text, logical reasoning, retrieval, vision, and audio, a single ES-SSM model matches Transformer and SSM baselines of comparable size under various truncation levels, while exhibiting smooth and stable budget–performance trade-offs that substantially improve deployment flexibility and efficiency.
📝 Abstract
Foundation models are typically trained at a fixed computational capacity, while real-world applications require deployment across platforms with different resource constraints. Current approaches usually rely on training families of model variants or on model distillation, which incurs additional training and supports only a pre-selected set of sizes rather than fine-grained adaptation at runtime. In this paper, we propose Elastic Spectral State Space Models (ES-SSM), which are trained only once at full capacity yet can be directly truncated to arbitrary scales for budgeted runtime inference without retraining. ES-SSM builds on Hankel spectral filtering over a state space model (SSM), coupled with a lightweight input-adaptive gate trained under randomized spectral budgets. Using a shared masked normalization rule over the ordered spectral channels, we encourage predictive capability to concentrate in low-index components, while higher-index components act primarily as refinement. We evaluate our method across long-sequence benchmarks spanning text, logic, retrieval, vision, and audio. A single ES-SSM model trained once can be truncated to deliver performance competitive with modern Transformer and SSM baselines at similar parameter scales. Moreover, across a wide range of runtime budgets, we observe smooth and stable budget–performance curves over a wide range of truncation levels.
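To make the truncation idea concrete, here is a minimal sketch of budget-truncated spectral mixing with a shared masked normalization rule. This is an illustrative interpretation, not the authors' implementation: the function name `masked_spectral_mix`, the array layout, and the specific normalization (mean over the active low-index channels) are all assumptions; the paper's gate and stochastic-budget training loop are omitted.

```python
import numpy as np

def masked_spectral_mix(channels, k):
    """Hypothetical sketch: mix K ordered spectral channels under a budget.

    channels: array of shape (K, T, d) -- outputs of K ordered spectral
              filter channels over a length-T sequence (K is full capacity).
    k:        runtime budget, 1 <= k <= K -- keep only the first k
              (low-index) channels; higher-index channels are dropped.
    """
    K = channels.shape[0]
    mask = (np.arange(K) < k).astype(channels.dtype)   # 1 for kept channels
    kept = channels * mask[:, None, None]              # truncate high-index channels
    # Shared masked normalization (one simple choice): average over the
    # active channels so the output scale is comparable across budgets.
    return kept.sum(axis=0) / max(mask.sum(), 1.0)

def sample_budget(K, rng):
    """Stochastic spectral budget for training: draw k uniformly in [1, K]."""
    return int(rng.integers(1, K + 1))
```

With `k = K` this reduces to the mean over all channels, and with `k = 1` it returns only the lowest-index channel, which is the behavior the abstract's "low-index components carry prediction, high-index components refine" story relies on; during training, `sample_budget` would pick a fresh `k` per step.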