🤖 AI Summary
This work addresses long-context linear system identification: the system state $x_t$ evolves as a linear function of its past $p$ states, forming a high-dimensional long-range autoregressive AR($p$) model, in contrast to standard first-order AR dynamics. The authors propose a rank-regularized least-squares estimator and combine stability analysis with high-dimensional statistical learning theory to obtain, for the first time, an optimal sample complexity of $O(pd^2)$ in the state dimension $d$ (up to logarithmic factors), matching the rate of i.i.d. parameter estimation. The theory reveals a “learning-without-mixing” phenomenon: long context windows do not slow estimation despite the weak mixing they can induce. It further yields improved dimension scaling under shared low-rank assumptions and remains valid even when the context length $p$ is misspecified, thereby overcoming fundamental limitations of existing methods that rely on short memory or strong mixing conditions.
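As a concrete illustration, here is a minimal sketch of the plain least-squares step underlying such an estimator: stack the $p$ most recent states into a regressor and solve an ordinary least-squares problem for the $d \times pd$ parameter matrix $[A_1, \dots, A_p]$. The trajectory layout, the function name `fit_ar_p`, and the use of `numpy.linalg.lstsq` are illustrative assumptions, not the paper's implementation (which additionally incorporates rank regularization).

```python
# Minimal sketch of the least-squares step for AR(p) identification.
# Assumptions (not from the paper): trajectory X of shape (T, d) with rows
# x_0, ..., x_{T-1}; the function name and stacking order are ours.
import numpy as np

def fit_ar_p(X: np.ndarray, p: int) -> np.ndarray:
    """Estimate A = [A_1 ... A_p] (shape d x p*d) by ordinary least squares."""
    T, d = X.shape
    # Regressor row for time t stacks x_{t-1}, ..., x_{t-p}.
    Z = np.hstack([X[p - i - 1 : T - i - 1] for i in range(p)])  # (T-p, p*d)
    Y = X[p:]                                                    # (T-p, d)
    # Solve min_W ||Z W - Y||_F^2; the transpose maps stacked context to x_t.
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return W.T
```

Calling `fit_ar_p(X, p)` on a length-$T$ trajectory returns the estimate whose error the $O(pd^2)$ bound controls; calling it with a $p$ smaller than the true context length corresponds to the misspecified setting analyzed for strictly stable systems.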
📝 Abstract
This paper addresses the problem of long-context linear system identification, where the state $x_t$ of a dynamical system at time $t$ depends linearly on previous states $x_s$ over a fixed context window of length $p$. We establish a sample complexity bound that matches the i.i.d. parametric rate up to logarithmic factors for a broad class of systems, extending previous works that considered only first-order dependencies. Our findings reveal a learning-without-mixing phenomenon, indicating that learning long-context linear autoregressive models is not hindered by slow mixing properties potentially associated with extended context windows. Additionally, we extend these results to (i) shared low-rank representations, where rank-regularized estimators improve rates with respect to dimensionality, and (ii) misspecified context lengths in strictly stable systems, where shorter contexts offer statistical advantages.
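Under the shared low-rank representation of (i), one common way to realize a rank-regularized estimate, shown here only as a hedged sketch, is to project the least-squares solution onto rank-$r$ matrices via truncated SVD. This reduced-rank-regression heuristic stands in for the paper's rank-regularized estimator, whose exact penalty may differ; the helper name `truncate_rank` and the target rank `r` are assumptions for illustration.

```python
# Hedged sketch of a low-rank refinement: project the least-squares estimate
# onto rank-r matrices by truncated SVD (Eckart-Young). A reduced-rank
# heuristic standing in for the paper's rank-regularized estimator.
import numpy as np

def truncate_rank(A_hat: np.ndarray, r: int) -> np.ndarray:
    """Best rank-r approximation of A_hat in Frobenius norm."""
    U, s, Vt = np.linalg.svd(A_hat, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r]
```

Restricting the estimate to a shared low-rank structure of this kind is what allows the dependence on dimensionality to improve over the full-rank rate.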