Nearly Instance-Optimal Parameter Recovery from Many Trajectories via Hellinger Localization

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Parameter estimation for multi-trajectory time-series data, i.e., many independent realizations of a stochastic process, remains challenging: existing instance-optimal theory is confined to least-squares regression with dependent covariates and does not extend to more general models or loss functions. Method: the paper brings the Hellinger distance into multi-trajectory parameter learning through a unified Hellinger localization framework: it first reduces to i.i.d. learning at the path-measure level, then localizes in parameter space as a quadratic form weighted by the trajectory Fisher information, circumventing the strong mixing assumptions typical of single-trajectory analyses. Contribution/Results: the method attains nearly instance-optimal convergence rates, matching the lower bound from asymptotic normality, in four case studies: a mixture of Markov chains, dependent linear regression under non-Gaussian noise, generalized linear models with non-monotonic activations, and linear-attention sequence models. Its effective sample size scales with the full data budget, substantially improving over standard reductions.
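
A schematic of the two-step argument, in our own notation rather than the paper's (the symbols d_eff, P^T_theta, and I_T are illustrative):

```latex
% Step 1 (i.i.d. reduction at the path-measure level): with n independent
% trajectories, each a draw from the path measure P^T_\theta, an MLE-type
% estimator \hat\theta satisfies a path-level Hellinger bound
H^2\big(P^T_{\hat\theta},\, P^T_{\theta^\star}\big) \;\lesssim\; \frac{d_{\mathrm{eff}}}{n}.

% Step 2 (localization): near \theta^\star the squared Hellinger distance
% behaves like a quadratic form weighted by the trajectory Fisher
% information I_T(\theta^\star),
H^2\big(P^T_{\theta},\, P^T_{\theta^\star}\big) \;\asymp\; (\theta - \theta^\star)^\top I_T(\theta^\star)\,(\theta - \theta^\star),

% so combining the two steps yields the Fisher-weighted parameter bound
\|\hat\theta - \theta^\star\|_{I_T(\theta^\star)}^2 \;\lesssim\; \frac{d_{\mathrm{eff}}}{n}.
```

Since the trajectory Fisher information I_T typically grows with the trajectory length T, the parameter error shrinks with the full data budget nT rather than with n alone.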

📝 Abstract
Learning from temporally-correlated data is a core facet of modern machine learning. Yet our understanding of sequential learning remains incomplete, particularly in the multi-trajectory setting where data consists of many independent realizations of a time-indexed stochastic process. This important regime both reflects modern training pipelines such as for large foundation models, and offers the potential for learning without the typical mixing assumptions made in the single-trajectory case. However, instance-optimal bounds are known only for least-squares regression with dependent covariates; for more general models or loss functions, the only broadly applicable guarantees result from a reduction to either i.i.d. learning, with effective sample size scaling only in the number of trajectories, or an existing single-trajectory result when each individual trajectory mixes, with effective sample size scaling as the full data budget deflated by the mixing-time. In this work, we significantly broaden the scope of instance-optimal rates in multi-trajectory settings via the Hellinger localization framework, a general approach for maximum likelihood estimation. Our method proceeds by first controlling the squared Hellinger distance at the path-measure level via a reduction to i.i.d. learning, followed by localization as a quadratic form in parameter space weighted by the trajectory Fisher information. This yields instance-optimal bounds that scale with the full data budget under a broad set of conditions. We instantiate our framework across four diverse case studies: a simple mixture of Markov chains, dependent linear regression under non-Gaussian noise, generalized linear models with non-monotonic activations, and linear-attention sequence models. In all cases, our bounds nearly match the instance-optimal rates from asymptotic normality, substantially improving over standard reductions.
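
To make the abstract's effective-sample-size comparison concrete (paraphrased in our notation, with n trajectories of length T and per-trajectory mixing time tau_mix), the three regimes compare roughly as:

```latex
N_{\mathrm{eff}}^{\text{i.i.d. reduction}} \;\approx\; n,
\qquad
N_{\mathrm{eff}}^{\text{single-trajectory + mixing}} \;\approx\; \frac{nT}{\tau_{\mathrm{mix}}},
\qquad
N_{\mathrm{eff}}^{\text{this work}} \;\approx\; nT.
```

The last regime requires no mixing of the individual trajectories.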
Problem

Research questions and friction points this paper is trying to address.

Develop instance-optimal parameter recovery from many independent trajectories of a temporally correlated process
Establish learning guarantees for sequential data without the usual mixing assumptions
Broaden instance-optimal rates across diverse model classes via Hellinger localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the Hellinger localization framework for maximum likelihood estimation
Controls the squared Hellinger distance at the path-measure level via a reduction to i.i.d. learning
Localizes the Hellinger distance as a quadratic form in parameter space weighted by the trajectory Fisher information (sketched below)
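
A minimal numerical sketch of the kind of estimator these bullets describe, using a scalar linear-Gaussian AR(1) model as a hypothetical stand-in (not one of the paper's case studies verbatim). Here the pooled MLE over all trajectories reduces to least squares, and the final line evaluates the Fisher-weighted quadratic error the last bullet refers to:

```python
# Sketch only: pooled maximum likelihood over many independent trajectories
# for a scalar linear-Gaussian AR(1) model
#   x_{t+1} = theta * x_t + w_t,  w_t ~ N(0, sigma^2),
# where the MLE coincides with least squares pooled over all transitions.
import numpy as np

rng = np.random.default_rng(0)
theta_star, sigma = 0.8, 1.0
n, T = 200, 50  # number of trajectories, trajectory length

# Simulate n independent trajectories of length T + 1, started at zero.
X = np.zeros((n, T + 1))
for t in range(T):
    X[:, t + 1] = theta_star * X[:, t] + sigma * rng.standard_normal(n)

# Pooled MLE / least squares over the full data budget of n * T transitions.
num = np.sum(X[:, :-1] * X[:, 1:])
den = np.sum(X[:, :-1] ** 2)
theta_hat = num / den

# Empirical per-trajectory Fisher information I_T ~ (1/sigma^2) E[sum_t x_t^2].
# The Fisher-weighted squared error n * I_T * (theta_hat - theta*)^2 should
# concentrate around the parameter dimension (here, 1).
I_T = den / (n * sigma**2)
weighted_sq_err = (theta_hat - theta_star) ** 2 * n * I_T
print(f"theta_hat = {theta_hat:.4f}, Fisher-weighted sq. error = {weighted_sq_err:.3f}")
```

Run across seeds, the Fisher-weighted squared error stays of constant order in n and T, which is the instance-optimal scaling one expects from asymptotic normality.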
Eliot Shekhtman
Department of Electrical and Systems Engineering, University of Pennsylvania
Yichen Zhou
Department of Electrical and Computer Engineering, University of Southern California
Ingvar Ziemann
Unknown affiliation
Machine Learning, Controls
Nikolai Matni
Associate Professor of Electrical and Systems Engineering, University of Pennsylvania
Control Theory, Machine Learning, Optimization
Stephen Tu
University of Southern California