🤖 AI Summary
This work establishes non-asymptotic normal approximation bounds for linear two-timescale stochastic approximation (TTSA) driven by martingale-difference or Markovian noise. The analysis covers both the last iterate and the Polyak–Ruppert averaged estimator, providing normal approximation bounds in the convex distance together with high-order moment bounds on the error, the first such results for TTSA. A key finding is a nontrivial trade-off in convergence behavior: increasing the separation between the fast and slow timescales improves the normal approximation accuracy of the last iterate but degrades that of the averaged estimator. Methodologically, the paper combines high-order moment analysis with a precise characterization of the noise structure, avoiding reliance on classical asymptotic arguments (e.g., diminishing step sizes, infinite-time limits). The results provide the first rigorous non-asymptotic theoretical foundation for statistical inference, including confidence interval construction, for TTSA-based algorithms such as actor-critic and meta-learning methods.
📝 Abstract
In this paper, we establish non-asymptotic bounds on the accuracy of normal approximation for linear two-timescale stochastic approximation (TTSA) algorithms driven by martingale-difference or Markov noise. Considering both the last iterate and the Polyak–Ruppert averaging regimes, we derive normal approximation bounds in terms of the convex distance between probability distributions. Our analysis reveals a non-trivial interplay between the fast and slow timescales: the normal approximation rate for the last iterate improves as the timescale separation increases, whereas it deteriorates in the Polyak–Ruppert averaged setting. We also provide high-order moment bounds for the error of the linear TTSA algorithm, which may be of independent interest.
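For intuition, the sketch below simulates a generic linear TTSA recursion with Polyak–Ruppert averaging of the slow iterate. It is a minimal illustration only: the matrices, step-size exponents, and i.i.d. Gaussian (martingale-difference) noise are placeholder assumptions, not the paper's exact setting or notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear TTSA instance (all matrices/targets are made up for this sketch).
d = 2
A11, A12 = np.eye(d), 0.5 * np.eye(d)
A21, A22 = 0.3 * np.eye(d), np.eye(d)
b1, b2 = np.ones(d), np.zeros(d)

theta = np.zeros(d)      # slow iterate
w = np.zeros(d)          # fast iterate
theta_bar = np.zeros(d)  # Polyak–Ruppert average of the slow iterate

n_iters = 10_000
for k in range(1, n_iters + 1):
    alpha = 1.0 / k**0.9   # slow step size
    beta = 1.0 / k**0.6    # fast step size; alpha/beta -> 0 gives timescale separation
    xi = rng.normal(size=d)    # martingale-difference noise (i.i.d. here for simplicity)
    zeta = rng.normal(size=d)

    # Linear two-timescale updates
    theta = theta - alpha * (A11 @ theta + A12 @ w - b1 + xi)
    w = w - beta * (A21 @ theta + A22 @ w - b2 + zeta)

    # Running Polyak–Ruppert average
    theta_bar += (theta - theta_bar) / k

print("last iterate:", theta)
print("PR average  :", theta_bar)
```

In this toy run one can compare the fluctuations of the last iterate and of the average around the fixed point of the coupled linear system, which is the quantity whose Gaussian approximation the paper bounds non-asymptotically.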