🤖 AI Summary
This work addresses the problem of learning low-dimensional state representations for control from partial, potentially high-dimensional observations, focusing on infinite-horizon, time-invariant linear quadratic Gaussian (LQG) control. The authors take a cost-driven approach to state representation learning: a dynamical model in a latent state space is learned by predicting cumulative costs, and the learned latent model is then used to design a near-optimal controller. Two variants are studied, differing in whether the latent transition function is learned explicitly or implicitly; the implicit variant closely resembles MuZero. A key technical contribution is a proof of persistency of excitation for a new stochastic process that arises in the analysis of the quadratic regression used by the approach, a result that may be of independent interest. Together, these results yield finite-sample guarantees on finding both a near-optimal representation function and a near-optimal controller in the time-invariant LQG setting.
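The summary notes that the learned latent model is used to design a near-optimal controller. As a minimal sketch, assuming a latent model has already been estimated as matrices A, B with quadratic cost weights Q, R (placeholder names, not the paper's notation), a certainty-equivalent infinite-horizon controller u = -K z can be computed by iterating the discrete-time algebraic Riccati equation P = Q + AᵀPA − AᵀPB(R + BᵀPB)⁻¹BᵀPA to its fixed point. This is the textbook LQR step, not a reproduction of the authors' algorithm, and the function `lqr_gain` is hypothetical.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=1000, tol=1e-10):
    """Infinite-horizon discrete-time LQR by fixed-point iteration of the Riccati equation.

    Hypothetical helper. Assumes (A, B) is stabilizable, Q is positive semidefinite
    (with (A, Q^{1/2}) detectable) and R is positive definite, so the iteration converges.
    Returns the feedback gain K (control u = -K z) and the Riccati solution P.
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # Gain for the current value matrix: K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # Riccati update, rewritten as Q + A'P(A - BK)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Final gain computed from the converged P
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P
```

For the scalar system A = B = Q = R = 1, the iteration converges to P = (1 + √5)/2 ≈ 1.618 and K = P/(1 + P) ≈ 0.618, giving a stable closed loop A − BK ≈ 0.382. In practice, `scipy.linalg.solve_discrete_are(a, b, q, r)` solves the same equation directly; the explicit loop is shown only to make the update visible.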
📝 Abstract
We study the problem of state representation learning for control from partial and potentially high-dimensional observations. We approach this problem via cost-driven state representation learning, in which we learn a dynamical model in a latent state space by predicting cumulative costs. In particular, we establish finite-sample guarantees on finding a near-optimal representation function and a near-optimal controller using the learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control. We study two approaches to cost-driven representation learning, which differ in whether the transition function of the latent state is learned explicitly or implicitly. The first approach has also been investigated in Part I of this work, for finite-horizon time-varying LQG control. The second approach closely resembles MuZero, a recent breakthrough in empirical reinforcement learning, in that it learns latent dynamics implicitly by predicting cumulative costs. A key technical contribution of this Part II is to prove persistency of excitation for a new stochastic process that arises from the analysis of quadratic regression in our approach, a result that may be of independent interest.
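The abstract ties quadratic regression to persistency of excitation. The sketch below illustrates only the generic connection, under simplifying assumptions: fitting a target c ≈ zᵀMz + b by least squares over the features z_i z_j (i ≤ j) plus a constant is well posed only if those features are persistently exciting, meaning the smallest eigenvalue of their empirical second-moment matrix stays bounded away from zero. The helper names (`quad_features`, `fit_quadratic_cost`, `excitation_level`) are hypothetical, the targets are generic scalars rather than the paper's cumulative costs, and the code does not reproduce the specific stochastic process the authors analyze.

```python
import numpy as np


def quad_features(Z):
    """Hypothetical helper: map each row z (length n) to [z_i * z_j for i <= j] + [1]."""
    n = Z.shape[1]
    iu = np.triu_indices(n)
    outer = np.einsum("ti,tj->tij", Z, Z)   # shape (T, n, n): per-sample z z^T
    feats = outer[:, iu[0], iu[1]]          # shape (T, n(n+1)/2): upper triangle
    return np.hstack([feats, np.ones((Z.shape[0], 1))])


def fit_quadratic_cost(Z, c):
    """Hypothetical helper: least-squares fit of c_t ≈ z_t^T M z_t + b with symmetric M."""
    Phi = quad_features(Z)
    theta, *_ = np.linalg.lstsq(Phi, c, rcond=None)
    n = Z.shape[1]
    M = np.zeros((n, n))
    M[np.triu_indices(n)] = theta[:-1]
    # Coefficient of z_i z_j (i < j) equals 2 * M_ij, so split it evenly across M_ij and M_ji
    M = (M + M.T) / 2
    return M, theta[-1]


def excitation_level(Z):
    """Hypothetical helper: smallest eigenvalue of the empirical second moment of the features."""
    Phi = quad_features(Z)
    # eigvalsh returns eigenvalues in ascending order
    return np.linalg.eigvalsh(Phi.T @ Phi / Phi.shape[0])[0]
```

With i.i.d. standard Gaussian latent states, the excitation level is bounded away from zero and the fit recovers M from noiseless targets. If one latent coordinate is identically zero, every feature involving it vanishes, the minimum eigenvalue drops to zero, and the corresponding entries of M are no longer identifiable from the data.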